How to test backtests – a primer on what makes them reliable
It’s hard to know all of the reasons for Buch’s comment above, but I suspect a large part of it has to do with the pitfalls of backtesting.
Backtesting is necessary for any systematic, algo-based or quantitative strategy, but it is easy to abuse and too often is. And because a backtest produces concrete numbers, a poorly done one can give a false sense of reliability and accuracy.
Spurious correlations
“One of the first things taught in introductory statistics textbooks is that correlation is not causation. It’s also one of the first things to forget.” – Thomas Sowell
A spurious correlation is a relationship between two variables that appears causal but is not. It is often just a coincidence, or the work of an undiscovered third factor. Spurious correlations are the kryptonite of algo strategies.
Check out the chart below. No matter how low your opinion of Nicolas Cage, you'd agree there's no way more people drown in pools just because he appeared in more films that year.
Spurious correlations might be funny (you can check out https://www.tylervigen.com/spurious-correlations), but backtests are no laughing matter. They are crucial in figuring out how the strategy performs in different conditions.
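To see how easily coincidence produces a "signal", here is a minimal sketch that uses only Python's standard library. It generates one random series as a stand-in for a market and 500 other random series as candidate indicators, then reports the strongest correlation it finds. The series lengths, the seed and the function names are assumptions chosen purely for illustration.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def random_walk(rng, length):
    """Cumulative sum of independent Gaussian steps: pure noise by construction."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

rng = random.Random(42)  # fixed seed so the run is repeatable
target = random_walk(rng, 60)  # stand-in for "the market"
candidates = [random_walk(rng, 60) for _ in range(500)]  # 500 meaningless "signals"

# Search the noise for whichever series happens to line up best with the target.
best = max(abs(pearson(c, target)) for c in candidates)
print(f"strongest |correlation| among 500 random series: {best:.2f}")
```

None of these series has anything to do with the others, yet searching through enough of them will usually turn up one that tracks the target closely. Trying many indicators against the same price history is the same kind of search.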
So next time you’re amazed by backtesting results, here are a few things to keep in mind to make sure they’re replicable in real life.
Garbage in, garbage out
First and foremost, any model is only as good as the data you feed it. If your data isn't accurate, your model certainly won't be. Take extreme care to ensure the data you use is clean and accurate.
Idealized conditions
This is best understood with an example. When backtesting a strategy, it is tempting to assume that you can exit at the closing price or, for a day trading strategy, at whatever price was quoted during the day. In reality, however, there are many costs beyond the price itself. These include not just explicit charges like brokerage fees, STT, etc., but also hidden ones like impact costs.
Whoever participates in the market influences it, especially when trading illiquid small-cap stocks. The direct costs should therefore always be included in backtests, and the impact costs should be modeled as well.
Even if you do all of this, keep in mind that this will always be an estimate, and in practice your realized prices can vary greatly. Therefore, the expected returns of any strategy should include a built-in safety margin.
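As a rough illustration of the point about costs, the sketch below compares the gross return of a single trade with its return after fees and an estimated impact cost. Every number in it is a hypothetical placeholder, not an actual brokerage or tax rate, and the square-root impact formula is one common rule of thumb assumed here for illustration, not a method prescribed above.

```python
import math

def net_return(entry_price, exit_price, quantity, avg_daily_volume,
               fee_rate=0.0005, impact_coeff=0.1, daily_volatility=0.02):
    """Return on a round-trip long trade after fees and an estimated impact cost.

    All default rates are hypothetical placeholders, not real charges.
    """
    # Assumed impact model: slippage per leg, as a fraction of price, grows
    # with the square root of the share of daily volume the order takes up.
    participation = quantity / avg_daily_volume
    impact = impact_coeff * daily_volatility * math.sqrt(participation)

    paid = entry_price * (1 + impact)      # you buy above the quoted price
    received = exit_price * (1 - impact)   # and sell below it

    fees = fee_rate * (paid + received)    # proportional charges on both legs
    return (received - paid - fees) / paid

gross = (105.0 - 100.0) / 100.0
liquid = net_return(100.0, 105.0, quantity=1_000, avg_daily_volume=1_000_000)
illiquid = net_return(100.0, 105.0, quantity=1_000, avg_daily_volume=5_000)
print(f"gross {gross:.4f}, liquid {liquid:.4f}, illiquid {illiquid:.4f}")
```

The same trade keeps less of its gross return as the order grows relative to the stock's daily volume, which is why illiquid small caps tend to look better in a naive backtest than in a live account.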
Survivorship bias
One obvious mistake many people make is to consider only the companies that are trading today when backtesting. That ignores the many companies that were listed during the backtest period but have since gone bust.
For example, Kingfisher Airlines is no longer listed today because it went bankrupt. However, if you backtest your algo from 2007 to 2018 (before it was delisted), Kingfisher Airlines should be included in the stock universe for the algo to choose from.
By excluding bankrupt companies that no longer exist, the algo could be given an unfair advantage in backtests. To deal with this, one must try to include all companies that traded during the backtest period.
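One way to do that is to build the stock universe "as of" each date in the backtest rather than from today's list. Here is a minimal sketch, assuming you have listing and delisting dates for every stock. The tickers and dates below are made up for illustration.

```python
from datetime import date

# Hypothetical records: (ticker, listed_on, delisted_on or None if still trading).
LISTINGS = [
    ("AAA", date(2005, 1, 3), None),
    ("BBB", date(2006, 6, 1), date(2018, 5, 30)),  # went bust and was delisted
    ("CCC", date(2015, 3, 2), None),
]

def universe_on(as_of, listings=LISTINGS):
    """Tickers that were actually tradable on `as_of`, including later casualties."""
    return sorted(
        ticker
        for ticker, listed_on, delisted_on in listings
        if listed_on <= as_of and (delisted_on is None or as_of < delisted_on)
    )

def survivors_only(listings=LISTINGS):
    """The biased universe: only names that are still trading today."""
    return sorted(t for t, _, delisted_on in listings if delisted_on is None)

print(universe_on(date(2010, 1, 4)))  # includes BBB, excludes CCC (not yet listed)
print(survivors_only())               # silently drops BBB
```

A backtest that picks from `survivors_only()` can never buy the stock that later went bust, and it can also buy a stock before it was listed. Picking from `universe_on(...)` for each rebalancing date avoids both problems.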
Overfitting
Overfitting occurs when your model performs well over a period only because of specific conditions that may not hold universally. Let's say you're running a momentum-based strategy and you test it only over a period when the market was very bullish.
Your strategy will of course work well, but over the long term there will also be downtrends. Because the test period was poorly chosen, you will mistakenly think that the system is better than it actually is. One can also keep adding parameters until the model performs well over any given period, but this usually leads to a breakdown in actual performance.
Therefore, you should try to use as few parameters as possible and backtest over as long a period of time as possible.
In addition, one should train and test over several different time periods and check the robustness of the system in different scenarios.
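Training and testing over several periods is often done with rolling "walk-forward" windows. The sketch below only generates the index ranges. The window sizes and the function name are assumptions for illustration, and the model fitting and scoring are left out.

```python
def walk_forward_splits(n_periods, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that roll forward through time.

    Each test window starts right after its training window, so no test
    observation precedes the data used to fit the model.
    """
    start = 0
    while start + train_size + test_size <= n_periods:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window

# Example: 10 years of annual data, fit on 4 years, evaluate on the next 2.
splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(10, 4, 2)]
for train, test in splits:
    print("train", train, "-> test", test)
```

If the strategy only looks good in one of these windows, that is a hint that it has been fitted to the conditions of that particular period.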
Data snooping
If you build the model and then test its performance in the past, you need to make sure that you don’t use the information from the test period itself.
Suppose today (October 2022) you want to test the performance of your model from January 1, 2020 to December 31, 2020.
Then you cannot use any information from after January 1, 2020 when creating the portfolio as of January 1, 2020, although technically it is available to you since you are sitting in October 2022.
Sometimes it isn't even intentional: in a complicated model, it is easy to overlook a path through which future information leaks in.
This will obviously result in a model with very high backtest scores that just doesn't work in the real world, unless you are a fortune teller or a god!
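A classic way this leak creeps in is using a day's own price move to decide whether to hold the stock on that same day. The sketch below runs one toy rule twice, once with the leak and once with the signal lagged by a day. The price series and the rule are hypothetical, chosen only to make the effect visible.

```python
def daily_returns(prices):
    """Simple close-to-close returns; element i is the return earned on day i + 1."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]

def strategy_return(returns, lag):
    """Total return of 'hold the stock on day t if the signal is positive'.

    lag=0 uses day t's own return as the signal (look-ahead: it is not
    known until the day is over). lag=1 uses the previous day's return,
    which was already known when the trade would have been placed.
    """
    total = 1.0
    for t in range(lag, len(returns)):
        signal = returns[t - lag]
        if signal > 0:
            total *= 1 + returns[t]
    return total - 1

prices = [100, 102, 99, 101, 98, 100, 97, 99]  # hypothetical choppy series
rets = daily_returns(prices)
leaky = strategy_return(rets, lag=0)
honest = strategy_return(rets, lag=1)
print(f"with look-ahead: {leaky:.2%}, without: {honest:.2%}")
```

On this series the leaky version makes money and the honest version loses it, even though the rule is identical. The only difference is whether the signal was known at the time of the trade.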
This is by no means an exhaustive list (underfitting and out-of-sample testing, for example, also deserve attention), but it is a good starting point for evaluating backtests. It will surely help you eliminate a large percentage of bad apples.
In closing, I want to leave you with a warning that even if a backtest passes all of these tests and more, the fact is markets are evolving. Even if your model has been robust in the past, it may not be in the future. As with any ability, you’ll need to keep upgrading it to ensure it stands the test of time.
“Most people use statistics like a drunk uses a lamppost; more for support than for enlightenment”
(Disclaimer: Experts’ recommendations, suggestions, views and opinions are their own. These do not represent the views of Economic Times)