Top 5 Backtesting Mistakes That Ruin Your Trading Strategy

This is an amateur website, not a professional publication. Pages are written on an occasional basis and are free to read. The contents do not predict economic scenarios or financial outcomes. To the best of the author’s knowledge they represent the current consensus in technical and academic research. They are presented for educational purposes only, and under no circumstances are they financial advice or a solicitation to trade. Pages contain paid links. The content of this website is not intended for residents of Chile, Andorra, Italy, Spain, France, Germany, Turkey, Greenland or any individual under legal age.

1. Survivorship Bias: Trading in a Fantasy World Where Losers Don’t Exist

Survivorship bias is the quiet killer of backtesting accuracy. It occurs when your historical dataset only includes assets that still exist today, ignoring those that were delisted, went bankrupt, or were acquired. This creates a distorted, overly optimistic view of a strategy’s performance.

Why It’s Catastrophic
Imagine backtesting a small-cap mean-reversion strategy. Your dataset includes the 500 stocks that are listed today, but it excludes the 200 that failed during the 2008 financial crisis. Your backtest shows a 15% annual return with low drawdowns. Had you included the failed stocks, the result might have been closer to a 5% return with a 60% drawdown. The strategy looked good only because it traded survivors, not the real market.

Real-World Example
A 2019 study by researchers at AQR Capital Management found that ignoring delisted stocks can inflate a trading strategy’s Sharpe ratio by 0.5 to 0.8. For a strategy targeting a Sharpe of 1.0, that’s a 50-80% overestimation of risk-adjusted returns.

How to Fix It

Use survivorship-bias-free databases: Sources like CRSP (US), Compustat, or QuantConnect provide point-in-time data that includes dead stocks.
For crypto: Use data from CoinMarketCap historical snapshots or Kaiko, which include delisted tokens.
Test with synthetic failures: Backtest your strategy and then manually remove the top-performing 5-10% of assets over the last decade. If performance drops significantly, your results depend on a handful of winners, which is a common symptom of survivorship bias.
Delisting returns: Incorporate a -100% return assumption for assets that drop below a market cap threshold (e.g., <$50 million) for two consecutive months to simulate bankruptcy.
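
The delisting rule in the last item can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name apply_delisting_rule, its parameters, and the choice to book the total loss in the second consecutive month below the threshold are all assumptions made for the example.

```python
def apply_delisting_rule(market_caps, returns, threshold=50e6, months=2):
    """Impose a -100% return once market cap has stayed below `threshold`
    for `months` consecutive periods; later periods are dropped.

    `market_caps` and `returns` are aligned monthly series for one asset.
    """
    adjusted = []
    below = 0  # count of consecutive months under the threshold
    for cap, ret in zip(market_caps, returns):
        below = below + 1 if cap < threshold else 0
        if below >= months:
            adjusted.append(-1.0)  # simulated bankruptcy: total loss
            break
        adjusted.append(ret)
    return adjusted


# Hypothetical asset that sinks below $50 million and stays there.
caps = [80e6, 60e6, 45e6, 40e6, 55e6]
rets = [0.02, -0.10, -0.20, -0.05, 0.30]
print(apply_delisting_rule(caps, rets))  # [0.02, -0.1, -0.2, -1.0]
```

Note that the 30% rebound in the final month never reaches the backtest. That is the point of the rule: a survivor-only dataset would have kept it.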

2. Look-Ahead Bias: Letting Future Knowledge Leak Into Past Decisions

Look-ahead bias, also known as forward-looking bias, occurs when your backtest uses information that was not available at the time of the trade. This is the single most common reason why backtests look perfect but forward tests fail.

The Three Forms of Leakage

Data Leakage: Using adjusted closing prices that include future splits, dividends, or corporate actions. For example, a stock splits 3:1 on day 50. Your backtest on day 20 uses the adjusted price, which is 1/3 of the actual price. Your signals will be distorted.
Signal Leakage: Using indicators that require future data. A classic error: calculating a 50-day moving average that includes today’s close, then filling the trade at today’s open, hours before that closing price existed.
Fundamental Leakage: Using quarterly earnings data released in March to make a trade in January. The data was not public yet. This inflates strategies that rely on earnings surprises because the backtest effectively knows them in advance.
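
Fundamental leakage in particular can be guarded against by keying every figure to its release date rather than its fiscal period. The sketch below is an illustration under stated assumptions: the REPORTS table and its values are hypothetical, and the helper latest_known_report is a name invented for this example. It returns only reports released strictly before the trade date, which is the conservative choice when a release may come out after the close.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical earnings history, sorted by release date:
# (release_date, fiscal_period, eps)
REPORTS = [
    (date(2022, 11, 3), "2022Q3", 1.10),
    (date(2023, 3, 2), "2022Q4", 1.45),
    (date(2023, 5, 4), "2023Q1", 0.95),
]


def latest_known_report(reports, as_of):
    """Return the newest report released strictly before `as_of`, or None.

    `reports` must be sorted by release date (first tuple element).
    """
    release_dates = [r[0] for r in reports]
    idx = bisect_left(release_dates, as_of)  # number of releases before as_of
    return reports[idx - 1] if idx else None


# A January trade can only see the Q3 figures, not the Q4 ones out in March.
print(latest_known_report(REPORTS, date(2023, 1, 15))[1])  # 2022Q3
```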

Devastating Impact
A 2020 paper by Robert Carver demonstrated that a simple 50/200-day moving average crossover strategy on the S&P 500 lost 60% of its simulated alpha when look-ahead bias was removed. The flawed version showed a 12% CAGR; the corrected version showed 4.7%.

How to Fix It

Use point-in-time datasets: Premium databases like Sharadar for US equities provide fundamental data with “as reported” and “as available” dates.
Shift your data: For daily strategies, always shift your data by one row. The price at the close of day t should generate the signal for the open of day t+1.
For crypto: Use timestamped order book data from providers like CryptoDataDownload. Never use “close” data from the same minute as your signal generation.
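Code a sanity check: After your backtest runs, print the date of the first trade and the date of the data source’s last corporate action. If the trade date precedes the action date and your signals were computed from back-adjusted prices, those signals may contain a leak and should be re-checked against unadjusted data.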
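
The one-row shift can be sketched in plain Python as follows. The helper names sma and lagged_positions and the toy price series are assumptions for illustration. The signal is a simple "close above its moving average" rule, and the position held on day t is taken from the signal computed at the close of day t-1.

```python
def sma(values, window):
    """Simple moving average; None until `window` observations exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def lagged_positions(closes, window):
    """Position (1 = long, 0 = flat) to hold on each day,
    decided only from data available through the previous close."""
    avg = sma(closes, window)
    signal = [1 if a is not None and c > a else 0 for c, a in zip(closes, avg)]
    return [0] + signal[:-1]  # shift by one row: signal[t] trades on day t+1


closes = [10, 11, 12, 11, 10, 13]
print(lagged_positions(closes, window=3))  # [0, 0, 0, 1, 0, 0]
```

Without the shift the positions would be [0, 0, 1, 0, 0, 1], meaning the backtest would be long on the very day whose close produced the signal.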

3. Overfitting to Noise: Mistaking Random Patterns for Profitable Signals

Overfitting is the process of curve-fitting your strategy to historical data so precisely that it captures noise rather than signal. It is the leading cause of poor out-of-sample performance and is especially dangerous for retail traders using visual inspection.

The Illusion of Precision
You test 50 different combinations of moving average lengths (e.g., 5/20, 10/40, 15/60) and find that the 7/23 combination has a 90% win rate during a three-month bull run. This is almost certainly noise. A three-month window holds only a handful of trades, and if you had tested 50 random strategies over it, one or more could easily have shown a 90% win rate by chance alone. You have memorized the historical path, not modeled the underlying market dynamics.
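
How often pure chance produces an impressive win rate can be checked directly. The sketch below assumes each strategy makes 10 independent coin-flip trades, which is a simplification chosen for illustration; the function names are hypothetical. Under that assumption a single random strategy wins 9 or more of 10 trades about 1% of the time (11/1024), so across 50 attempts there is roughly a 40% chance that one or more of them does.

```python
import random
from math import comb


def prob_at_least(k, n):
    """Probability of k or more wins in n fair coin-flip trades."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


def best_random_win_rate(n_strategies=50, n_trades=10, seed=0):
    """Best win rate found among purely random strategies."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_strategies):
        wins = sum(rng.random() < 0.5 for _ in range(n_trades))
        rates.append(wins / n_trades)
    return max(rates)


p = prob_at_least(9, 10)             # chance for one strategy
p_any = 1 - (1 - p) ** 50            # chance that one or more of 50 manages it
print(round(p, 4), round(p_any, 2))
print(best_random_win_rate())        # best of 50 random strategies
```

With fewer trades per strategy the effect is stronger: the shorter the test window, the easier it is for the best of many tries to look exceptional.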

A Statistical Trap
The more parameters your strategy has (e.g., entry/exit thresholds, stop loss, trailing stop, volume filter), the easier it is to overfit. A rule of thumb sometimes attributed to finance professor Marcos López de Prado: keep the ratio of the number of parameters to the number of trades below 1:100. If you have 10 parameters, you need at least 1,000 trades to avoid overfitting.

How to Fix It

Hold-out validation: Divide your data chronologically into three periods: training (50%), validation (25%), and test (25%). Optimize parameters only on the training set. If performance drops sharply on the validation set, you are overfit. Run the test set once, at the very end, as the final check.
Use walk-forward analysis: Instead of a single optimization, run rolling optimizations. For example, optimize on 2015-2017, test on 2018, then re-optimize on 2016-2018, test on 2019. If the optimized parameters change wildly each year, the strategy is not robust.
Simplicity test: Reduce your strategy to one or two parameters. If a simple 20-day moving average crossover performs nearly as well as your 7-parameter system, the added complexity is noise.
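Monte Carlo randomization: Shuffle the market returns relative to your signals 1,000 times and rerun the strategy on each shuffle. Shuffling the strategy’s own return series is not enough, because the Sharpe ratio does not depend on the order of the returns. If the original backtest’s Sharpe ratio sits well inside the shuffled distribution, the edge is indistinguishable from luck; only a clear outlier suggests the signal carries real information, and even then it still has to hold up out of sample.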
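
The rolling schedule used in walk-forward analysis (optimize on 2015-2017, test on 2018, then roll forward one year) can be generated mechanically. The function below is a minimal sketch; its name and the use of whole calendar years as window units are assumptions made for the example.

```python
def walk_forward_windows(first_year, last_year, train_years=3, test_years=1):
    """Return a list of ((train_start, train_end), (test_start, test_end))
    year ranges, rolling forward by `test_years` each step."""
    windows = []
    start = first_year
    while start + train_years + test_years - 1 <= last_year:
        train = (start, start + train_years - 1)
        test = (start + train_years, start + train_years + test_years - 1)
        windows.append((train, test))
        start += test_years
    return windows


for train, test in walk_forward_windows(2015, 2019):
    print("optimize on", train, "-> test on", test)
# optimize on (2015, 2017) -> test on (2018, 2018)
# optimize on (2016, 2018) -> test on (2019, 2019)
```

Each test window lies strictly after its training window, so parameters are never tuned on the data used to judge them.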

4. Ignoring Transaction Costs, Slippage, and Liquidity Constraints

A backtest that assumes frictionless trading is a fantasy. Real markets charge commissions and bid-ask spreads and impose liquidity limits that can turn a profitable strategy into a portfolio-burning machine.

The Hidden Costs

Slippage: The difference between the price you see in the backtest and the price at which you are actually filled. For a liquid stock (e.g., Apple), slippage might be 0.01%. For a micro-cap or a crypto altcoin during volatility, slippage can be 1-5% per trade.
Bid-Ask Spread: A strategy that makes 100 round-trip trades per year in a $20 stock with a $0.05 spread pays about 0.25% per round trip, which adds up to roughly 25% of capital per year lost to spreads alone.
Market Impact: If your strategy trades 5% of the average daily volume (ADV), you will move the market against yourself. A large buy order in a small-cap stock can push the price up by 0.5-1.0% before the order is fully filled.

Quantitative Breakdown
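Assume a strategy has a 60% win rate, a 2:1 reward-to-risk ratio, and trades 50 times per year. The expectancy is 0.6 × 2R - 0.4 × 1R = 0.8R per trade, so a 30% gross annual return corresponds to risking about 0.75% of capital per trade (50 × 0.8 × 0.75% = 30%). With 0.5% total friction per trade (commission, spread, slippage) on a position the size of the account, costs add up to 50 × 0.5% = 25%, and the annual net return drops to 5%. The strategy appears robust on paper but is barely above break-even in reality.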
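
The same arithmetic can be put in a small function so that other cost levels are easy to try. This is a sketch under simplifying assumptions that are not spelled out in the figures above: returns are summed rather than compounded, each trade deploys the whole account, and the 0.75% risk per trade is the value implied by a 30% gross return. The function name is hypothetical.

```python
def annual_returns(win_rate, reward_to_risk, risk_per_trade,
                   trades_per_year, friction_per_trade):
    """Return (gross, net) simple annual returns as fractions of capital."""
    # Expected profit per trade in units of risk (R): wins pay
    # reward_to_risk * R, losses cost 1R.
    expectancy_r = win_rate * reward_to_risk - (1 - win_rate)
    gross = expectancy_r * risk_per_trade * trades_per_year
    costs = friction_per_trade * trades_per_year
    return gross, gross - costs


gross, net = annual_returns(0.60, 2.0, 0.0075, 50, 0.005)
print(f"gross {gross:.0%}, net {net:.0%}")

# Stress test: double the friction and the edge disappears entirely.
_, stressed = annual_returns(0.60, 2.0, 0.0075, 50, 0.010)
print(f"net at 2x friction {stressed:.0%}")
```

Doubling friction to 1% per trade takes the net return from about 5% to about -20%, which shows how sensitive the result is to the cost assumption.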

How to Fix It

Realistic slippage modeling: Use a fixed slippage assumption (e.g., 0.05% for S&P 500, 0.5% for mid-caps, 2% for micro-caps/crypto). Then test a stress scenario of 2x that figure.
Volume-adjusted fills: Code your backtest so that if your order size exceeds 10% of the minute’s volume, the fill is delayed or only partially executed.
Commission tiering: Use tiered brokerage fees. For example, Interactive Brokers charges $0.005 per share for US stocks but a minimum of $1 per order.
Crypto-specific: Account for gas fees on Ethereum (variable, often $5-50 per swap) and withdrawal fees (typically 0.0005 BTC). A high-frequency strategy on Uniswap can lose 80% of its theoretical profits to gas alone.
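
Two of the fixes above, the per-share commission with a minimum and the volume-capped fill, are simple enough to sketch directly. The function names and default values below are assumptions for illustration, taken from the figures quoted in the list; check your own broker’s current fee schedule before relying on them.

```python
def commission(shares, per_share=0.005, minimum=1.00):
    """Per-share commission with a per-order minimum, in dollars."""
    return max(shares * per_share, minimum)


def capped_fill(order_shares, bar_volume, max_participation=0.10):
    """Return (filled, remaining) for one bar.

    At most `max_participation` of the bar's volume can be filled;
    the remainder has to wait for later bars.
    """
    fillable = int(bar_volume * max_participation)
    filled = min(order_shares, fillable)
    return filled, order_shares - filled


print(commission(100))         # small order pays the $1 minimum
print(commission(1000))        # larger order pays per share
print(capped_fill(500, 1000))  # only 10% of the bar's volume is fillable
```

On a 100-share order of a $20 stock the $1 minimum is 0.05% of the trade value, which is why small accounts often face higher percentage costs than the headline per-share rate suggests.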

5. Strategy Scope Mismatch: Testing in a Bull Market and Expecting It to Work Everywhere

A strategy is only as good as the market conditions it was tested under. The most dangerous backtest is one that covers only a single regime—especially a strong bull market. Trading strategies are not universal; they are regime-dependent.

The Regime Dependency Problem

Trend-following strategies: These perform exceptionally well in trending markets (e.g., 2017-2021 for crypto, 2009-2020 for US equities). They fail catastrophically in sideways or choppy markets (e.g., 2015, 2022).
Mean-reversion strategies: These thrive in range-bound, low-volatility environments (e.g., 2017 for US equities) and suffer steep losses during sharp breakouts or crashes.
Volatility strategies: Short-volatility strategies made huge profits during the calm of 2017, but blew up in February 2018 and again in March 2020.

The Sample Period Issue
Backtesting on the S&P 500 from 2009-2020 captures the longest US bull market on record. A strategy that buys at the close and holds for one month would show a 15% annual return. That same strategy applied to the Nikkei 225 from 1990-2000 would show a drawdown of more than 50%. The same holding period, different regime.

How to Fix It

Test across multiple decades: Your backtest should include at least one bull, one bear, and one sideways market. For US equities, that means data from at least 2000-2003 (bear), 2004-2007 (bull), 2008-2009 (crash), 2010-2013 (recovery), and 2014-2015 (range-bound).
Test on other markets: If your strategy works on the S&P 500, test it on the Euro Stoxx 50, Nikkei 225, and Bitcoin (from 2015). These are not fully independent of US equities, but they have lived through different regimes. If the strategy fails across the board, it is not robust.
Regime labeling: Code your backtest to automatically identify volatility regimes using a 30-day standard deviation of returns. Test your strategy separately in high-vol (>2% daily) and low-vol (<0.5% daily) environments. If it only works in one regime, it is not a universal edge.
Out-of-sample forward testing: After your backtest, paper trade the strategy for 3-6 months in current market conditions. If the forward test contradicts the backtest, your strategy was likely optimized for a past regime that no longer exists.
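
The regime labeling step can be sketched with the standard library alone. The function name label_vol_regimes and the "mid" label for everything between the two thresholds are assumptions added for this example; the 30-day window and the 2% and 0.5% cut-offs come from the list above.

```python
from statistics import stdev


def label_vol_regimes(daily_returns, window=30, high=0.02, low=0.005):
    """Label each day 'high', 'low' or 'mid' by trailing volatility.

    The label for day i uses returns up to and including day i, so it
    should only be applied to trades placed from day i+1 onward.
    Days without a full window are labeled None.
    """
    labels = []
    for i in range(len(daily_returns)):
        if i + 1 < window:
            labels.append(None)
            continue
        vol = stdev(daily_returns[i + 1 - window : i + 1])
        if vol > high:
            labels.append("high")
        elif vol < low:
            labels.append("low")
        else:
            labels.append("mid")
    return labels


calm = [0.001, -0.001] * 20    # hypothetical quiet market
stormy = [0.03, -0.03] * 20    # hypothetical turbulent market
print(label_vol_regimes(calm)[-1], label_vol_regimes(stormy)[-1])  # low high
```

Strategy returns can then be grouped by label and evaluated separately, which makes it obvious when all the profit comes from a single regime.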