Backtesting with Historical Data: Sources, Quality, and Frequency

This is an amateur website, not a professional publication. Pages are written occasionally and are free to read. The contents do not predict economic scenarios or financial outcomes. To the best of the author's knowledge they represent the current consensus in technical and academic research. They are presented for educational purposes only and under no circumstances constitute financial advice or a solicitation to trade. Pages contain paid links. The content of this website is not intended for residents of Chile, Andorra, Italy, Spain, France, Germany, Turkey, or Greenland, or for any individual under legal age.

In quantitative finance, the line between a robust strategy and a statistical illusion is drawn with historical data. Backtesting, the process of simulating a trading strategy on past data to evaluate its viability, is the cornerstone of algorithmic development. However, the efficacy of any backtest is entirely contingent on three foundational pillars: the sources of the data, its quality, and the frequency at which it is sampled. Neglecting any of these can lead to overfitting, survivorship bias, and catastrophic out-of-sample failure. This article dissects these critical components, providing a technical roadmap for constructing defensible backtests.

I. Sources of Historical Data: The Genesis of Information

The universe of historical financial data is vast, ranging from free, crowd-sourced repositories to institutional-grade feeds costing tens of thousands of dollars annually. The choice of source must align with the strategy’s complexity, the asset class, and the required precision.

1. Free and Open-Source Repositories (High Risk, Low Barrier)

Sources like Yahoo Finance (via yfinance), Alpha Vantage (limited API calls), and Stooq are popular for retail traders. They offer daily (EOD) and some intraday data for equities, ETFs, and forex.

Use Case: Preliminary strategy research, educational backtests, and long-term trend-following where tick-level precision is unnecessary.
Critical Flaws: These sources often suffer from incomplete dividend adjustments, split adjustments that lag the event, omitted corporate actions, and corrupted data for thinly traded assets. They are unsuitable for high-frequency or penny stock strategies because frequent gaps and bad prints produce backtest fills that could not have occurred in the market.

2. Broker-Sourced Data (Medium Fidelity, Actionable)

Platforms like Interactive Brokers (IBKR), TD Ameritrade (via td-ameritrade-api; the broker is now part of Charles Schwab), and Alpaca provide historical data derived from their own order flow and aggregated exchange data.

Use Case: Live strategy simulation with realistic execution assumptions, testing slippage models, and strategies that depend on market microstructure (e.g., VWAP crossing).
Advantages: Direct integration with brokerage APIs allows for seamless walk-forward analysis. The data often includes bid-ask spreads and volume-weighted pricing, which is crucial for limit order strategies.
Limitations: Data is often limited to the broker’s reach, meaning illiquid OTC markets or specific foreign exchanges may be truncated or delayed by 15-20 minutes for non-subscribers.

3. Institutional and Aggregated Vendors (High Fidelity, High Cost)

Firms like Refinitiv (formerly Thomson Reuters), Bloomberg, QuantConnect (Lean Engine), and Polygon.io provide curated, multi-exchange, and corporate-action-adjusted datasets.

Use Case: Multi-asset strategies (equities, futures, options, FX), machine learning models requiring massive tick bars, and compliance-driven backtesting for funds.
Key Features:
- Tape Recorder Data: Complete, unadjusted order book snapshots at millisecond precision.
- Universe of Assets: Includes bonds, derivatives, and international equities that are notoriously difficult to source.
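- Automatic Adjustments: Splits, dividends, mergers, and spin-offs are pre-calculated and applied, which removes most corporate-action errors. Survivorship bias is a separate problem and is only eliminated if the dataset also retains delisted securities.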
Drawbacks: Cost can exceed $1,000 USD per month for a single data feed. Complex licensing agreements restrict redistribution.

II. Data Quality: The Silent Killer of Robust Backtests

Poor data quality is the single largest source of false positives in algorithmic trading. A backtest that shows a 30% annual return with a Sharpe ratio of 3.0 is often a sign of data artifacts rather than true alpha.

1. Survivorship Bias (The Most Common Error)

This occurs when a backtest only includes assets that exist today (e.g., the current S&P 500 constituents). Companies that were delisted through bankruptcy, acquisition, or failure to meet listing standards are excluded, which inflates past returns.

Impact: Historically, survivorship bias can overstate backtest returns by roughly 2-5 percentage points annually for long-only equity strategies. The size of the effect depends on the universe and the period tested.
Mitigation: Use point-in-time (PIT) datasets. These capture the exact universe of assets as it existed on a given historical date, including securities that were later delisted. Vendors like CRSP (Center for Research in Security Prices) or Compustat provide PIT data for US equities.
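To make the idea of a point-in-time universe concrete, here is a minimal sketch. The membership table, its tickers, and the function name universe_on are all made up for illustration; real PIT data would come from a vendor and have a richer schema.

```python
from datetime import date

# Hypothetical membership records: (ticker, date added, date removed or None).
# These rows are invented for illustration only.
MEMBERSHIP = [
    ("AAA", date(2005, 1, 3), None),
    ("BBB", date(2005, 1, 3), date(2012, 6, 29)),  # later delisted
    ("CCC", date(2015, 3, 2), None),               # did not exist in 2010
]

def universe_on(as_of, membership):
    """Return the tickers that were members on the given date."""
    return sorted(
        ticker
        for ticker, added, removed in membership
        if added <= as_of and (removed is None or as_of < removed)
    )

# A backtest run "as of" 2010 sees the delisted BBB and not the future CCC.
print(universe_on(date(2010, 1, 4), MEMBERSHIP))  # ['AAA', 'BBB']
print(universe_on(date(2020, 1, 2), MEMBERSHIP))  # ['AAA', 'CCC']
```

The backtest would call a lookup like this on every rebalance date instead of using today's constituent list throughout.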

2. Look-Ahead Bias

This occurs when a backtest uses information that was not available at the time of the trade. The most common example is using a future dividend or earnings date to trigger a trade.

Impact: Perfect foresight leads to unrealistic entry and exit points.
Mitigation: Ensure all data used for feature engineering (e.g., moving averages, volatility calculations) is strictly lagged by one full period. For example, use closing prices from t-1 to generate a signal for t.
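The one-period lag can be sketched as follows. This is an illustrative moving-average rule, not a recommended strategy; the function name and the long/flat encoding are assumptions made for the example.

```python
def lagged_sma_signal(closes, window):
    """Long/flat signal for bar t built only from closes up to bar t-1.

    Returns a list the same length as `closes`. Entries are None until
    enough history exists, then 1 (long) or 0 (flat).
    """
    signals = [None] * len(closes)
    for t in range(window, len(closes)):
        history = closes[t - window : t]  # bars t-window .. t-1, never bar t
        sma = sum(history) / window
        signals[t] = 1 if closes[t - 1] > sma else 0
    return signals

closes = [10, 11, 12, 13, 12, 11]
print(lagged_sma_signal(closes, 3))  # [None, None, None, 1, 1, 0]
```

A quick sanity test for look-ahead is to change the value of the last close and confirm that the last signal does not move, since bar t's own close must not influence the signal for bar t.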

3. Corporate Action Errors (Split and Dividend Mishandling)

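A stock split (e.g., 2:1) halves the price but doubles the share count, leaving the value of a holding unchanged. If a backtesting engine treats this as a 50% price drop, it will generate false signals, such as a spurious mean-reversion entry or a stop-loss exit.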

Types:
- Price Adjustment: Historical prices are recalculated to reflect all past splits and dividends.
- Total Return Data: Includes dividends reinvested, crucial for long-term strategies.
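Mitigation: Never compute returns or signals from raw prices without accounting for corporate actions. Use data flagged as “adjusted close” or “total return” for returns, and keep the unadjusted series for anything that depends on the price actually quoted at the time (order sizing, price filters, tick sizes).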

4. Stale Data and Gaps

Illiquid assets (small caps, microcaps, certain ETFs) may trade only a few times per day. Using the last traded price from 2 hours ago as the current price underestimates true transaction costs.

Impact: Fills in a backtest appear perfect, but in live trading, the spread is wide and execution is delayed.
Mitigation: Implement a data staleness check. For 1-minute bars, reject any bar whose last trade timestamp is more than 5 minutes older than the bar's close. For daily bars, a common practice is to skip assets that traded on fewer than 10 days in a month.
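A staleness filter of this kind might look like the sketch below. The bar layout (dicts with "end" and "last_trade" keys) is a hypothetical schema chosen for the example, and the 5-minute threshold is the one suggested above.

```python
from datetime import datetime, timedelta

def is_stale(bar_end, last_trade_time, max_age=timedelta(minutes=5)):
    """True if the most recent trade is older than max_age at the bar's end."""
    return bar_end - last_trade_time > max_age

def drop_stale_bars(bars, max_age=timedelta(minutes=5)):
    """Keep only bars whose last trade is recent enough.

    Each bar is a dict with hypothetical keys 'end' and 'last_trade'.
    """
    return [b for b in bars if not is_stale(b["end"], b["last_trade"], max_age)]

bars = [
    {"end": datetime(2024, 1, 2, 10, 0), "last_trade": datetime(2024, 1, 2, 9, 58)},
    {"end": datetime(2024, 1, 2, 10, 1), "last_trade": datetime(2024, 1, 2, 9, 50)},
]
fresh = drop_stale_bars(bars)
print(len(fresh))  # 1
```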

III. Data Frequency: Choosing the Right Temporal Lens

Data frequency dictates the granularity of your backtest and, critically, the transaction costs you must account for. The wrong frequency can mask slippage or artificially smooth volatility.

1. Daily (1D) Bars: The Baseline for Strategic Models

Typical Strategies: Trend following, mean reversion over weeks, factor investing (value, momentum).
Analysis: Daily data is sufficient for strategies with a holding period of 3 days or more. It is cheap, widely available, and computationally light.
Pitfall: Daily data obscures intraday volatility. A strategy with a 0.5% stop-loss can look untouched on daily closes when the stop was in fact hit intraday, possibly many times (say, 15) across re-entries, with each round trip adding commission and slippage costs.

2. Intraday Bars (1-minute, 5-minute, Hourly): The Arbitrage Zone

Typical Strategies: Pairs trading, statistical arbitrage, intraday momentum, scalping.
Analysis: Intraday data is required to model slippage accurately. Slippage should be modeled as a function of volume and volatility. A common heuristic: for a given order size, slippage scales inversely with the volume traded in the bar and rises with volatility.
Statistical Considerations: Higher frequency data exhibits stronger autocorrelation (much of it from bid-ask bounce) and non-normality (fat tails). Using a simple standard deviation of closes for risk management at 1-minute intervals is inadequate. Use realized volatility (the sum of squared intraday returns) or a range-based estimator (e.g., Parkinson or Yang-Zhang) instead.
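For reference, here is a small sketch of two of the estimators named above: realized variance from a price series, and the Parkinson range-based variance from bar highs and lows. The function names are illustrative. Yang-Zhang is omitted because it needs open and close prices as well.

```python
import math

def realized_variance(prices):
    """Sum of squared log returns over the sampled prices."""
    return sum(math.log(b / a) ** 2 for a, b in zip(prices, prices[1:]))

def parkinson_variance(highs, lows):
    """Parkinson range-based variance per bar, averaged over all bars.

    sigma^2 = mean(ln(H/L)^2) / (4 ln 2)
    """
    n = len(highs)
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return total / (4.0 * math.log(2.0) * n)

minute_prices = [100.0, 100.2, 99.9, 100.1]
print(realized_variance(minute_prices))
print(parkinson_variance([100.5, 100.4], [99.8, 99.7]))
```

Both functions return a variance, so take the square root for a volatility and scale it to the horizon you need. Note that Parkinson assumes no drift and no opening gaps, which is one reason Yang-Zhang is often preferred for daily bars.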

3. Tick and Second Bars: Institutional Turf

Typical Strategies: Market making, latency arbitrage, HFT.
Analysis: At this level, data becomes a stream of individual events (trades, quotes, order book updates). Backtesting requires a tick-by-tick, event-driven simulation engine. The primary challenge is the noise floor: at tick frequency, most price changes are bid-ask bounce rather than information. Backtests must model order book dynamics: queue position, order cancellation, and fill probability.
Minimum Data Requirement: A single day of tick data for the S&P 500 constituents can exceed 100 GB, depending on how much order book depth is recorded. Efficient storage (e.g., Parquet files or a columnar database) and adequate compute resources are mandatory.

IV. The Frequency-Quality-Source Trilemma

Every backtesting project must balance three competing constraints that follow from its choice of source, quality, and frequency: cost, speed, and fidelity.

Scenario A (Retail Trend Follower): Low cost, moderate speed, low fidelity. Acceptable. Use free daily data (source: Yahoo), apply survivorship bias filters manually, and keep holding periods long (>10 days) to mitigate intraday noise.
Scenario B (Algorithmic Options Trader): Moderate cost, high speed, high fidelity. Use broker-sourced minute bars (source: IBKR). Quality demands are extreme because option prices are sensitive to the exact timestamp, through both time decay and the underlying's price. Use total return data for the underlying and account for ex-dividend dates.
Scenario C (Quantitative Hedge Fund – Statistical Arbitrage): High cost, high speed, maximum fidelity. Use institutional tick data (source: Polygon/Refinitiv). Reject any data that is not point-in-time. The backtest typically runs on a cluster or cloud instance because it processes terabytes of data, and every input must be timestamped as of when it became available so that even small look-ahead biases are eliminated.

V. Practical Data Pipeline: From Raw Source to Clean Feature

A robust backtesting pipeline includes specific steps to ensure data integrity:

Ingestion: Download raw data for your tickers from your chosen source. Always request unadjusted prices, and request split and dividend records separately, so that adjustments can be applied and audited in your own pipeline.
Cleaning:
- Remove negative prices, zero-volume bars, and rows with missing timestamps.
- Interpolate missing data only for synthetic instruments (e.g., indices) but never for individual equities.
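- Apply a forward-fill (ffill) for price and set volume to zero for the filled bars. Do not backward-fill (bfill) either field, since a backward fill copies future values into the past and introduces look-ahead.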
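The cleaning step can be sketched in plain Python as below. The bar schema (dicts with "ts", "close", "volume") and the "filled" flag are assumptions made for the example. Note the design choice: this sketch forward-fills the price and sets volume to zero on filled bars, because a bar with no trades has no volume and a backward fill would pull future values into the past.

```python
def clean_bars(bars):
    """Drop rows with a missing timestamp, non-positive price, or zero volume."""
    return [
        b for b in bars
        if b.get("ts") is not None and b["close"] > 0 and b["volume"] > 0
    ]

def fill_missing(timestamps, bars_by_ts):
    """Build one bar per expected timestamp.

    Missing bars carry the previous close forward with zero volume and are
    flagged so that later stages can treat them as untradeable.
    """
    filled, last_close = [], None
    for ts in timestamps:
        bar = bars_by_ts.get(ts)
        if bar is not None:
            last_close = bar["close"]
            filled.append({"ts": ts, "close": bar["close"],
                           "volume": bar["volume"], "filled": False})
        elif last_close is not None:
            filled.append({"ts": ts, "close": last_close,
                           "volume": 0, "filled": True})
    return filled

raw = [
    {"ts": 1, "close": 10.0, "volume": 100},
    {"ts": 2, "close": -1.0, "volume": 50},     # negative price: dropped
    {"ts": None, "close": 10.2, "volume": 10},  # missing timestamp: dropped
    {"ts": 4, "close": 10.5, "volume": 80},
]
good = clean_bars(raw)
grid = fill_missing([1, 2, 3, 4], {b["ts"]: b for b in good})
print([(b["close"], b["volume"]) for b in grid])
# [(10.0, 100), (10.0, 0), (10.0, 0), (10.5, 80)]
```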
Adjustment:
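- Calculate a Cumulative Adjustment Factor (CAF). For each corporate action (split, dividend), compute a multiplier, then take the running product of these multipliers working backward from the most recent date. Multiply each historical price by the CAF in effect for its date, so that an event only changes prices before its ex-date.
- For total return, add the dividend amount back to the price on the ex-dividend date when computing that day's return.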
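One way to build a cumulative adjustment factor is sketched below. It assumes a common back-adjustment convention: a split with ratio r scales earlier prices by 1/r, and a cash dividend D scales earlier prices by 1 - D / (previous raw close). The event encoding (a dict keyed by bar index, one event per bar) is hypothetical and kept simple for illustration.

```python
def back_adjust(closes, events):
    """Back-adjust raw closes for splits and cash dividends.

    closes: raw closes in chronological order.
    events: dict mapping the bar index of the ex-date to
            ("split", ratio) or ("dividend", cash_amount).
    Bars on or after the most recent event are left unchanged.
    """
    adjusted = list(closes)
    factor = 1.0
    for i in range(len(closes) - 1, -1, -1):  # walk backward in time
        adjusted[i] = closes[i] * factor
        event = events.get(i)
        if event is not None and i > 0:
            kind, value = event
            if kind == "split":
                factor /= value
            elif kind == "dividend":
                factor *= 1.0 - value / closes[i - 1]
    return adjusted

# 2:1 split with ex-date at index 2: the raw series shows a fake 50% drop.
print(back_adjust([100, 102, 51, 52], {2: ("split", 2)}))
# [50.0, 51.0, 51.0, 52.0]
```

After adjustment the split day shows a 0% move instead of a 50% drop. With a dividend event, the adjusted return on the ex-date equals the total return (price change plus dividend) under this convention.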
Resampling:
- Convert tick data to your chosen frequency (e.g., 1-minute) using an OHLCV (Open, High, Low, Close, Volume) aggregation.
- Ensure timestamps are aligned to a common timezone (preferably UTC or exchange local). Avoid splitting data across trading sessions (e.g., after-hours to pre-market mixing).
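The OHLCV aggregation can be sketched without any third-party library. The tick layout (epoch seconds in UTC, price, size) is an assumption for the example. In practice a dataframe library's resampling function would usually do this job.

```python
def resample_ticks(ticks, bar_seconds=60):
    """Aggregate time-sorted ticks into OHLCV bars.

    ticks: iterable of (epoch_seconds, price, size), sorted by time.
    Returns a list of dicts labelled by each bar's start time. Intervals
    with no ticks produce no bar rather than a synthetic one.
    """
    bars = []
    for ts, price, size in ticks:
        start = int(ts // bar_seconds) * bar_seconds
        if bars and bars[-1]["start"] == start:
            bar = bars[-1]
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
        else:
            bars.append({"start": start, "open": price, "high": price,
                         "low": price, "close": price, "volume": size})
    return bars

ticks = [(0, 10.0, 100), (30, 10.2, 50), (59, 9.9, 25), (60, 10.1, 10), (125, 10.3, 40)]
for bar in resample_ticks(ticks):
    print(bar)
```

Filtering ticks to a single trading session before aggregating keeps after-hours and pre-market prints out of regular-session bars.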
Validation:
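- Run cross-checks: Compare your data against a known benchmark (e.g., the S&P 500 daily close) on a sample of dates, including dates around known corporate actions.
- Check for negative bid-ask spreads and for zero or negative prices.
- Verify that the sum of volume across all intraday bars matches the reported daily volume for that asset within a small tolerance. Exact equality is not guaranteed, since feeds may treat auction and off-exchange prints differently.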
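Two of the validation checks above can be automated along these lines. The function name, the input layout, and the 1% volume tolerance are assumptions made for this sketch, not fixed standards.

```python
def validate(quotes, intraday_volumes, daily_volume, tolerance=0.01):
    """Return a list of problems found; an empty list means all checks passed.

    quotes: list of (bid, ask) pairs.
    intraday_volumes: per-bar volumes for one asset on one day.
    daily_volume: the reported daily volume for the same asset and day.
    """
    problems = []
    for i, (bid, ask) in enumerate(quotes):
        if bid <= 0 or ask <= 0:
            problems.append(f"row {i}: non-positive price")
        elif ask < bid:
            problems.append(f"row {i}: negative spread")
    total = sum(intraday_volumes)
    if daily_volume > 0 and abs(total - daily_volume) / daily_volume > tolerance:
        problems.append("intraday volume does not match daily volume")
    return problems

print(validate([(10.0, 10.1), (10.2, 10.1)], [500, 500], 1000))
# ['row 1: negative spread']
```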

VI. Frequency-Specific Modeling Techniques

Once you have clean data at a specific frequency, your backtesting simulation must adapt:

For 1-Minute Bars: Model fills using the VWAP (Volume Weighted Average Price) of that minute, not the close. Assume partial fills if volume is insufficient.
For Tick Data: Model fills using queue position. The probability of being filled at the bid/ask depends on the number of orders ahead of you in the book.
For Daily Data with Intraday Volatility: Check each stop-loss against the bar's low (or high, for short positions) rather than its close, and use a range-based volatility estimator (e.g., Garman-Klass) to size the slippage assumed when a stop is hit intraday. This prevents the backtest from assuming a stop-loss fills at the daily close when it may have been triggered at a much worse price.
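The daily-bar case can be sketched as follows for a long position. The Garman-Klass formula is the standard single-bar form; the fill rule and the slip_mult parameter (slippage as a fraction of the bar's estimated volatility) are assumptions made for illustration and would need calibration against real fills.

```python
import math

def garman_klass_variance(open_, high, low, close):
    """Single-bar Garman-Klass variance estimate from OHLC prices."""
    hl = math.log(high / low)
    co = math.log(close / open_)
    return 0.5 * hl ** 2 - (2.0 * math.log(2.0) - 1.0) * co ** 2

def stop_fill_price(open_, high, low, close, stop, slip_mult=0.1):
    """Estimated fill for a long position's stop-loss on one daily bar.

    Returns None if the stop was not touched. A gap below the stop fills
    from the open, not from the stop level. The result is never below the
    bar's low, since no trade printed there.
    """
    if open_ <= stop:
        base = open_          # gapped through the stop overnight
    elif low <= stop:
        base = stop           # traded down through the stop intraday
    else:
        return None
    sigma = math.sqrt(max(garman_klass_variance(open_, high, low, close), 0.0))
    return max(base * (1.0 - slip_mult * sigma), low)

print(stop_fill_price(100.0, 101.0, 97.0, 98.0, stop=99.0))
print(stop_fill_price(100.0, 101.0, 99.5, 100.5, stop=99.0))  # None
```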

VII. The Imperative of Data Hygiene in Machine Learning

For strategies driven by Machine Learning, data quality and frequency take on an even more critical role. Overfitting is markedly worse with high-frequency, noisy data. A deep learning model trained on five years of 1-minute bars can learn the specific noise patterns of that period. When deployed on new data, such a model often fails.

Solution: Use time-series cross-validation (e.g., Purged Walk-Forward Analysis). Never allow data from the future to leak into the training set. Always purge overlapping data points in time.
Frequency Caps: For ML models, a lower frequency (e.g., hourly or daily) with engineered features (e.g., volatility, rolling z-scores) often outperforms raw high-frequency data. The lower signal-to-noise ratio of high-frequency data makes it easy for gradient-based optimizers to fit noise.
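A purged walk-forward split can be sketched as below. This is a simplified, expanding-window variant written for illustration. The function name and parameters are assumptions, and the purge length should be at least as long as the label horizon (for example, 5 bars if the target is a 5-bar forward return).

```python
def purged_walk_forward(n_samples, n_splits, test_size, purge):
    """Build (train_indices, test_indices) pairs in chronological order.

    Each test block follows its training block, separated by `purge`
    samples that are dropped so labels overlapping the test period
    cannot leak into training.
    """
    splits = []
    for k in range(n_splits):
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        train_end = test_start - purge
        if train_end <= 0:
            continue  # not enough history for this fold
        splits.append((list(range(train_end)), list(range(test_start, test_end))))
    return splits

for train, test in purged_walk_forward(n_samples=20, n_splits=3, test_size=4, purge=2):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
# train 0..5  test 8..11
# train 0..9  test 12..15
# train 0..13  test 16..19
```

Every training index is strictly earlier than every test index, with a gap of `purge` samples between them, which is the property to check in whatever cross-validation tool is actually used.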
