1. Introduction

The barriers to entry for quantitative trading research have fallen dramatically over the past decade. Many trading platform products promise built-in backtesting capabilities, often touting their robustness credentials. And for those who are not using a trading platform, Python and open-source libraries such as Backtrader, Zipline, VectorBT, and various pandas-based frameworks have made it possible for individuals with modest programming skills to construct, test, and evaluate systematic trading strategies. This accessibility has been widely celebrated as a democratising force in financial markets.

But accessibility and rigour are not synonymous. The ease with which a backtested equity curve can be produced has led to a proliferation of strategy research that ranges from mildly optimistic to grossly unrealistic. Individual backtest errors are not the central issue; rather, the ecosystem of tools, data sources, educational materials, and community conventions systematically biases users toward overestimating strategy performance. The typical retail backtester does not know what they are not modelling, and the tools do nothing to tell them.

This paper catalogues the principal failure modes of backtesting as commonly practised, with particular emphasis on areas where the gap between simulation and reality is largest and least understood. I give special attention to the under-examined problem of bar resolution (the information that is lost when price paths are compressed into OHLC bars, particularly at the daily level), which affects even strategies operating on longer timeframes, and to the pervasive data quality issues that contaminate results at the source level. I also examine the limitations of the robustness testing methods that traders commonly rely upon to validate their results: Monte Carlo simulation, synthetic data generation, and walk-forward analysis. Finally, I consider broader ecosystem factors, including the self-reinforcing dominance of Python, the monoculture of approaches encouraged by standard frameworks, and the tendency to treat signal generation in isolation from risk management and portfolio construction.
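To make the bar resolution problem concrete before it is treated in detail, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a reconstruction of any framework's fill logic: the two price paths, the stop and target levels, and the helper names `to_ohlc` and `first_touch` are all hypothetical and chosen only for clarity. The sketch shows two intrabar paths that compress to the same OHLC bar yet produce opposite outcomes for a long position entered at the open.

```python
def to_ohlc(path):
    """Compress a sequence of prices into one (open, high, low, close) bar."""
    return (path[0], max(path), min(path), path[-1])


def first_touch(path, stop, target):
    """Return which level a long position reaches first along the path."""
    for price in path:
        if price <= stop:
            return "stop"
        if price >= target:
            return "target"
    return "neither"


# Two hypothetical intrabar paths (illustrative numbers, not market data).
path_a = [100.0, 103.0, 101.0, 97.0, 99.0]  # rallies first, then sells off
path_b = [100.0, 97.0, 98.0, 103.0, 99.0]   # sells off first, then rallies

bar_a = to_ohlc(path_a)
bar_b = to_ohlc(path_b)

# Long entered at the open, with an assumed stop at 98 and target at 102.
outcome_a = first_touch(path_a, stop=98.0, target=102.0)
outcome_b = first_touch(path_b, stop=98.0, target=102.0)

print("Bars identical:", bar_a == bar_b)
print("Path A outcome:", outcome_a)
print("Path B outcome:", outcome_b)
```

Both paths yield the bar (100, 103, 97, 99), but path A reaches the target first and path B reaches the stop first. A simulation that sees only the bar cannot tell these apart and must adopt some convention about which level was touched first.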

I should note at the outset that the criticisms presented here are not directed at Python as a programming language per se. Python is a capable general-purpose language with legitimate applications across many domains. My concern is with the ecosystem that has formed around it in the trading context: the libraries, the conventions, the educational materials, and the implicit assumptions absorbed by developers who adopt the dominant toolchain without sufficient critical examination.