9. The Isolation Fallacy: Signal Generation Without Context

Methodology·February 9, 2026·backtesting, bt-series, portfolio, risk-management

9.1 The Signal Isolation Blindspot

Novice and retail trading system developers suffer from what might be termed the signal isolation blindspot: an overwhelming focus on a single dimension of the problem, generating a profitable entry and exit signal (the alpha component), to the near-total exclusion of everything else that constitutes a complete trading system. This focus is understandable: the signal is the most intellectually engaging component, and it is the component that backtesting frameworks are designed to evaluate. However, a trading signal is only one element of a viable trading system. A complete system requires equally serious attention to transaction costs, risk management, portfolio construction, and execution. A deficiency in any one of these disciplines can render even a genuinely profitable signal unprofitable, or catastrophic, in practice.

The treatment of transaction costs is a particularly revealing symptom of this blindspot. Most retail traders regard transaction costs as an afterthought, a minor friction to be estimated and plugged into the model when the algorithm is otherwise near completion. The reality would shock most of them. The true quantum of transaction costs, encompassing not just commissions but bid-ask spreads (which vary with volatility and time of day), market impact (which scales non-linearly with order size), roll costs in futures (including calendar spread crossing costs and roll-window impact), financing costs for leveraged positions, and the adverse selection costs inherent in limit order execution, can easily consume the entirety of a strategy’s apparent edge. A strategy that shows an annualised return of 8% before costs may show 2% or less after realistic cost modelling, and may be outright unprofitable once market impact at any meaningful scale is considered.
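To make the arithmetic concrete, here is a minimal sketch of how small per-trade frictions add up to a large annual drag. The function name and every number are hypothetical, chosen only for illustration. It assumes each round trip turns over the full account and that costs are linear, so it leaves out market impact, which would make the result worse.

```python
def net_annual_return(gross_return, round_trips_per_year, cost_per_round_trip):
    """Subtract a simple linear cost drag from a gross annual return.

    All inputs are fractions (0.08 means 8%). Assumes every round trip
    trades the whole account and ignores size-dependent market impact.
    """
    annual_cost = round_trips_per_year * cost_per_round_trip
    return gross_return - annual_cost


# Hypothetical inputs: 8% gross, 120 round trips a year, and 5 basis points
# per round trip covering commission, spread, and slippage.
net = net_annual_return(0.08, 120, 0.0005)
print(f"net annual return: {net:.2%}")
```

With these made-up inputs, three quarters of the gross return goes to costs, and the figure is most sensitive to trade frequency, which is a design decision rather than a parameter to plug in at the end.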

Treating costs as an afterthought reflects a fundamental misunderstanding of the problem. Transaction costs are a structural constraint that should shape strategy design from the outset. A strategy that is not designed with its cost structure in mind may have no reason to exist.

Backtesting engines and programming knowledge do nothing to address this blindspot. A programmer-trader can be highly proficient with Python, fluent in the API of their chosen backtesting framework, and capable of generating sophisticated equity curves, while remaining entirely ignorant of the disciplines that determine whether those curves are achievable in practice. The tools make the signal generation problem easy and the remaining problems invisible.

This is the core asymmetry, and I believe it is the single largest contributor to retail trading failure.

9.2 The Contaminated Signal

The isolation blindspot has a second, subtler form. So far the argument has been that the signal is only one component of a system and that the others get ignored. But the signal itself can be contaminated by a return that the researcher has not isolated from it, so that what looks like a validated edge is actually two distinct return sources blended together, only one of which the strategy is designed to capture. Futures trend-following on back-adjusted data is the cleanest example, and it is worth spelling out because it is so easy to walk into.

As discussed in Continuous Contract Construction, additive back-adjustment imparts a deterministic drift to the level series whenever a market sits persistently in contango or backwardation, and that drift is real roll yield rather than an artefact. The drift is also smooth, low in noise, and highly autocorrelated, which is to say it has exactly the statistical signature that a trend or breakout rule is built to detect. A momentum system run over such a series will lock onto the roll-induced drift and report it as trend. The backtest’s measured trendiness (autocorrelation, Hurst exponent) and its Sharpe ratio are both inflated by a carry component the researcher never intended to trade and, more importantly, never measured. The system has not validated a trend edge; it has validated a trend-and-carry blend.

The danger is not that the profit is fake — the roll yield is real money, and a position that harvests it is genuinely paid. The danger is misattribution. The carry component persists only while the term-structure regime persists, and a curve that flips from contango to backwardation reverses the sign of the drift the trend rule was leaning on. A strategy whose backtested edge was partly carry, credited entirely to trend, can therefore degrade or invert in live trading for reasons that never appear in the historical record, because the historical record bundled the two returns into a single equity curve. The discipline that prevents this is the same one that defines the rest of this chapter: do not evaluate a signal in isolation. Decompose the backtested return into a roll-yield component and a residual price-trend component, and confirm that the part of the edge being credited to the trend rule is actually trend. A system tested across many contracts is more exposed to this, not less: the more instruments carry a persistent curve sign, the more of the aggregate Sharpe can be roll yield wearing the costume of trend.
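The decomposition described above can be sketched as follows. This is one possible approach under stated assumptions, not a definitive method: it uses the unadjusted price of the held contract as a proxy for the underlying price trend, and it treats the price gap between the new and old contract at each roll as the roll component. The function name, argument layout, and toy prices are all hypothetical.

```python
import numpy as np


def decompose_back_adjusted(front_prices, roll_gaps):
    """Split the P&L of one long contract into price trend and roll yield.

    front_prices: unadjusted price of the held contract each day. On a roll
                  day this is already the price of the new contract.
    roll_gaps:    same length; on a roll day, new contract price minus old
                  contract price at the roll, and 0.0 on all other days.

    Returns (total, price_part, roll_part) as cumulative point changes,
    where total matches the change in an additively back-adjusted series.
    """
    front = np.asarray(front_prices, dtype=float)
    gaps = np.asarray(roll_gaps, dtype=float)
    # Change in the level of the nearby contract (proxy for price trend).
    price_part = front - front[0]
    # Each roll gap is a jump the holder never earned, so it is subtracted.
    roll_part = -np.concatenate(([0.0], np.cumsum(gaps[1:])))
    total = price_part + roll_part
    return total, price_part, roll_part


# Toy contango market: each contract decays by 1 point a day, and each new
# contract sits 4 points above the expiring one at the roll.
front = [100, 99, 98, 101, 100, 99, 102, 101, 100]
gaps = [0, 0, 0, 4, 0, 0, 4, 0, 0]
total, price_part, roll_part = decompose_back_adjusted(front, gaps)
print("back-adjusted change:", total[-1])
print("price trend part:    ", price_part[-1])
print("roll yield part:     ", roll_part[-1])
```

In this toy series the back-adjusted line falls by one point every day, which a breakout rule would read as a clean downtrend, yet the nearby price ends where it started. The whole move is roll yield. Running a backtest's P&L through a split like this, per instrument, shows how much of the credited edge is actually trend.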

9.3 The Risk Management Blindspot

Risk management deserves particular attention as a backtesting blindspot because retail traders so widely misunderstand it. Most traders, if they address risk management at all, treat it as a backtesting step that begins and ends with the selection of a basket of markets to trade. Having chosen a diversified-looking set of instruments (perhaps a mix of equity indices, bonds, commodities, and currencies), the trader considers the risk management problem solved and returns to the more engaging task of signal optimisation. And, in fairness, signal optimisation is more interesting than calibrating margin requirements or modelling roll costs. That is part of the problem.

This view of risk management is grossly inadequate. Selecting a basket of markets is, at best, the first layer of a multi-layered discipline, and even that first layer is often executed poorly. Naive instrument-level diversification (holding positions across many markets) provides far less risk reduction than most traders assume because correlations between markets are not stable. Markets that appear uncorrelated during benign conditions frequently become highly correlated during periods of stress, just when diversification is most needed. The trader who has “diversified” across equity indices and industrial commodities may discover during a risk-off event that all of their positions are, in effect, a single correlated bet on global growth.
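One way to check this for a given pair of markets is to measure correlation separately on calm and stressed days. The sketch below assumes the caller supplies a boolean `stress_mask` from an external indicator, for example days on which a broad index fell sharply. The function name, the mask, and the toy returns are hypothetical, and a real test would need far more than eight observations.

```python
import numpy as np


def calm_vs_stress_correlation(returns_a, returns_b, stress_mask):
    """Pearson correlation of two return series on calm and on stress days.

    stress_mask is a boolean array marking stress days. It should come from
    an indicator other than the two series themselves.
    """
    a = np.asarray(returns_a, dtype=float)
    b = np.asarray(returns_b, dtype=float)
    stress = np.asarray(stress_mask, dtype=bool)
    calm_corr = np.corrcoef(a[~stress], b[~stress])[0, 1]
    stress_corr = np.corrcoef(a[stress], b[stress])[0, 1]
    return float(calm_corr), float(stress_corr)


# Toy daily returns in percent: unrelated on the first four days,
# falling together on the last four.
equity_index = [1, -1, 1, -1, -3, -4, -5, -2]
commodity = [-1, -1, 1, 1, -3, -5, -4, -2]
mask = [False] * 4 + [True] * 4
calm, stress = calm_vs_stress_correlation(equity_index, commodity, mask)
print(f"calm correlation: {calm:.2f}, stress correlation: {stress:.2f}")
```

A full-sample correlation would average the two regimes together and understate the co-movement on exactly the days when it matters.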

Beyond this first layer, genuine risk management comprises several further layers:

Position sizing: Calibrated to realistic, range-based volatility estimates rather than close-to-close volatility that understates true intraday risk (as discussed in Volatility and Risk Estimation).
Exposure limits: Hard constraints applied at the instrument, sector, and portfolio levels.
Correlation monitoring: Accounting for regime-dependent co-movement, recognising that markets that appear to be uncorrelated frequently become highly correlated during stress events.
Drawdown controls: Predefined rules for position reduction during adverse equity excursions.
Tail risk analysis: Statistical stress-testing of the strategy under adverse conditions that have not yet occurred but plausibly could.
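The first two items in the list above can be sketched together: a range-based volatility estimate feeding a position size, with a hard cap applied afterwards. The sketch uses the Parkinson high-low estimator as one example of a range-based measure; the sizing rule, function names, and all numbers are hypothetical assumptions rather than a recommended configuration.

```python
import math


def parkinson_daily_vol(highs, lows):
    """Parkinson high-low estimate of daily volatility, as a fraction."""
    n = len(highs)
    sum_sq = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(sum_sq / (4.0 * math.log(2.0) * n))


def contracts_for_risk_budget(equity, daily_risk_fraction, price,
                              point_value, daily_vol, max_contracts):
    """Whole contracts sized so that a one-volatility daily move costs
    roughly the risk budget, then capped at a hard exposure limit."""
    if daily_vol <= 0:
        raise ValueError("daily_vol must be positive")
    risk_per_contract = daily_vol * price * point_value
    raw = (equity * daily_risk_fraction) / risk_per_contract
    return min(math.floor(raw), max_contracts)


# Hypothetical five-day window of highs and lows for one futures market.
highs = [101.5, 102.0, 101.0, 103.0, 102.5]
lows = [99.5, 100.5, 99.0, 100.5, 100.0]
vol = parkinson_daily_vol(highs, lows)
size = contracts_for_risk_budget(
    equity=250_000, daily_risk_fraction=0.005, price=101.0,
    point_value=50.0, daily_vol=vol, max_contracts=10,
)
print(f"daily vol estimate: {vol:.4f}, contracts: {size}")
```

The cap is applied after the volatility calculation on purpose: a quiet window produces a low volatility estimate and therefore a large raw size, which is precisely when a hard limit earns its keep.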

Each of these disciplines requires a working understanding of probability distributions, the non-stationarity of financial return series, and the limitations of historical data as a guide to future risk. The trader who stops at market selection is exposed to all manner of risks they have not identified: concentration risk masquerading as diversification, leverage risk arising from volatility underestimation, liquidity risk during stress events, and correlation risk from regime changes, among others.

Portfolio construction, the discipline of combining multiple strategies or positions to achieve a desired risk-return profile, requires an understanding of diversification that goes beyond naive instrument-level diversification to consider strategy-level correlation, regime-dependent co-movement, and the interaction between position sizing and portfolio volatility. This discipline is almost entirely absent from standard backtesting workflows and educational materials. Most traders have never heard of it.
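The interaction between position sizing and portfolio volatility follows from the standard formula for portfolio variance, the weights applied to both sides of the covariance matrix. The sketch below applies it to four equally weighted positions under two correlation assumptions. The helper names and the inputs (20% volatility each, pairwise correlation of 0 or 0.8) are hypothetical.

```python
import numpy as np


def equicorrelation(n, rho):
    """n x n correlation matrix with every off-diagonal entry equal to rho."""
    m = np.full((n, n), float(rho))
    np.fill_diagonal(m, 1.0)
    return m


def portfolio_volatility(weights, vols, correlation):
    """Portfolio volatility as the square root of w' * Cov * w."""
    w = np.asarray(weights, dtype=float)
    s = np.asarray(vols, dtype=float)
    cov = np.outer(s, s) * np.asarray(correlation, dtype=float)
    return float(np.sqrt(w @ cov @ w))


weights = [0.25] * 4
vols = [0.20] * 4
calm = portfolio_volatility(weights, vols, equicorrelation(4, 0.0))
stress = portfolio_volatility(weights, vols, equicorrelation(4, 0.8))
print(f"calm: {calm:.1%}, stress: {stress:.1%}")
```

Nothing about the individual positions changes between the two cases, yet portfolio volatility rises from about 10% to about 18%. Sizing each position in isolation therefore says little about the risk of the combined book.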

The standard Python backtesting workflow does little to develop any of these competencies. A typical tutorial progresses from data acquisition to indicator calculation to signal generation to equity curve plotting, with risk management reduced to a fixed fractional position size and portfolio construction addressed not at all. The trader who follows this path emerges with a tool that can generate backtested equity curves but without the framework to evaluate whether those curves represent a tradable system or a statistical artefact.

A separate but related failure stems from the availability of ready-built strategies and strategy pipelines within popular algorithmic trading platforms. These offerings are attractive to inexperienced traders because they appear to eliminate the need for deep domain knowledge: the strategy is already backtested and already showing an impressive equity curve. The novice is therefore tempted to overlook the fundamental questions: why the strategy might work, what market inefficiency or structural feature it exploits, under what conditions that feature is likely to persist, and what would cause it to disappear. Without answers to these questions, the trader has no basis for distinguishing a genuine edge from a statistical artefact, and no framework for deciding when to continue trading through a drawdown versus when to acknowledge that the strategy has ceased to function. The more experienced developer, by contrast, is far more likely to insist on understanding the causal mechanism behind a strategy before committing capital, recognising that a backtested equity curve without a plausible explanation is not evidence of an edge but merely evidence that a pattern existed in historical data.

9.4 The Drawdown Misconception

One of the most consequential forms of self-deception among novice system developers is the overestimation of one’s ability to tolerate drawdowns. A backtested equity curve that shows a 30% peak-to-trough drawdown followed by a recovery to new highs appears manageable in retrospect: the viewer knows how the story ends. Living through that same drawdown in real time, with real capital, with no certainty of recovery, is an entirely different psychological experience.
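For reference, the peak-to-trough figure quoted from a backtest is usually computed along the following lines. This is a minimal sketch with a hypothetical function name and a toy equity curve. It also reports the longest stretch spent below a prior high, which the depth figure alone hides.

```python
def drawdown_stats(equity):
    """Return (max_depth, longest_underwater) for an equity curve.

    max_depth is the largest peak-to-trough fall as a fraction of the peak.
    longest_underwater is the longest run of periods below a prior high.
    """
    peak = equity[0]
    max_depth = 0.0
    underwater = 0
    longest_underwater = 0
    for value in equity[1:]:
        if value >= peak:
            # New equity high: the drawdown, if any, is over.
            peak = value
            underwater = 0
        else:
            underwater += 1
            max_depth = max(max_depth, (peak - value) / peak)
            longest_underwater = max(longest_underwater, underwater)
    return max_depth, longest_underwater


# Toy curve: a fall from 110 to 77, then a recovery to new highs.
curve = [100, 110, 99, 77, 88, 105, 112]
depth, periods = drawdown_stats(curve)
print(f"max drawdown: {depth:.0%}, longest time underwater: {periods} periods")
```

The function reduces the whole episode to two numbers computed with the ending already known, which is the same hindsight the chart viewer enjoys.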

Research in behavioural finance consistently demonstrates that losses are experienced approximately twice as intensely as equivalent gains (Kahneman and Tversky, 1979). A drawdown that appears tolerable on a historical chart is, in practice, far more difficult to endure than the chart suggests. The temptation to abandon a strategy mid-drawdown (the worst possible time to do so, if the strategy retains its edge) is overwhelming for most traders, and the probability of strategy abandonment increases non-linearly with drawdown depth and duration.¹

Moreover, backtested drawdowns understate the drawdowns that will be experienced in live trading. Every bias I have discussed pushes in the same direction. Optimistic fill assumptions, understated transaction costs, concealed intraday stops, overfitted parameters: all contribute to equity curves that are smoother and shallower than reality. The trader who sizes their account to tolerate the backtested maximum drawdown is, in effect, sizing for a best-case scenario. When live drawdowns inevitably exceed backtested drawdowns, the psychological and financial pressure to abandon the strategy can become irresistible.

The situation is worse than mere understatement. Bailey, Borwein, Lopez de Prado, and Zhu (2014) proved that when the return series exhibits serial dependence (as it does for most trend-following and mean-reversion strategies), there is a negative linear relationship between in-sample and out-of-sample Sharpe ratios: the more aggressively a researcher optimises in-sample, the worse the expected out-of-sample performance becomes. Not worse than the backtest. Worse than not optimising at all. Their result implies that the standard disclaimer “past performance is not an indicator of future results” is, in this context, too optimistic. When backtest overfitting is not controlled for, good backtested performance is an indicator of negative future results. For retail traders developing mean-reversion or trend-following strategies on serially correlated return series, which is to say nearly all of them, this is a direct prediction about their money.
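A small simulation shows the weaker, selection-bias half of this argument. It does not reproduce the Bailey et al. result, which depends on serial dependence: the returns here are independent noise with zero edge, so the expected out-of-sample Sharpe of the winner is zero rather than negative. The function name, parameters, and seed are hypothetical choices for illustration.

```python
import numpy as np


def best_of_n_sharpe(n_strategies, n_days=252, seed=0):
    """Pick the best in-sample Sharpe among zero-edge strategies and
    report that strategy's in-sample and out-of-sample Sharpe ratios."""
    rng = np.random.default_rng(seed)
    # Every "strategy" is pure noise: mean zero, 1% daily volatility.
    in_sample = rng.normal(0.0, 0.01, size=(n_strategies, n_days))
    out_sample = rng.normal(0.0, 0.01, size=(n_strategies, n_days))

    def annualised_sharpe(returns):
        return np.sqrt(252) * returns.mean(axis=1) / returns.std(axis=1, ddof=1)

    is_sharpe = annualised_sharpe(in_sample)
    oos_sharpe = annualised_sharpe(out_sample)
    winner = int(np.argmax(is_sharpe))
    return float(is_sharpe[winner]), float(oos_sharpe[winner])


for n in (1, 10, 200):
    is_best, oos = best_of_n_sharpe(n)
    print(f"{n:>3} variants tried: in-sample {is_best:+.2f}, out-of-sample {oos:+.2f}")
```

The best in-sample figure climbs as more variants are tried even though none has any edge, which is what a parameter sweep does to a backtest. The out-of-sample figure for the winner has no such upward pull.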

Learning to code a backtest in Python and plugging in a few libraries is a necessary starting point, but it leaves a prospective systematic trader far short of what is required. I estimate that technical signal generation is perhaps 20% of the problem; risk management, portfolio construction, execution infrastructure, statistical literacy, and psychological preparedness constitute the remaining 80%. The backtesting ecosystem, by making the 20% easy and the 80% invisible, actively contributes to the failure rate among aspiring systematic traders.

If signal generation is indeed only 20% of the problem, it follows that the programming effort most retail traders pour into backtesting may itself be misallocated. Building a backtesting engine that faithfully models execution, transaction costs, portfolio-level risk, and the full lifecycle of order management is an enormously demanding engineering task. It requires not only a high degree of programming skill but also deep domain knowledge of market microstructure, broker execution mechanics, and the statistical properties of financial return series, the very knowledge that most aspiring algorithmic traders have not yet developed. The sheer scope of a proper backtesting engine puts it well beyond the reach of many programmers, let alone those who are simultaneously learning both programming and trading. The result is that the novice trader who sets out to build their own backtesting infrastructure is likely to produce a system riddled with the very deficiencies catalogued throughout this discussion (unrealistic fill assumptions, understated costs, inadequate risk modelling) while consuming months or years of effort that could have been directed elsewhere.

For the yet-to-be-profitable trader, then, the question is not “how do I build a better backtest?” but “is building a backtest the highest-value use of my time at this stage of my development?”

For many, it is not.

The disciplines that constitute the other 80% of the problem (understanding risk management at a level that goes beyond selecting a basket of instruments, developing the statistical literacy to evaluate whether a result is genuine or an artefact, learning enough about execution to understand what a backtest cannot tell you, and cultivating the psychological resilience to trade through inevitable drawdowns) are arguably more important and more scarce than the ability to code a backtesting loop. The trader who has a deep understanding of risk, a realistic model of costs, and a genuine theoretical basis for their strategy but who uses a simple, even crude, simulation tool is likely to outperform the trader who has built an elaborate backtesting engine but lacks these foundational competencies. The tooling matters, but the judgment that governs its use matters more.

¹ A useful exercise: ask any trader who claims they can tolerate a 30% drawdown whether they have ever actually experienced one with real money. In my experience, very few have, and those who have are notably more conservative in their claims the second time around.