3. Execution Simulation Fidelity
3.1 The Fill Assumption Problem
The most fundamental question in any backtest is deceptively simple: would this trade have been filled, and at what price? The vast majority of retail backtesting engines answer this question with naive assumptions that bear little resemblance to real market microstructure.
The most common assumption is that a trade is filled at the closing price of the bar on which the signal is generated, or at the opening price of the subsequent bar. In either case, the implicit model is one of infinite liquidity at a single price point with zero market impact. This is problematic for several reasons. First, the closing price of a daily bar is a single print that may reflect a momentary condition rather than an achievable execution level. Second, the opening price of the next bar (particularly in futures markets where overnight gaps are common) may be materially different from any price at which a resting order could reasonably have been filled. Third, for any strategy trading meaningful size, the act of execution itself moves the market, creating slippage that increases non-linearly with order size relative to available liquidity.
These execution problems are worse again in cryptocurrency markets. Crypto exchanges permit leverage of 50x to 125x, and forced liquidations are visible via public APIs, creating a reflexive dynamic in which liquidations trigger further liquidations. During these cascades, order book depth can collapse entirely, producing price dislocations that no smooth slippage model will capture. A crypto backtest using the same volatility-scaled slippage assumptions that work tolerably well for regulated futures is likely to be dangerously optimistic about fills during the episodes that matter most.
For limit orders, the situation is more problematic still. A backtest that assumes a limit order is filled whenever price touches the limit level is ignoring queue position entirely. In reality, a limit order resting at a popular price level may never be filled even as the market trades through that level, because orders ahead in the queue absorb the available liquidity. This distinction between “price touched” and “order filled” is one of the largest sources of phantom profitability in backtested strategies, particularly for mean-reversion systems that rely on passive entry.
A related and often overlooked phenomenon is adverse selection. Limit orders that do get filled are disproportionately filled in situations where the market is moving aggressively against the order, precisely because it is the aggressive flow that consumes resting liquidity. A limit buy order, for example, is most likely to be filled when selling pressure is intense enough to sweep through the order book to that price level. The fill itself is therefore actually a negative signal about future price direction. Getting filled is the bad news. Backtests that model limit order fills as occurring at the limit price, without accounting for the conditional probability that a fill implies adverse momentum, systematically overstate the profitability of passive entry strategies. Empirical studies of limit order book dynamics (Cont, Stoikov, and Talreja, 2010) confirm that the expected post-fill price movement conditional on execution is, on average, unfavourable to the limit order placer.
Stop orders are mishandled in a closely related way. When a backtest detects that a stop level was breached inside a bar, it almost always books the fill at the stop price itself. In live trading the stop becomes a market order at the moment of the breach, and it fills at the next available traded price, which in any fast or thin market is some distance through the stop level rather than at it. Stops gap through; backtests do not. Combined with the OHLC sequence ambiguity discussed in Section 4.2, the result is that the simulator is simultaneously over-counting target hits and under-pricing stop hits, and the two errors compound in the same direction.
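The difference between booking a stop at its level and at the first tradable price can be made concrete with a small sketch. The function name `sell_stop_fill_price`, the bar fields, and the flat `slippage` parameter are illustrative assumptions rather than any platform's API, and the sketch covers only a protective sell stop on a long position.

```python
from typing import Optional


def sell_stop_fill_price(stop: float, bar_open: float, bar_low: float,
                         slippage: float = 0.0) -> Optional[float]:
    """Pessimistic fill price for a sell stop within one OHLC bar.

    Returns None if the stop was not triggered on this bar.
    """
    if bar_low > stop:
        return None  # price never traded down to the stop level
    # If the bar opened at or below the stop, the market gapped through it:
    # the stop becomes a market order and fills near the open, not the stop.
    trigger_price = min(stop, bar_open)
    # Slippage is subtracted because a lower sell price is a worse fill.
    return trigger_price - slippage
```

With a stop at 100 and a bar that opens at 97, the sketch books the exit at 97 less slippage rather than at 100, which is the gap the naive model ignores.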
These naive fill assumptions (fills at the close, fills at the open, fills at the limit price without queue consideration, stops at the stop price) should be avoided at all costs. And they can be, without heroic effort. It is useful to distinguish two tiers of execution realism.
The first tier is achievable with modest engineering effort and represents an enormous improvement over platform defaults: requiring price to trade through a limit level before assuming a fill, applying empirical spread data rather than fixed estimates, incorporating volatility-dependent slippage, modelling delayed fills, and treating partial fills pessimistically. These adjustments are well understood and address the most egregious sources of phantom profitability in retail backtests.
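Two of the first-tier adjustments are simple enough to sketch. The example below is a minimal illustration under stated assumptions: the names `buy_limit_fill` and `market_order_slippage`, the one-tick trade-through requirement, and the range coefficient `k` are hypothetical choices, not calibrated values or a platform API.

```python
from typing import Optional


def buy_limit_fill(limit: float, bar_low: float, tick: float,
                   ticks_through: int = 1) -> Optional[float]:
    """Fill a resting buy limit only if price traded through the level."""
    # A touch (bar_low == limit) is not enough: orders ahead in the queue
    # may have absorbed all the volume that printed at the limit price.
    # Note: for ticks such as 0.01, comparing in integer ticks avoids
    # floating-point rounding at the boundary.
    if bar_low <= limit - ticks_through * tick:
        return limit
    return None


def market_order_slippage(half_spread: float, bar_range: float,
                          k: float = 0.1) -> float:
    """Per-unit slippage: half the observed spread plus a share of bar range."""
    # bar_range (high minus low) stands in for volatility; k is an assumed
    # coefficient that would need to be estimated from real fills.
    return half_spread + k * bar_range
```

The trade-through rule will reject some fills that would have happened in practice. That is deliberate: for passive-entry strategies, erring toward missed fills is the conservative direction.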
The second tier (queue position modelling from market-by-order or Level 2 depth data, calibrated market impact curves (Almgren & Chriss, 1999), latency simulation, adverse selection modelling conditional on fill, and auction-print execution dynamics) is a genuine research problem. Each component requires its own data and calibration, and each introduces assumptions that are themselves subject to uncertainty. Production desks invest heavily in this infrastructure; for the retail trader, the second tier is aspirational but not essential. What is essential is the first tier, and the fact that most retail backtesting engines continue to ship without even these basic corrections is a failure of the tooling, not a reflection of any inherent difficulty in the problem.
3.2 Your Platform May Be Working Against You
In some cases, the unrealistic fill model is not a default that the researcher has failed to override; rather, it is a constraint imposed by the platform itself. Some platforms do offer settings that improve the fill model, but it is not always obvious how to enable them, or that they matter enough to be checked before the backtest is run.
Many platforms do not provide a mechanism for placing exit orders in response to an entry fill event. A protective stop or profit target can only be placed after the strategy detects the filled position on the next bar, introducing a mandatory one-bar delay between entry and the activation of any risk management order. On daily bars, this means an entire trading day of unprotected exposure after every entry.1 In live trading, the trader may have had a stop order linked to the initial entry order, and this is what should be modelled. Rather, we tend to see the common problem of the “tail wagging the dog”, whereby instead of traders backtesting strategies that fit with their real-life usage, they model strategies according to the limitations of their chosen platform.
A workaround exists: the trader can place a pre-emptive stop order speculatively on the same bar as the entry, before knowing whether the entry will fill. But this workaround is itself problematic. Even with this approach, it is not possible to place a stop on the same bar as the entry with knowledge of the actual fill price. The stop level cannot be dynamically computed from the fill (for example, a fixed dollar amount or volatility multiple below the actual execution price, adjusted for any slippage that occurred) because the fill has not yet happened when the stop must be specified. The trader must either use a pre-determined price level that may not reflect the actual entry or accept the delay.
This is more than a theoretical nuisance. In live trading, the trader would almost certainly place a protective stop immediately upon receiving a fill confirmation, with the stop level computed from the actual execution price. The platform’s inability to simulate this behaviour means the backtest cannot model the way the trader would actually operate in practice. The resulting simulation diverges from live trading not because the researcher made a poor modelling choice, but because the platform makes the correct model architecturally impossible. The backtest shows positions surviving periods of unprotected exposure that the live trader would never have tolerated, producing exactly the kind of systematic positive bias described throughout this paper.
3.3 Transaction Cost Modelling
Commission costs are the most visible component of transaction expenses and, accordingly, are the component most frequently included in backtests. However, commissions are often the smallest portion of true execution costs. The bid-ask spread represents a cost incurred on every round-trip trade, and its magnitude varies considerably with market conditions, time of day, and instrument liquidity. A backtest that uses a fixed spread assumption averaged over the entire sample period systematically underestimates costs during periods of stress, the very periods when many strategies are most active.
Market impact (the price movement caused by the act of trading itself) is almost universally ignored in retail backtests. For small accounts, this may be defensible. For anything larger, it is not. Any strategy intended to scale, or trading instruments with limited depth of book, needs an impact model; without one, capacity estimates are meaningless.
Several additional categories of transaction friction are routinely omitted from retail backtests yet can dominate execution costs for common strategy types. For short equity strategies, borrow availability and borrow fees are material: shares that appear freely shortable in historical data may have been hard-to-borrow or unavailable during the periods the strategy would have needed them, and borrow rates can spike from basis points to annualised double digits during short squeezes or high-demand periods. Forced buy-ins, where the lender recalls shares, can close positions at the worst possible time. For strategies trading adjusted-price equity series, the handling of dividends and corporate actions (splits, mergers, spin-offs) is a common source of error; the distinction between price-return and total-return series can significantly alter apparent performance, particularly for dividend-rich universes over long sample periods. For high-turnover strategies using limit orders on maker-taker exchanges, the net effect of exchange rebates and fees on profitability can be substantial: a rebate of a few tenths of a cent per share, compounded across thousands of round-trip trades per year, may represent the difference between a profitable strategy and an unprofitable one. Finally, for strategies that employ leveraged products, including contracts for difference (CFDs), perpetual swaps, and margin loans, the financing cost of maintaining positions is a continuous drag that must be modelled explicitly, particularly in regimes of elevated interest rates where overnight funding charges can erode returns that appear attractive on an unfinanced basis.
Crypto derivatives introduce a financing cost with no close analogue in traditional futures. The dominant crypto instrument is the perpetual swap, which has no expiry date and instead uses a periodic funding rate to tether its price to spot. This funding rate is endogenous to market positioning: it can swing violently when leverage is crowded, and during short squeezes it frequently spikes to annualised rates of several hundred percent. A crypto backtest that ignores funding, or treats it as a flat cost, can misstate strategy returns by margins that dwarf the strategy’s apparent edge. Funding in crypto is a variable tax on positioning whose magnitude depends on what everyone else is doing.
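Treating funding as a time series rather than a flat cost can be sketched in a few lines. The function name `funding_pnl` and the input layout are assumptions for illustration. The sign convention (a positive rate means longs pay shorts) is also an assumption and should be checked against the venue's own documentation.

```python
from typing import Sequence


def funding_pnl(position_notional: Sequence[float],
                funding_rate: Sequence[float]) -> float:
    """Cumulative funding P&L for a perpetual swap position.

    position_notional[i] is the signed notional (positive = long) held at
    funding timestamp i; funding_rate[i] is the rate for that interval.
    Assumed convention: a positive rate means longs pay shorts.
    """
    if len(position_notional) != len(funding_rate):
        raise ValueError("series must be the same length")
    # A long position (n > 0) with a positive rate pays funding, so the
    # contribution to P&L is negative.
    return -sum(n * r for n, r in zip(position_notional, funding_rate))
```

Because the rate is applied interval by interval, a position held through a funding spike is charged the spike, which a flat average rate would smooth away.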
A related and widely ignored friction is the behaviour of margin requirements themselves during periods of extreme volatility. Exchanges do not maintain fixed margin rates; they raise them, sometimes sharply and with little warning, when market conditions deteriorate. During the onset of the COVID-19 pandemic in March 2020, for example, CME Group increased initial margin requirements on several major futures contracts by 30% to 50% or more within the space of days. Other exchanges and clearinghouses took similar action. For the trader who was fully deployed at pre-crisis margin levels, these increases had immediate and painful consequences: existing positions that had been within margin limits were suddenly in deficit, requiring either the deposit of additional capital or the forced liquidation of positions at the worst possible time. A backtest run over the same period will show none of this. Standard backtesting engines treat margin requirements as fixed throughout the simulation, if they model them at all. The strategy that the backtest shows holding calmly through the March 2020 sell-off may, in practice, have been forcibly liquidated by a margin call triggered not by a trading loss but by the exchange’s decision to raise collateral requirements in the middle of a crisis. The backtest records a position that survived and eventually recovered; the live trader’s position was closed at the lows.
This is not a one-off occurrence. Exchanges routinely raise margin during volatility spikes, geopolitical shocks, and periods of unusual market stress. The pattern is predictable in its general form (margins go up when volatility goes up) even if the specific timing and magnitude are not. Any backtest that holds margin requirements constant is implicitly assuming that the trader had unlimited additional capital to meet margin calls during precisely the periods when capital is hardest to raise and most painful to deploy. For leveraged strategies in particular, the gap between backtested and live performance during crisis periods often has more to do with margin mechanics than with the strategy’s signals.
A small discipline that catches a surprising fraction of bad strategies is to report gross and net returns side by side at every reporting stage of the development pipeline. The gap between the two is a direct measure of how much of the apparent edge is being eaten by friction. If the gap is small relative to the gross return (the strategy survives realistic costs comfortably), the strategy is robust to its execution assumptions. If the gap is large, the backtest is one realistic cost assumption away from being unprofitable, and the rest of the analysis should be conducted with that in mind. Many retail backtests report only gross returns, or only nominally-net returns based on a commission assumption alone. The full delta from gross to fully-modelled net is usually larger than the trader expects, and the larger it is, the closer the strategy sits to the edge of viability.
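One way to reduce the gross-to-net comparison to a single number is the share of gross return consumed by modelled costs. The helper below is a minimal sketch; the name `cost_drag` is hypothetical, and the text itself sets no acceptable threshold.

```python
def cost_drag(gross_return: float, net_return: float) -> float:
    """Fraction of gross return consumed by modelled costs."""
    if gross_return <= 0:
        raise ValueError("cost drag is only meaningful for positive gross return")
    return (gross_return - net_return) / gross_return
```

A value near zero means costs barely dent the edge; a value approaching one means almost all of the gross return is friction.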
3.4 Futures Rolling Costs
In futures markets, the costs of rolling positions at contract expiry deserve particular attention. They are frequently overlooked or underestimated in backtests. The roll involves closing a position in the expiring contract and simultaneously opening an equivalent position in the next contract month. This operation incurs several distinct costs.
First, there are the direct execution costs of two trades: commissions and exchange fees on both legs, plus the bid-ask spread. For strategies that maintain continuous exposure across multiple instruments, these costs are incurred regularly (monthly, quarterly, or at whatever interval the contract cycle dictates) and accumulate over the life of a backtest. Second, the spread between the expiring and next contract (the calendar spread) may be wider than typical bid-ask spreads in either contract individually, particularly in markets with pronounced contango or backwardation structures. The cost of crossing this spread is a real friction that does not appear in a continuous price series.
Third, and more subtly, the roll introduces basis risk. The price relationship between the expiring and next contract is not fixed; it moves during the roll window as supply and demand for each contract shift. A strategy that assumes rolling occurs at a single price ratio (as most continuous contract construction methods implicitly do) is ignoring the execution uncertainty inherent in the roll. In less liquid markets or during periods of stress, the realised roll cost may differ sharply from the theoretical cost implied by the continuous price adjustment.
Fourth, concentrated roll activity around standard dates (such as the days surrounding first notice day or contract expiry) can itself move the calendar spread, creating adverse market impact specifically around the roll event. For strategies with positions across many futures contracts, the aggregate drag from rolling costs over a multi-year backtest can be substantial, yet most retail backtesting frameworks treat the continuous price series as if it were a single instrument with no roll friction whatsoever.
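The direct execution component of roll cost (the first of the four items above) lends itself to a back-of-envelope estimate. The sketch below assumes each leg pays commission plus half the quoted spread; the name `annual_roll_cost` and all parameters are illustrative. Calendar-spread width, basis risk, and roll-window impact are deliberately left out, so the result is a floor rather than a full estimate.

```python
def annual_roll_cost(rolls_per_year: int, contracts: int,
                     commission_per_side: float, spread_ticks: float,
                     tick_value: float) -> float:
    """Approximate yearly direct cost of rolling, in contract currency.

    Each roll is two trades (close the expiring contract, open the next);
    each trade is assumed to pay commission plus half the quoted spread.
    """
    per_trade = commission_per_side + 0.5 * spread_ticks * tick_value
    return rolls_per_year * contracts * 2 * per_trade
```

For hypothetical inputs of four rolls a year, ten contracts, 2.50 commission per side, a one-tick spread, and a tick value of 12.50, the estimate is 700 per year before any of the subtler costs are counted.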
3.5 Exchange Rate Risk
For any trader whose account is denominated in a currency different from the currency in which an instrument is quoted, exchange rate movements represent a constant and often entirely unmodelled source of profit-and-loss variation. Consider an Australian trader holding funds in USD to trade US futures, a sterling-based trader accessing yen-denominated contracts, or a euro-based portfolio with exposure to any non-European market. By way of example, over the last 25 years the AUD/USD exchange rate has fluctuated between 0.50 and 1.10, a range wide enough to be a significant source of variation in the home-currency returns of any strategy traded across it.
The most direct effect is on trade-level profit and loss. A futures trade that generates a profit of one thousand U.S. dollars has a different value to the Australian-dollar trader depending on the AUD/USD exchange rate at the time the profit is realised. If the Australian dollar has strengthened against the U.S. dollar between entry and exit, the profit converted back to the trader’s home currency is smaller than the nominal gain in dollar terms. If the Australian dollar has weakened, the converted profit is larger. Over a long backtest spanning periods of significant exchange rate movement, the cumulative effect of ignoring this conversion can alter both the magnitude and the trajectory of the equity curve. A strategy that appears to deliver steady returns in the instrument’s native currency may, when properly converted to the trader’s home currency, exhibit markedly different return and drawdown characteristics. The error is directional. Exchange rate trends can persist for years, meaning that the currency effect can systematically inflate or deflate apparent performance across extended portions of the backtest.
The second manifestation involves margin requirements and the cash held to support them. Futures positions require margin deposits, and for instruments traded on foreign exchanges or denominated in foreign currencies, these deposits are typically held in the instrument’s native currency. The value of this margin collateral, expressed in the trader’s home currency, fluctuates with the exchange rate. A backtest that tracks available capital and margin utilisation in purely nominal terms (ignoring the currency exposure inherent in foreign-denominated margin balances) will misstate the trader’s true buying power and risk of margin shortfall. During periods of adverse exchange rate movement, the effective margin buffer can shrink even if no trading losses have occurred, and this erosion is invisible to a backtest that does not model the currency overlay.
Third, and perhaps most subtly, cash balances held in a foreign currency earn interest at that currency’s prevailing rate, not the trader’s domestic rate. For strategies that maintain substantial uninvested cash, either as margin collateral or as a risk management buffer, the interest differential between the domestic and foreign currency can meaningfully affect long-term returns. In periods where there is a significant interest rate differential between currencies, the carry effect on idle cash compounds over time and can represent a material drag or tailwind that a currency-naive backtest will not capture. The effect is particularly pronounced for strategies that allocate only a small fraction of capital to active positions and hold the remainder as cash, which describes a large proportion of volatility-targeted and risk-parity approaches.
The aggregate effect of these three mechanisms (trade P&L conversion, margin collateral revaluation, and interest rate differentials on foreign cash) is that any backtest of a cross-currency strategy that does not incorporate an exchange rate model is reporting results in a currency that the trader does not actually hold. Every number is wrong: the equity curve, the drawdown statistics, the Sharpe ratio, all of it. For strategies that trade exclusively in the trader’s home currency, this issue does not arise. But for the many retail traders who access global futures markets from a non-US-dollar-denominated account (and this includes a large proportion of traders based in Australia, Europe, the United Kingdom, and Asia) the omission of exchange rate effects is a structural error that pervades every calculation the backtest produces, not a minor refinement to be addressed later. Matters are worse still because some widely used platforms provide no mechanism for currency conversion at all: all profits and losses are reported in the instrument’s native currency, with no facility for translating results into the trader’s actual account currency. On such platforms, the exchange rate failures described above are not optional omissions by the researcher. They are structural impossibilities imposed by the tool.2
A subtler error is to model the currency overlay as a single end-of-sample conversion: take the final P&L in the instrument’s native currency, multiply by the exchange rate on the day the backtest ends, and report the result. This is the cleanest implementation choice, and the implicit one in many tutorials. It is also wrong. Every trade settles in the instrument’s currency at the time it is taken, and the value of that cash flow in the trader’s home currency depends on the rate prevailing at that moment. Every open position carries an embedded FX exposure that varies with the home-currency mark-to-market of the instrument. Every cash balance accrues interest at its native currency’s rate. Collapsing all of this into a single rate at the end of the sample destroys the path dependency of the FX overlay, and the resulting equity curve can differ from the dynamic version by more than the strategy’s apparent alpha over multi-year periods of significant currency drift.
The mechanical fix is simple in principle. Run an FX series alongside the price series, convert each trade-level P&L, margin balance, and interest accrual at the prevailing rate, and report performance in the trader’s actual currency throughout. The fix is rarely implemented in practice because the average retail platform makes it inconvenient, and because the trader has often never been prompted to think about it.
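The contrast between per-period conversion and the end-of-sample shortcut can be shown with a toy example. This is a sketch covering only the trade P&L leg of the fix; margin revaluation and interest accrual are omitted. The function names and the assumption that each period's P&L is converted to the home currency at that period's rate are illustrative.

```python
from typing import Sequence


def home_currency_pnl(pnl_native: Sequence[float],
                      fx_home_per_native: Sequence[float]) -> float:
    """Sum of per-period P&L, each converted at that period's rate."""
    if len(pnl_native) != len(fx_home_per_native):
        raise ValueError("series must be the same length")
    return sum(p * fx for p, fx in zip(pnl_native, fx_home_per_native))


def end_of_sample_pnl(pnl_native: Sequence[float], final_fx: float) -> float:
    """The shortcut: one conversion of total P&L at the final rate."""
    return sum(pnl_native) * final_fx
```

With hypothetical native-currency P&L of 1000, -500, and 1000 and rates of 1.25, 1.5, and 2.0 home units per native unit, per-period conversion gives 2500 while the end-of-sample shortcut gives 3000. The underlying trades are identical; only the conversion method differs.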
3.6 Counterparty Risk
A backtest treats the broker, the exchange, and the clearinghouse as silent infrastructure. They settle in the background, they remain solvent, and they keep their part of the bargain. Live trading does not always cooperate.
In regulated futures and equities, client assets are typically segregated from broker working capital, and clearinghouse failure is a remote contingency. Remote, but not impossible. MF Global collapsed in October 2011 with a roughly $1.6 billion shortfall in segregated customer accounts. Refco failed in 2005, two months after its IPO, and customer assets that should have been ringfenced turned out to be entangled with the parent’s collapse. The Lehman Brothers bankruptcy in September 2008 left the firm’s prime brokerage clients (including a large number of hedge funds) unable to access positions and balances for an extended period. These events are rare. They have happened before. A backtest that quietly assumes the broker is always there is making a probabilistic assumption it has never stated explicitly.
Crypto markets dispense with the rare-tail-event framing altogether. The exchange, broker, and custodian are typically the same entity, client assets may be commingled with the firm’s own balance sheet, and rehypothecation of customer collateral has been documented at multiple major venues. FTX, Mt. Gox, QuadrigaCX, Celsius, Voyager, BlockFi: the list of crypto venues that have either failed outright or frozen withdrawals during the past decade is too long to credibly characterise as a series of one-off accidents. Any crypto backtest spanning more than a few years implicitly assumes that the chosen exchange survived, that assets were never frozen, and that withdrawals were possible at the times the backtest needed them. Those assumptions are demonstrably false for a substantial fraction of the venues that have ever existed.
Stablecoin exposure sits underneath all of this as an additional credit overlay. Strategies denominated in USDT or USDC implicitly assume the peg holds, yet USDT traded as low as $0.88 in 2018, USDC depegged during the Silicon Valley Bank episode in 2023, and several smaller stablecoins (UST being the most notorious) have collapsed entirely. The peg is a credit-and-reserve assumption dressed up as a currency. A backtest that quotes returns “in USDT” is quietly assuming the issuer’s reserves are good throughout the sample period.
The honest treatment of counterparty risk is uncomfortable. No clean simulation captures the discontinuity of an exchange suddenly closing its doors. A trader can apply a probability-weighted haircut to long-horizon returns to reflect the historical base rate of venue failure. They can restrict the strategy universe to venues with adequate safeguards (segregated accounts, audited reserves, genuine regulatory oversight). They can also flag the residual exposure as an explicit caveat on the reported performance, rather than pretending it isn’t there. Most retail crypto backtests do none of these. They report returns as though every venue in the sample is permanent and solvent. Both assumptions have failed within the lifetime of most active traders.
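The probability-weighted haircut can be sketched as a one-period expectation. Everything here is an assumption for illustration: the name `haircut_annual_return`, the single-year framing, and above all the inputs `p_fail` and `loss_given_failure`, which the trader must estimate. The text supplies no base rate.

```python
def haircut_annual_return(annual_return: float, p_fail: float,
                          loss_given_failure: float = 1.0) -> float:
    """Expected one-year return after allowing for venue failure.

    With probability p_fail the venue fails and the stated fraction of
    capital is lost; otherwise the backtested return is earned.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    return (1.0 - p_fail) * annual_return - p_fail * loss_given_failure
```

The expectation understates the problem in one respect: a venue failure is a discontinuity, not a smooth drag, so the haircut is better read as a caveat on the headline number than as a corrected equity curve.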
3.7 Margin Fluctuations
Return on capital is one of the most consequential measures of a trading strategy, and the capital required to deploy a position in any leveraged instrument is set by the broker’s margin requirements. Those requirements move. They move in two different ways at once, and both effects compound.
The first is the steady drift in the notional value of a single contract as the price of the underlying changes. The E-mini S&P 500’s percentage margin may sit at a stable fraction of notional for years, but the dollar amount required to hold one contract has grown by roughly 5x over the past two decades because the index itself has appreciated by that much. A strategy that backtests with a constant $5,000 margin assumption is silently sliding between very different leverage regimes across the sample period.
The second is the sharp, discretionary increase in percentage margin imposed by exchanges and clearinghouses during periods of stress. CME, ICE, LME, and Eurex all reserve the right to raise margins on short notice, and they exercise that right. In the GFC of 2008, in March 2020 at the onset of COVID, in February 2022 after the Russian invasion of Ukraine, and at numerous smaller flashpoints between, initial margin on flagship contracts has doubled or tripled within days. The trader who was fully deployed at pre-crisis margin levels suddenly found themselves in deficit having taken no trading loss whatsoever. The options were two: deposit additional capital on a timetable dictated by the broker, or liquidate at exactly the wrong moment.
Multiply the two effects together. If the contract’s notional value doubles over the sample period and the exchange doubles the percentage margin during a crisis, the dollar capital required to hold one contract at that moment is four times what the naive backtest assumed. Position sizing rules calibrated against the average historical margin will be wrong by a factor approaching that during precisely the periods when being wrong is most painful.
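The compounding is easy to verify numerically. In the sketch below the function name `margin_per_contract` and every input value are hypothetical, chosen only to reproduce the doubling-times-doubling arithmetic.

```python
def margin_per_contract(price: float, multiplier: float,
                        margin_pct: float) -> float:
    """Dollar margin for one contract: notional times percentage margin."""
    return price * multiplier * margin_pct


# Hypothetical start of sample: index at 2000, $50 multiplier, 6.25% margin.
baseline = margin_per_contract(2000.0, 50.0, 0.0625)
# Hypothetical crisis later in the sample: index doubled, margin rate doubled.
stressed = margin_per_contract(4000.0, 50.0, 0.125)
ratio = stressed / baseline
```

Here `baseline` is 6,250 and `stressed` is 25,000, so `ratio` is 4: a position sizing rule built on the baseline figure would be carrying four times the intended leverage at the stressed moment.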
Reconstructing the historical margin schedule is also surprisingly difficult. Exchanges publish current margin requirements but seldom maintain a clean public archive of historical changes. Data vendors typically do not carry this series at all. Trying to back out the actual schedule from press releases and SPAN files is painful, and the result is incomplete even after substantial effort. Most retail backtesters never attempt it.
The practical choice is between three uncomfortable options. A single fixed margin assumption for the entire backtest is tractable but obviously wrong, and most commonly overstates available leverage during the periods that matter most. Modelling margin as a fixed percentage of contemporaneous notional captures the slow drift but misses the discontinuous jumps that cause forced liquidations. Reconstructing the actual schedule produces a defensible result at the cost of substantial work. None of these is ideal.
The LME nickel episode in March 2022 is worth flagging as a worst case. The London Metal Exchange suspended trading and retroactively cancelled trades after a short squeeze drove the price to around $100,000 per tonne in a matter of hours. Positions on either side of those cancelled trades were extinguished by exchange fiat. No backtest captures this kind of event, because no backtest contains the concept of trades being unwound by the venue after the fact. The trader who held the surviving leg of one of those positions discovered that an exchange’s unilateral right to act on a contract is a risk dimension no margin model contains.
Margin assumptions matter. They can convert a strategy that the backtest shows surviving a crisis into one that, in live trading, would have been forcibly liquidated at the lows. The backtest records a position that recovered; the live trader’s position was closed at zero. These are different equity curves describing nominally the same strategy.
3.8 Limit Halts and Continuous Trading Assumptions
Most retail backtests assume the market was continuously tradable for the duration of the sample period. They assume this implicitly, by simulating fills at the printed close price or the next open without checking whether the venue was actually accepting orders at that moment. The assumption is wrong more often than the backtest reveals.
Futures exchanges impose daily price limits on a long list of contracts. Soybeans, corn, wheat, lean hogs, live cattle, lumber, natural gas, several of the interest-rate complexes, and most energy contracts have published limit moves that, when reached, either halt trading entirely or expand to a wider band. When the market hits limit, no trades print at prices beyond the band, but the underlying value of the position has clearly moved. A backtest that uses the settlement price on a limit-locked day implicitly assumes the trader could have exited at that price. They could not. They could only have exited if the market reopened within the band, which sometimes happens within the same session and sometimes takes days.
The ICE Brent and NYMEX WTI futures hit expanded daily limits multiple times during the spring of 2022 as the Russian invasion of Ukraine disrupted energy markets. The wheat complex hit limit-up several times in the same period for the same reason. A backtest of a trend-following or breakout strategy run across that window, assuming daily settlement fills, is silently assuming a trader could have exited at the printed settle on every one of those days, when in practice the position would have been carried to the next session with no opportunity to close.
Equity markets have analogous mechanisms. The SPX market-wide circuit breakers halt trading at -7%, -13%, and -20% intraday moves. The first two trigger 15-minute halts; the third closes the market for the day. Single-stock LULD (Limit Up Limit Down) bands halt individual securities for five minutes when prices move outside their volatility-derived bands. Stock-specific trading halts happen routinely for pending news, regulatory action, or extreme volatility. Each of these events represents a period in which the backtest’s assumed liquidity simply did not exist.
The LME nickel halt in March 2022 mentioned earlier is a different and more aggressive case: the venue halted trading and then cancelled completed trades retroactively. Most exchanges will not do this. The point is that they can, and that no backtest’s model of fills survives this kind of intervention.
Crypto exchanges are nominally always open, but the same effect plays out through liquidity collapse rather than formal halt. During the cascading liquidations of May 2021, the chain of failures in mid-2022, and the November 2022 FTX collapse, order book depth on major venues evaporated to a fraction of normal levels for periods of hours. A backtest that prices fills at the printed last trade on those days is assuming the trader was the only person who wanted to transact at those prices. They were not, and the spread the trader actually faced was orders of magnitude wider than in normal conditions.
The practical guidance is to flag and special-case any bar where price reached the contract’s published limit, or where the session was halted. Vendors typically do not annotate these events in their data, so the trader has to maintain their own list of known limit-locked days and halt sessions. Strategies that assume the trader could have exited at any printed price are systematically overstating achievable performance, and the overstatement is concentrated in exactly the events the strategy is meant to profit from.
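A minimal sketch of that flagging step follows. It assumes a single fixed price limit per contract, expressed in price units, and treats a settle-to-settle move of at least that size as a proxy for a limit-locked day. Real limits are reset periodically and may expand, so the `limit` argument, the `Bar` type, and the `halt_days` list are all illustrative assumptions rather than any vendor's format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bar:
    day: date
    close: float  # daily settlement price


def flag_untradable(bars, prev_settle, limit, halt_days=frozenset(), tol=1e-9):
    """Return the dates on which an exit at the settle should not be assumed.

    A day is flagged if the settle-to-settle move is at least `limit`
    (a proxy for a limit-locked session) or if the date is on the
    trader's own hand-maintained list of halted sessions.
    """
    flagged = set()
    for bar in bars:
        locked = abs(bar.close - prev_settle) >= limit - tol
        if locked or bar.day in halt_days:
            flagged.add(bar.day)
        prev_settle = bar.close
    return flagged


# Hypothetical data: a 10-point limit and one manually recorded halt.
bars = [
    Bar(date(2022, 3, 1), 100.5),
    Bar(date(2022, 3, 2), 110.5),  # moved the full 10-point limit
    Bar(date(2022, 3, 3), 111.0),
]
flagged = flag_untradable(bars, prev_settle=100.0, limit=10.0,
                          halt_days={date(2022, 3, 3)})
```

On flagged dates the simulator can defer any exit to the next unflagged session instead of filling at the printed settle.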
3.9 Tax and Holding-Period Effects
Backtests universally report pre-tax returns. Live results are post-tax. The gap between the two is large, and the size of the gap depends on choices the strategy designer rarely thinks about during development.
Many jurisdictions tax short-term and long-term gains at different rates, and the boundary between the two is defined by the holding period. In the United States, short-term gains on positions held for one year or less are taxed as ordinary income, at marginal rates that can exceed 40% once federal and state taxes are combined; long-term gains receive preferential federal rates of 15% or 20% for most filers. In the United Kingdom, capital gains tax applies a separate schedule from income tax, with allowances and rates that depend on the taxpayer’s overall bracket. In Australia, capital gains on positions held for more than 12 months receive a 50% discount, effectively halving the marginal tax rate on long-term gains. Other jurisdictions have their own variants, but the common pattern is that short holding periods are penalised and long holding periods are rewarded.
The implication for backtesting is direct. A strategy with an average holding period of three days, generating a pre-tax return of 12% per year, may yield around 7% after tax for a US investor in a high marginal bracket. A buy-and-hold strategy with the same pre-tax return yields close to 10% after tax. The two strategies look identical on the backtest equity curve. They are not, in any economically meaningful sense, identical. The ranking between strategies of different turnover can flip entirely once the tax overlay is applied.
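The arithmetic behind those figures can be sketched as follows. The rates are illustrative assumptions (40% for short-term gains, 20% for long-term gains), the whole gain is assumed to be realised and taxed each year, and losses receive no relief. The function name `after_tax_return` and the second pair of strategies are hypothetical.

```python
def after_tax_return(pre_tax_return: float, tax_rate: float) -> float:
    """Single-period after-tax return with the whole gain taxed at one rate.

    Simplification: losses are passed through unchanged (no loss relief).
    """
    if pre_tax_return <= 0:
        return pre_tax_return
    return pre_tax_return * (1.0 - tax_rate)


SHORT_TERM_RATE = 0.40  # assumed combined marginal rate
LONG_TERM_RATE = 0.20   # assumed long-term rate

# Same 12% pre-tax return, different holding periods.
high_turnover = after_tax_return(0.12, SHORT_TERM_RATE)  # about 7.2%
buy_and_hold = after_tax_return(0.12, LONG_TERM_RATE)    # about 9.6%

# A ranking flip: 14% pre-tax at short-term rates versus 11% at long-term rates.
fast = after_tax_return(0.14, SHORT_TERM_RATE)  # about 8.4%
slow = after_tax_return(0.11, LONG_TERM_RATE)   # about 8.8%
ranking_flips = fast < slow
```

Under these assumptions the strategy that leads by three points before tax trails after tax. A buy-and-hold investor also defers tax until realisation, which this single-period sketch does not capture.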
Wash sale rules add a further layer of complication. In the United States, losses realised on a security cannot be deducted if a substantially identical position is acquired within 30 days before or after the sale. For high-turnover strategies that frequently rotate in and out of the same instruments, the wash sale rule can disallow a meaningful fraction of realised losses, distorting the after-tax return relative to a naive calculation.[3] Other jurisdictions have their own anti-avoidance rules that achieve similar effects.
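A rough way to estimate exposure to the rule from a backtest trade list is sketched below. It is a deliberate simplification: it looks only at re-entry in the same instrument within 30 days after a losing exit, treats the whole loss as disallowed, and ignores share-quantity matching, purchases before the sale, and basis adjustments. The `RoundTrip` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RoundTrip:
    entry_day: date
    exit_day: date
    pnl: float  # realised profit or loss for the round trip


def flag_wash_sale_losses(trades, window_days=30):
    """Return losing round trips followed by a re-entry within the window.

    `trades` is assumed to hold round trips in a single instrument.
    """
    flagged = []
    for t in trades:
        if t.pnl >= 0:
            continue
        for other in trades:
            if other is t:
                continue
            gap = (other.entry_day - t.exit_day).days
            if 0 <= gap <= window_days:
                flagged.append(t)
                break
    return flagged


# Hypothetical trade list for one security.
trades = [
    RoundTrip(date(2023, 1, 3), date(2023, 1, 6), -500.0),   # re-entered 14 days later
    RoundTrip(date(2023, 1, 20), date(2023, 1, 25), 300.0),
    RoundTrip(date(2023, 6, 1), date(2023, 6, 5), -200.0),   # no re-entry afterwards
]
washed = flag_wash_sale_losses(trades)
disallowed = -sum(t.pnl for t in washed)
```

Comparing `disallowed` with total realised losses gives a first-order sense of how much of the naive loss deduction is at risk. It is not a substitute for a proper lot-level calculation.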
The honest approach is to report pre-tax and post-tax returns side by side, with the trader’s actual marginal rates plugged in, and to compare strategies on the after-tax basis when ranking. Tax-loss harvesting, the timing of realisations across tax years, and account-type effects (retirement accounts versus taxable accounts) all change the picture in ways the backtest never sees. None of this appears in standard backtest output. The trader who selects a high-turnover strategy on the basis of pre-tax returns may be selecting a strategy that, after tax, underperforms a passive benchmark.
[1] TradeStation deals with this by allowing strategies to stipulate that they should be evaluated every tick (real or guessed), but this introduces logic differences in the strategy code that inadvertently lead to other problems. In practice, this is not something most TradeStation developers seem to consider.
[2] Since a lot of trading tools have historically come from US-based companies, and platforms such as TradeStation naturally favour US-based markets, many US-based traders have never had to think about these issues!
[3] The wash sale rule applies to securities but not to futures or section 1256 contracts, which receive their own preferential treatment in the US. Strategy designers should know which regime their instruments fall under, and most do not.