Do You Even Backtest?
I backtest - a lot. And with all my strategies fully automated, most of my time goes into researching, testing, and refining ideas rather than managing trades day to day.
It’s fun, it’s exciting, and most importantly, it helps me make sense of what I’m trying to do.
But looking back, there are a few critical questions - along with practical ways to deal with them - that I wish I had kept front and center before starting my backtesting journey.
1. Do the results translate to real, live execution?
For a backtest to be useful, we need reasonable confidence that live results will resemble the backtest - on average.
That means similar entries, structures, timing, exits, and slippage. Not a perfect match - that’s unrealistic - but close enough to trust the conclusions.
If live results consistently deviate from the backtest, there’s little value in using historical data to make meaningful predictions about long-term performance.
Approaches
1. Use a proper backtester
Garbage in, garbage out. If you’re trading frequently and precisely, you need high-resolution data, realistic execution logic, and enough control to simulate real conditions. Otherwise, you’re probably better off DCA-ing the S&P 500 (no shame in that).
I use Option Omega for both backtesting and automation. I previously used TAT (Trade Automation Toolbox), and while it got the job done, it was cumbersome - not being a SaaS solution and supporting only Windows meant maintaining a dedicated Windows host. That wasn’t very convenient and even forced me to get creative when accessing the UI from my phone.
More importantly, execution logic is much easier to align when automation is generated directly from a backtest. I no longer need workarounds or “tricks” to support trade structures that aren’t formally handled, which makes live execution far closer to what the backtest actually models.
To be fair, there’s also BYOB, the backtester created by the same developer behind TAT. It’s free to use (unlike Option Omega) and offers tick-level data, which is a big advantage. However, in my experience, it’s much more limited and strategy-specific. While it has its strengths, it also inherits some of the constraints of the TAT ecosystem - which may be perfectly fine for traders already comfortable operating within it.
2. Treat backtesting as a skill
The tool matters - but so does knowing how to use it. That includes:
- Running conservative tests with realistic slippage and commission assumptions (especially distinguishing entry vs. exit slippage, which is critical for premium-selling strategies)
- Using the correct data resolution and configuration (e.g., intra-minute stops, capped losses or profits, double-tap exits - when not trading “ride or die” to expiration)
- Understanding sample sizes - when to go fixed vs. scaled, and why
- Understanding how scaling affects CAGR, drawdowns, and MAR
- Avoiding overfitting
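To make the slippage and commission point concrete, here is a minimal sketch of how per-contract costs reshape per-trade P&L. All numbers (slippage of 0.05/0.10 points, $0.65 commission, the trade list) are illustrative assumptions, not recommendations; the `net_pnl` helper is hypothetical.

```python
# Hypothetical sketch: how slippage and commission assumptions reshape
# a backtest's per-trade P&L. All numbers are illustrative only.

def net_pnl(gross_pnl, contracts, entry_slip=0.05, exit_slip=0.10,
            commission=0.65):
    """Adjust one trade's gross P&L (in dollars) for per-contract costs.

    Slippage is quoted in index points of premium (1 point = $100),
    so a 0.05 entry slip costs $5 per contract. Commission is charged
    per contract per side (entry + exit).
    """
    slippage_cost = (entry_slip + exit_slip) * 100 * contracts
    commission_cost = commission * 2 * contracts
    return gross_pnl - slippage_cost - commission_cost

# A strategy that looks profitable on gross numbers can flip negative
# once realistic costs are applied to every trade.
trades = [(120, 2), (-80, 2), (40, 2), (15, 2)]  # (gross $, contracts)
gross = sum(p for p, _ in trades)
net = sum(net_pnl(p, c) for p, c in trades)
print(gross, round(net, 2))
```

Separating entry and exit slippage matters here because premium sellers typically exit by buying back short options, where fills tend to be worse than at entry.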
3. Continuously reconcile live vs. backtest
If live results miss the mark, don’t panic - investigate.
Maybe your slippage assumptions are off. Maybe your automation isn’t configured exactly like the backtest.
Live trading actually makes us better backtesters - because we can feed reality back into the model and get closer to the truth.
2. Am I overfitting / curve-fitting / over-optimizing?
Overfitting happens when we optimize too hard on historical data and end up with a beautiful equity curve that collapses the moment it meets new data.
This usually comes from too many rules, filters, and conditions, which shrink the data sample, fit the model to historical noise and tail events, and destroy statistical power.
Our goal isn’t perfection - it’s generalization.
Approaches
1. Inspect tail dependence
Look at the top and bottom 1–5% of absolute trade outcomes.
Are they wildly larger than the average trade? If so, your results may rely on rare events rather than repeatable behavior.
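The tail check above can be sketched as a small script that asks what share of total absolute P&L comes from the largest few percent of trades. The `tail_share` helper and the trade list are illustrative assumptions.

```python
# Minimal sketch of a tail-dependence check on per-trade P&L values
# (illustrative numbers): what share of total absolute P&L comes from
# the top few percent of trades?

def tail_share(pnl, pct=0.05):
    """Fraction of total |P&L| contributed by the largest pct of trades."""
    ranked = sorted((abs(x) for x in pnl), reverse=True)
    k = max(1, int(len(ranked) * pct))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

pnl = [10, -8, 12, -9, 11, 250, -7, 9, -10, 8]  # one outlier trade
share = tail_share(pnl, pct=0.10)
print(f"top 10% of trades drive {share:.0%} of absolute P&L")
```

When a single trade dominates like this, the equity curve is telling you about one event, not about a repeatable edge.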
2. Reduce rules aggressively
Fewer rules → fewer ways to fool yourself.
Instead of chasing the best CAGR or MAR with ten filters, try to get good enough results with two or three core rules.
You’ll give up some performance - but gain robustness and confidence.
(There is a balance here - guardrails still matter.)
3. Is my data sample large enough to prove an edge?
Every trader eventually has to ask:
Is this a real edge - or just noise that looks convincing?
Small samples lie. Randomness is powerful.
Approaches
1. Run power analysis
A power analysis is a statistical tool used to estimate whether a dataset is large enough to detect a real effect with statistical significance, given the expected effect size and variability. In simple terms, it helps answer: “Do I have enough data to trust this result?”
For trading strategies, power analysis is mainly used to:
- Assess sample size adequacy: Determine whether the number of trades is sufficient to support statistically meaningful conclusions.
- Interpret significance correctly: Distinguish between a genuine edge and results that are likely driven by randomness.
- Prevent overconfidence: Reveal when backtest results look strong but are based on too little data to be reliable.
In trading, where edges are often small and noisy, power analysis helps frame results in terms of evidence strength rather than raw performance.
NOTE: Classical power analysis methods often assume normally distributed data, an assumption that rarely holds for trading P&L. Because returns are typically skewed and heavy-tailed, resampling approaches such as bootstrapping are often more appropriate, allowing significance and power to be evaluated without relying on strict distributional assumptions.
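As a hedged sketch of the bootstrap idea, the snippet below resamples a trade list with replacement and reports how often the resampled mean P&L is positive, without assuming normality. The trade values are made up, and `bootstrap_p_positive` is a simplified stand-in for a full power analysis.

```python
# Bootstrap sketch: how often does a resampled mean P&L come out
# positive? No normality assumption; trade values are illustrative.
import random

def bootstrap_p_positive(pnl, n_boot=2_000, seed=42):
    """Fraction of bootstrap resamples whose mean P&L is positive.

    Values near 1.0 suggest the edge is unlikely to be pure noise;
    values near 0.5 mean the data cannot distinguish edge from luck.
    """
    rng = random.Random(seed)
    n = len(pnl)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(pnl) for _ in range(n)]
        if sum(sample) / n > 0:
            hits += 1
    return hits / n_boot

small = [50, -45, 60, -40, 55]   # 5 trades that "look great"
larger = small * 40              # same per-trade edge, 200 trades
p_small = bootstrap_p_positive(small)
p_large = bootstrap_p_positive(larger)
print(p_small, p_large)
```

The same per-trade edge produces a far more convincing bootstrap result at 200 trades than at 5, which is exactly the "evidence strength vs. raw performance" distinction above.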
4. How did the strategy behave across different market regimes?
The longer the backtest, the more environments you observe - and the better you understand worst-case scenarios.
(Always assume the future can be worse.)
Approaches
1. Use long enough periods - but stay relevant
For 0DTE SPX strategies, it makes sense to start from mid-2022, when daily expirations were introduced.
A larger trade count can partly compensate for a shorter history, but very old data can mislead. Balance matters.
2. Consider walk-forward methods (WFA / WFO)
These methods repeatedly optimize in-sample and validate out-of-sample using rolling windows.
While trickier to implement for options traders, they help balance adaptability with robustness and reduce regime overfitting.
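The rolling-window mechanics can be sketched as follows. The window lengths (120 training days, 30 validation days) are arbitrary example values, and `walk_forward_windows` is a hypothetical helper, not any tool's API.

```python
# Illustrative sketch of rolling walk-forward windows over a span of
# trading days: optimize in-sample, validate out-of-sample, step forward.

def walk_forward_windows(n_days, train=120, test=30):
    """Yield (train_range, test_range) index pairs over n_days,
    stepping forward by the test length each iteration."""
    start = 0
    while start + train + test <= n_days:
        yield (range(start, start + train),
               range(start + train, start + train + test))
        start += test

windows = list(walk_forward_windows(300))
for tr, te in windows:
    print(f"optimize on days {tr.start}-{tr.stop - 1}, "
          f"validate on days {te.start}-{te.stop - 1}")
```

Performance is then judged only on the stitched-together out-of-sample segments, which is what keeps the optimization honest across regimes.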
5. How does this strategy fit into my portfolio?
A strategy can look amazing on its own - and still be useless (or dangerous) in a portfolio.
Correlation matters more than standalone performance.
Approaches
- Run portfolio-level backtests with realistic sizing
- Assess total portfolio risk, not individual strategy returns
If a strategy pushes portfolio drawdowns beyond tolerance, either reject it - or, if it’s clearly superior to a highly correlated strategy, replace one with the other. Another option is to run both at reduced size so they don’t amplify the same risk.
Always remember: historical drawdown is a lower bound. Leave room for worse.
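A toy example of why correlation dominates standalone performance: two strategies with identical individual drawdowns combine very differently depending on whether their losses overlap. The daily P&L series below are made up for illustration.

```python
# Sketch: identical standalone drawdowns, very different portfolio
# drawdowns, depending on whether losses land on the same days.

def max_drawdown(pnl):
    """Worst peak-to-trough decline of the cumulative P&L curve."""
    peak = equity = 0.0
    worst = 0.0
    for x in pnl:
        equity += x
        peak = max(peak, equity)
        worst = min(worst, equity - peak)
    return worst

a = [10, -30, -30, 10, 20, 30]
b_correlated = [10, -30, -30, 10, 20, 30]    # loses on the same days as a
b_diversified = [-30, 10, 20, 30, -30, 10]   # loses on different days

combo_corr = [x + y for x, y in zip(a, b_correlated)]
combo_div = [x + y for x, y in zip(a, b_diversified)]
print(max_drawdown(a), max_drawdown(combo_corr), max_drawdown(combo_div))
```

The correlated pair doubles the drawdown; the diversified pair actually shrinks it below either strategy alone. Same standalone stats, opposite portfolio outcomes.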
6. Does the strategy survive small parameter changes?
One of the best overfitting detectors is fragility.
If small tweaks completely destroy performance, you probably optimized noise.
Approaches
- Run strategy variants
- Change one parameter at a time: entry time, delta, technical filters’ values, exit conditions
- Look for graceful degradation or improvement, not collapse
You’re not looking for magic numbers - you’re looking for stable regions.
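The stable-region idea can be sketched with a one-parameter sweep. The `toy_backtest` function is a deliberately simple stand-in (a smooth score curve peaking near a stop multiple of 2.0), not a real backtest; the point is the shape of the sweep, not the numbers.

```python
# Hypothetical sensitivity sweep: evaluate one parameter (here, a stop
# multiple) across nearby values and check for graceful degradation.

def toy_backtest(stop_multiple):
    """Stand-in for a real backtest run: returns a performance score.
    A robust strategy shows a broad plateau; a fragile one, a spike."""
    # Smooth, plateau-like response centered near stop_multiple = 2.0
    return round(1.0 - 0.1 * abs(stop_multiple - 2.0) ** 2, 3)

sweep = {m / 4: toy_backtest(m / 4) for m in range(4, 13)}  # 1.0 .. 3.0
for param, score in sweep.items():
    print(f"stop x{param:.2f} -> score {score}")

# Stable region: neighbors of the best value should score close to it,
# not collapse. A cliff on either side is an overfitting warning sign.
best = max(sweep, key=sweep.get)
neighbors = [sweep[best - 0.25], sweep[best + 0.25]]
print(best, neighbors)
```

If the best parameter's neighbors score nearly as well, you have found a region; if they crater, you have probably found noise.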
Final thought
Backtesting isn’t about finding the perfect system.
It’s about reducing uncertainty, understanding risk, and setting expectations you can actually live with.
It’s a powerful tool - but it’s not magic. It has limitations, and its value depends entirely on how thoughtfully and realistically it’s used.
The goal isn’t to be right in hindsight -
it’s to build systems that survive long enough and generalize well.
Good luck - and may G-d guide our expectancy.