Do You Even Backtest?
I backtest - a lot. And with all my strategies fully automated, most of my time goes into researching, testing, and refining ideas rather than managing trades day to day.
It’s fun, it’s exciting, and most importantly, it helps me make sense of what I’m trying to do.
But looking back, there are a few critical questions - along with practical ways to deal with them - that I wish I had kept front and center before starting my backtesting journey.
1. Do the results translate to real, live execution?
For a backtest to be useful, we need reasonable confidence that live results will resemble the backtest - on average.
That means similar entries, structures, timing, exits, and slippage. Not a perfect match - that’s unrealistic - but close enough to trust the conclusions.
If live results consistently deviate from the backtest, there’s little value in using historical data to make meaningful predictions about long-term performance.
Approaches
1. Use a proper backtester
Garbage in, garbage out. If you’re trading frequently and precisely, you need high-resolution data, realistic execution logic, and enough control to simulate real conditions. Otherwise, you’re probably better off DCA-ing the S&P 500 (no shame in that).
I use Option Omega for both backtesting and automation. I previously used TAT (Trade Automation Toolbox), and while it got the job done, it was cumbersome - not being a SaaS solution and supporting only Windows meant maintaining a dedicated Windows host. That wasn’t very convenient and even forced me to get creative when accessing the UI from my phone.
More importantly, execution logic is much easier to align when automation is generated directly from a backtest. I no longer need workarounds or “tricks” to support trade structures that aren’t formally handled, which makes live execution far closer to what the backtest actually models.
To be fair, there’s also BYOB, the backtester created by the same developer behind TAT. It’s free to use (unlike Option Omega) and offers tick-level data, which is a big advantage. However, in my experience, it’s much more limited and strategy-specific. While it has its strengths, it also inherits some of the constraints of the TAT ecosystem - which may be perfectly fine for traders already comfortable operating within it.
2. Treat backtesting as a skill
The tool matters - but so does knowing how to use it. That includes:
- Running conservative tests with realistic slippage and commission assumptions (especially distinguishing entry vs. exit slippage, which is critical for premium-selling strategies)
- Using the correct data resolution and configuration (e.g., intra-minute stops, capped losses or profits, double-tap exits - when not trading “ride or die” to expiration)
- Understanding sample sizes - when to go fixed vs. scaled, and why
- Understanding how scaling affects CAGR, drawdowns, and MAR
- Avoiding overfitting
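To make the slippage and commission point concrete, here is a minimal sketch of how per-contract costs reshape per-trade P&L. All numbers (slippage of 0.05/0.10 points, $0.65 commission, the trade list) are illustrative assumptions, not recommendations; the `net_pnl` helper is hypothetical.

```python
# Hypothetical sketch: how slippage and commission assumptions reshape
# a backtest's per-trade P&L. All numbers are illustrative only.

def net_pnl(gross_pnl, contracts, entry_slip=0.05, exit_slip=0.10,
            commission=0.65):
    """Adjust one trade's gross P&L (in dollars) for per-contract costs.

    Slippage is quoted in index points of premium (1 point = $100),
    so a 0.05 entry slip costs $5 per contract. Commission is charged
    per contract per side (entry + exit).
    """
    slippage_cost = (entry_slip + exit_slip) * 100 * contracts
    commission_cost = commission * 2 * contracts
    return gross_pnl - slippage_cost - commission_cost

# A strategy that looks profitable on gross numbers can flip negative
# once realistic costs are applied to every trade.
trades = [(120, 2), (-80, 2), (40, 2), (15, 2)]  # (gross $, contracts)
gross = sum(p for p, _ in trades)
net = sum(net_pnl(p, c) for p, c in trades)
print(gross, round(net, 2))
```

Separating entry and exit slippage matters here because premium sellers typically exit by buying back short options, where fills tend to be worse than at entry.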
3. Continuously reconcile live vs. backtest
If live results miss the mark, don’t panic - investigate.
Maybe your slippage assumptions are off. Maybe your automation isn’t configured exactly like the backtest.
Live trading actually makes us better backtesters - because we can feed reality back into the model and get closer to the truth.
2. Am I overfitting / curve-fitting / over-optimizing?
Overfitting happens when we optimize too hard on historical data and end up with a beautiful equity curve that collapses the moment it meets new data.
This usually comes from too many rules, filters, and conditions, which shrink the data sample, fit the model to historical noise and tail events, and destroy statistical power.
Our goal isn’t perfection - it’s generalization.
Approaches
1. Inspect tail dependence
Look at the top and bottom 1–5% of absolute trade outcomes.
Are they wildly larger than the average trade? If so, your results may rely on rare events rather than repeatable behavior.
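The tail check above can be sketched as a small script that asks what share of total absolute P&L comes from the largest few percent of trades. The `tail_share` helper and the trade list are illustrative assumptions.

```python
# Minimal sketch of a tail-dependence check on per-trade P&L values
# (illustrative numbers): what share of total absolute P&L comes from
# the top few percent of trades?

def tail_share(pnl, pct=0.05):
    """Fraction of total |P&L| contributed by the largest pct of trades."""
    ranked = sorted((abs(x) for x in pnl), reverse=True)
    k = max(1, int(len(ranked) * pct))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

pnl = [10, -8, 12, -9, 11, 250, -7, 9, -10, 8]  # one outlier trade
share = tail_share(pnl, pct=0.10)
print(f"top 10% of trades drive {share:.0%} of absolute P&L")
```

When a single trade dominates like this, the equity curve is telling you about one event, not about a repeatable edge.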
2. Reduce rules aggressively
Fewer rules → fewer ways to fool yourself.
Instead of chasing the best CAGR or MAR with ten filters, try to get good enough results with two or three core rules.
You’ll give up some performance - but gain robustness and confidence.
(There is a balance here - guardrails still matter.)
3. Is my data sample large enough to prove an edge?
Every trader eventually has to ask:
Is this a real edge - or just noise that looks convincing?
Small samples lie. Randomness is powerful.
Approaches
1. Run power analysis
A power analysis is a statistical tool used to estimate whether a dataset is large enough to detect a real effect with statistical significance, given the expected effect size and variability. In simple terms, it helps answer: “Do I have enough data to trust this result?”
For trading strategies, power analysis is mainly used to:
- Assess sample size adequacy: Determine whether the number of trades is sufficient to support statistically meaningful conclusions.
- Interpret significance correctly: Distinguish between a genuine edge and results that are likely driven by randomness.
- Prevent overconfidence: Reveal when backtest results look strong but are based on too little data to be reliable.
In trading, where edges are often small and noisy, power analysis helps frame results in terms of evidence strength rather than raw performance.
NOTE: Classical power analysis methods often assume normally distributed data, an assumption that rarely holds for trading P&L. Because returns are typically skewed and heavy-tailed, resampling approaches such as bootstrapping are often more appropriate, allowing significance and power to be evaluated without relying on strict distributional assumptions.
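As a hedged sketch of the bootstrap idea, the snippet below resamples a trade list with replacement and reports how often the resampled mean P&L is positive, without assuming normality. The trade values are made up, and `bootstrap_p_positive` is a simplified stand-in for a full power analysis.

```python
# Bootstrap sketch: how often does a resampled mean P&L come out
# positive? No normality assumption; trade values are illustrative.
import random

def bootstrap_p_positive(pnl, n_boot=2_000, seed=42):
    """Fraction of bootstrap resamples whose mean P&L is positive.

    Values near 1.0 suggest the edge is unlikely to be pure noise;
    values near 0.5 mean the data cannot distinguish edge from luck.
    """
    rng = random.Random(seed)
    n = len(pnl)
    hits = 0
    for _ in range(n_boot):
        sample = [rng.choice(pnl) for _ in range(n)]
        if sum(sample) / n > 0:
            hits += 1
    return hits / n_boot

small = [50, -45, 60, -40, 55]   # 5 trades that "look great"
larger = small * 40              # same per-trade edge, 200 trades
p_small = bootstrap_p_positive(small)
p_large = bootstrap_p_positive(larger)
print(p_small, p_large)
```

The same per-trade edge produces a far more convincing bootstrap result at 200 trades than at 5, which is exactly the "evidence strength vs. raw performance" distinction above.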
4. How did the strategy behave across different market regimes?
The longer the backtest, the more environments you observe - and the better you understand worst-case scenarios.
(Always assume the future can be worse.)
Approaches
1. Use long enough periods - but stay relevant
For 0DTE SPX strategies, it makes sense to start from mid-2022, when daily expirations were introduced.
A larger trade count can partly compensate for a shorter history, but very old data can mislead. Balance matters.
2. Consider walk-forward methods (WFA / WFO)
These methods repeatedly optimize in-sample and validate out-of-sample using rolling windows.
While trickier to implement for options traders, they help balance adaptability with robustness and reduce regime overfitting.
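The rolling-window mechanics can be sketched as follows. The window lengths (120 training days, 30 validation days) are arbitrary example values, and `walk_forward_windows` is a hypothetical helper, not any tool's API.

```python
# Illustrative sketch of rolling walk-forward windows over a span of
# trading days: optimize in-sample, validate out-of-sample, step forward.

def walk_forward_windows(n_days, train=120, test=30):
    """Yield (train_range, test_range) index pairs over n_days,
    stepping forward by the test length each iteration."""
    start = 0
    while start + train + test <= n_days:
        yield (range(start, start + train),
               range(start + train, start + train + test))
        start += test

windows = list(walk_forward_windows(300))
for tr, te in windows:
    print(f"optimize on days {tr.start}-{tr.stop - 1}, "
          f"validate on days {te.start}-{te.stop - 1}")
```

Performance is then judged only on the stitched-together out-of-sample segments, which is what keeps the optimization honest across regimes.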
5. How does this strategy fit into my portfolio?
A strategy can look amazing on its own - and still be useless (or dangerous) in a portfolio.
Correlation matters more than standalone performance.
Approaches
- Run portfolio-level backtests with realistic sizing
- Assess total portfolio risk, not individual strategy returns
If a strategy pushes portfolio drawdowns beyond tolerance, either reject it - or, if it’s clearly superior to a highly correlated strategy, replace one with the other. Another option is to run both at reduced size so they don’t amplify the same risk.
Always remember: historical drawdown is a lower bound. Leave room for worse.
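A toy example of why correlation dominates standalone performance: two strategies with identical individual drawdowns combine very differently depending on whether their losses overlap. The daily P&L series below are made up for illustration.

```python
# Sketch: identical standalone drawdowns, very different portfolio
# drawdowns, depending on whether losses land on the same days.

def max_drawdown(pnl):
    """Worst peak-to-trough decline of the cumulative P&L curve."""
    peak = equity = 0.0
    worst = 0.0
    for x in pnl:
        equity += x
        peak = max(peak, equity)
        worst = min(worst, equity - peak)
    return worst

a = [10, -30, -30, 10, 20, 30]
b_correlated = [10, -30, -30, 10, 20, 30]    # loses on the same days as a
b_diversified = [-30, 10, 20, 30, -30, 10]   # loses on different days

combo_corr = [x + y for x, y in zip(a, b_correlated)]
combo_div = [x + y for x, y in zip(a, b_diversified)]
print(max_drawdown(a), max_drawdown(combo_corr), max_drawdown(combo_div))
```

The correlated pair doubles the drawdown; the diversified pair actually shrinks it below either strategy alone. Same standalone stats, opposite portfolio outcomes.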
6. Does the strategy survive small parameter changes?
One of the best overfitting detectors is fragility.
If small tweaks completely destroy performance, you probably optimized noise.
Approaches
- Run strategy variants
- Change one parameter at a time: entry time, delta, technical filters’ values, exit conditions
- Look for graceful degradation or improvement, not collapse
You’re not looking for magic numbers - you’re looking for stable regions.
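The stable-region idea can be sketched with a one-parameter sweep. The `toy_backtest` function is a deliberately simple stand-in (a smooth score curve peaking near a stop multiple of 2.0), not a real backtest; the point is the shape of the sweep, not the numbers.

```python
# Hypothetical sensitivity sweep: evaluate one parameter (here, a stop
# multiple) across nearby values and check for graceful degradation.

def toy_backtest(stop_multiple):
    """Stand-in for a real backtest run: returns a performance score.
    A robust strategy shows a broad plateau; a fragile one, a spike."""
    # Smooth, plateau-like response centered near stop_multiple = 2.0
    return round(1.0 - 0.1 * abs(stop_multiple - 2.0) ** 2, 3)

sweep = {m / 4: toy_backtest(m / 4) for m in range(4, 13)}  # 1.0 .. 3.0
for param, score in sweep.items():
    print(f"stop x{param:.2f} -> score {score}")

# Stable region: neighbors of the best value should score close to it,
# not collapse. A cliff on either side is an overfitting warning sign.
best = max(sweep, key=sweep.get)
neighbors = [sweep[best - 0.25], sweep[best + 0.25]]
print(best, neighbors)
```

If the best parameter's neighbors score nearly as well, you have found a region; if they crater, you have probably found noise.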
Final thought
Backtesting isn’t about finding the perfect system.
It’s about reducing uncertainty, understanding risk, and setting expectations you can actually live with.
It’s a powerful tool - but it’s not magic. It has limitations, and its value depends entirely on how thoughtfully and realistically it’s used.
The goal isn’t to be right in hindsight -
it’s to build systems that survive long enough and generalize well.
Good luck - and may G-d guide our expectancy.