Backtest Results

Key Takeaways

This strategy is designed to hunt whales: infrequent breakout trades that are worth waiting for because they offer a higher-confidence payoff when price, volatility, and options-implied volatility all line up. The goal is not to trade constantly. The goal is to stay patient, avoid noisy breakouts, and size up only when the evidence suggests the move is real.

That is also why the trade count is relatively low. A single whale-hunting strategy may only produce a handful of strong signals in one market, but the concept scales. In practice, many related strategies could be deployed across diverse markets, sectors, or futures so that each one contributes a few high-confidence trades per year, and together they create a steadier stream of opportunities.

The most important test in this report is the fixed live holdout from 2025-10-01 through 2026-05-01. The model was trained only on labeled breakouts before 2025-10-01, then applied to the later data as if it were brand new.

The baseline breakout strategy remained profitable in the live holdout, with 5 trades and a 1.66% total return.
The final ML position-scaling overlay improved holdout Sharpe from 1.53 to 1.73 and total return from 1.66% to 1.85%.
The hard filter produced a high Sharpe but took only 2 trades, so it is less reliable as a final strategy choice.
The evidence still supports using the model conservatively as a position-sizing overlay rather than as a strict take-or-skip gate.

Key Numbers

Metric	Value
Baseline Holdout Sharpe	1.53
ML Scaling Sharpe	1.73
Baseline Holdout Return	1.66%
ML Scaling Return	1.85%

Training vs Out-of-Sample Metric Summary

period	strategy	trade_count	sharpe_ratio	total_return	win_rate	max_drawdown	expected_return_per_trade	removed_signal_count
Training / in-sample backtest before Oct 2025	baseline	44	0.99	9.13%	65.91%	-2.30%	0.52%
Fixed live holdout: 2025-10-01 to 2026-05-01	baseline	5	1.53	1.66%	100.00%	-0.51%	1.12%	0
Fixed live holdout: 2025-10-01 to 2026-05-01	ml_hard_filter	2	2.52	1.25%	100.00%	-0.11%	2.31%	6
Fixed live holdout: 2025-10-01 to 2026-05-01	ml_position_scaling	5	1.73	1.85%	100.00%	-0.48%	1.12%	0

Fixed Live Holdout Setup

This section is intentionally separated from the earlier backtest. The model is fit on historical labeled breakout events before October 2025. It then scores breakout candidates from October 2025 through the latest saved data using only information available at each signal date. No labels or future outcomes from the holdout period are used to train the model.
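The split described above can be sketched as a simple date cutoff. This is a minimal illustration, not the project's actual pipeline: the function name `split_by_cutoff`, the event dictionaries, and the two September rows are hypothetical, and only the 2025-10-01 cutoff comes from the report.

```python
from datetime import date

HOLDOUT_START = date(2025, 10, 1)  # cutoff stated in the report


def split_by_cutoff(events, cutoff=HOLDOUT_START):
    """Split labeled breakout events into (train, holdout) by signal date.

    Only the train rows would be passed to model fitting; holdout rows
    are scored afterwards and their labels are used for evaluation only.
    """
    train = [e for e in events if e["signal_date"] < cutoff]
    holdout = [e for e in events if e["signal_date"] >= cutoff]
    return train, holdout


# Illustrative events; the September rows are made up for this sketch.
events = [
    {"signal_date": date(2025, 9, 12), "label": 1},
    {"signal_date": date(2025, 9, 30), "label": 0},
    {"signal_date": date(2025, 10, 2), "label": 0},
    {"signal_date": date(2026, 4, 13), "label": 1},
]

train, holdout = split_by_cutoff(events)
print(len(train), "training events,", len(holdout), "holdout events")
```

Splitting strictly on the signal date keeps the boundary auditable: any event dated on or after the cutoff can never reach the fitting step.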

The holdout answers the practical question: if the model had been frozen before October 2025, would it still have helped the breakout strategy in the next several months?

Out-Of-Sample Performance

The holdout result is useful because it is not just another optimized backtest. The baseline still made money, and the ML position-scaling overlay improved the risk-adjusted profile without removing the full breakout opportunity set. The hard filter had the cleanest-looking Sharpe, but it took only two trades, which makes it too fragile as the final design.
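To make the difference between the two overlays concrete, here is a hedged sketch of how a predicted probability could map to position size under each design. The report does not state the actual sizing rule, so the 0.5 threshold, the linear mapping, and the 0.5x to 1.5x size bounds are all assumptions chosen for illustration.

```python
def hard_filter_size(prob, threshold=0.5):
    """Take-or-skip gate: full size above the threshold, nothing below.

    The 0.5 threshold is an assumption, not a value from the report.
    """
    return 1.0 if prob >= threshold else 0.0


def scaled_size(prob, min_size=0.5, max_size=1.5):
    """Position-scaling overlay: every candidate trades, size varies.

    The linear mapping and the 0.5x-1.5x bounds are hypothetical.
    """
    p = min(max(prob, 0.0), 1.0)  # clamp to a valid probability
    return min_size + (max_size - min_size) * p


# Two holdout candidates that were labeled target_first but scored below 0.5.
for prob in (0.2968, 0.2199):
    print(prob, hard_filter_size(prob), round(scaled_size(prob), 4))
```

Under these assumptions the gate skips both candidates entirely, while the scaling rule keeps a reduced position in each. That is the sense in which scaling preserves the opportunity set.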

Fixed Live Holdout Strategy Comparison

strategy	trade_count	removed_signal_count	sharpe_ratio	total_return	win_rate	max_drawdown	expected_return_per_trade
baseline	5	0	1.53	1.66%	100.00%	-0.51%	1.12%
ml_hard_filter	2	6	2.52	1.25%	100.00%	-0.11%	2.31%
ml_position_scaling	5	0	1.73	1.85%	100.00%	-0.48%	1.12%

Holdout Prediction Quality

Holdout Breakout Predictions

signal_date	predicted_probability	predicted_label	label	label_event
2025-10-02	0.0814	0	0	stop_first
2025-10-08	0.0792	0	0	stop_first
2025-10-24	0.2095	0	0	stop_first
2025-10-27	0.1114	0	0	stop_first
2026-04-13	0.5992	1	1	target_first
2026-04-16	0.2968	0	1	target_first
2026-04-22	0.2199	0	1	target_first
2026-04-30	0.7120	1	0	timeout
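The eight predictions in this table are few enough to score by hand, and the sketch below does so with standard classification metrics. The rows are copied from the table; the choice of metrics (accuracy, precision, recall, Brier score) is ours rather than something the report specifies. The saved predicted labels are consistent with a 0.5 probability cutoff, although the report does not state the threshold.

```python
# (predicted_probability, predicted_label, label), copied from the table above
rows = [
    (0.0814, 0, 0),
    (0.0792, 0, 0),
    (0.2095, 0, 0),
    (0.1114, 0, 0),
    (0.5992, 1, 1),
    (0.2968, 0, 1),
    (0.2199, 0, 1),
    (0.7120, 1, 0),
]

# Confusion-matrix counts
tp = sum(1 for _, pred, y in rows if pred == 1 and y == 1)
fp = sum(1 for _, pred, y in rows if pred == 1 and y == 0)
fn = sum(1 for _, pred, y in rows if pred == 0 and y == 1)
tn = sum(1 for _, pred, y in rows if pred == 0 and y == 0)

accuracy = (tp + tn) / len(rows)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Brier score: mean squared error of the probabilities (lower is better)
brier = sum((prob - y) ** 2 for prob, _, y in rows) / len(rows)

print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
print(f"brier={brier:.3f}")
```

The model was right on 5 of 8 candidates. It flagged one of the three target_first events and raised one false alarm on a trade that timed out. With a sample this small, these figures describe the holdout window rather than the model's long-run skill.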

Full Backtest Context

The fixed holdout is the main forward-style test, but the full backtest is still useful context. Across the full saved sample, the baseline strategy remains a transparent long-only QQQ channel breakout system with explicit volatility and risk-control rules: a 20-day breakout entry, a 20-day trend filter, a 0.20 ATR breakout-strength filter, a 2.0 ATR stop-loss, a 3.0 ATR trailing stop, a 15-day time stop, and a delayed breakout-failure exit.
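The rule set above can be captured as a small parameter object plus an entry check. This is a sketch under stated assumptions rather than the project's implementation. The parameter values come from the report, but the report does not define the filters precisely. Here the breakout-strength filter is read as "close exceeds the prior 20-day channel high by at least 0.20 ATR" and the trend filter as "close is above a 20-day reference level such as a moving average". The names `BreakoutParams`, `is_entry`, and `initial_stop` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakoutParams:
    channel_days: int = 20          # breakout lookback
    trend_days: int = 20            # trend-filter lookback
    min_breakout_atr: float = 0.20  # breakout-strength filter, in ATRs
    stop_loss_atr: float = 2.0      # initial stop distance, in ATRs
    trailing_stop_atr: float = 3.0  # trailing stop distance, in ATRs
    time_stop_days: int = 15        # maximum holding period


def is_entry(close, prior_channel_high, trend_ref, atr, params=BreakoutParams()):
    """Long entry check under the assumed reading of the filters.

    prior_channel_high: highest high over the previous channel_days bars
    trend_ref: assumed trend reference (e.g. a trend_days moving average)
    """
    clears_channel = close > prior_channel_high
    strong_enough = (close - prior_channel_high) >= params.min_breakout_atr * atr
    in_uptrend = close > trend_ref
    return clears_channel and strong_enough and in_uptrend


def initial_stop(entry_price, atr, params=BreakoutParams()):
    """Initial stop-loss level: stop_loss_atr ATRs below the entry price."""
    return entry_price - params.stop_loss_atr * atr


# Illustrative numbers only (not market data).
print(is_entry(close=101.0, prior_channel_high=100.0, trend_ref=98.0, atr=2.0))
print(is_entry(close=100.2, prior_channel_high=100.0, trend_ref=98.0, atr=2.0))
print(initial_stop(entry_price=101.0, atr=2.0))
```

In the second call the close clears the channel by only 0.1 ATR, so the strength filter rejects it. Rejecting marginal breakouts like this is what keeps the trade count low.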

The chart below shows the training and in-sample backtest regime before the fixed live holdout. This is where the breakout rules and ML labels were developed. The visual pattern is important: the strategy does not trade constantly. It waits for price to clear the channel, then exits through stop-loss, timeout, trailing-stop profit protection, or breakout-failure logic.

Full-Sample Baseline Metrics

Metric	Value
Sharpe Ratio	1.05
Expected Return Per Trade	0.58%
Average Trade Lifetime	5.8 days
Max Drawdown	-2.30%
Win Rate	69.39%
Trade Count	49
Total Return	10.94%

Full-Sample Baseline vs ML Overlays

strategy	trade_count	removed_signal_count	sharpe_ratio	total_return	win_rate	max_drawdown	expected_return_per_trade
baseline	12	0	0.69	1.68%	83.33%	-1.09%	0.55%
ml_hard_filter	9	8	0.38	0.75%	77.78%	-1.24%	0.39%
ml_position_scaling	12	0	0.73	1.88%	83.33%	-1.14%	0.55%

Trade Outcome Analysis

Outcome Rates

Outcome	Trades	Rate
Successful	34	69.39%
Breakout failure	13	26.53%
Stop-loss triggered	2	4.08%

Payoff Shape

Metric	Value
Average winner	1.36%
Average loser	-1.17%
Payoff ratio	1.16
Largest winner	4.40%
Largest loser	-2.97%
Top 3 trade PnL share	48.14%
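The payoff figures tie back to the per-trade expectancy through a short calculation, sketched below with the rounded values from the tables above. The breakeven win rate is an extra derived quantity that the report does not state. Because the inputs are rounded, the result can differ from the reported 0.58% in the last digit.

```python
win_rate = 0.6939   # 34 of 49 trades, from the outcome table
avg_win = 1.36      # average winner, in percent
avg_loss = -1.17    # average loser, in percent

# Expected return per trade: probability-weighted average of win and loss
expectancy = win_rate * avg_win + (1 - win_rate) * avg_loss

# Payoff ratio: average winner relative to the size of the average loser
payoff_ratio = avg_win / abs(avg_loss)

# Win rate at which expectancy would be zero for this payoff ratio
breakeven_win_rate = abs(avg_loss) / (avg_win + abs(avg_loss))

print(f"expectancy per trade: {expectancy:.2f}%")
print(f"payoff ratio: {payoff_ratio:.2f}")
print(f"breakeven win rate: {breakeven_win_rate:.1%}")
```

With a payoff ratio only slightly above 1, the edge depends mostly on the win rate staying well above the breakeven level of roughly 46%.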

What The Final Result Means

The project now has two meaningful layers:

A tuned baseline breakout strategy that is already tradable and positive after costs.
A compact implied-volatility-aware ML overlay that improves the fixed live holdout modestly by changing position size on breakout candidates.

The improvement is not huge, and that is important to state honestly. This is not a black-box replacement for the breakout strategy. It is a small, interpretable enhancement on top of an already working baseline. The important result is that the overlay still helped when it was frozen before October 2025 and then evaluated on later data.

The strategy should also be read as whale hunting. It is not trying to win every breakout. It is trying to keep failed breakouts small, exit stale trades, and preserve enough exposure for the subset of breakouts that turn into real volatility expansions.

Limitations

The main limitation is that the fixed out-of-sample window is short. The October 2025-forward holdout is useful because it is a true forward-style test, but it still contains only a small number of completed trades. That means the holdout result should be treated as evidence that the model helped in this period, not as proof that the edge is permanent.

The strategy is also currently presented as a selected-asset QQQ showcase. The broader ETF screen supports that choice, but a production version should continue validating whether the same rules work across other liquid assets and whether QQQ remains the best place to express the breakout idea.

Finally, implied volatility is a helpful confirmation signal, not a perfect truth source. Options markets can misprice future movement, implied-volatility history can be noisy, and the model may need to be re-estimated if the relationship between price breakouts and options pricing changes.

What We Would Do Next

The next step would be to keep extending the out-of-sample paper-trading window as new data arrives. The most important question is whether the ML position-scaling overlay continues to improve the baseline after more unseen trades.

We would also test the same framework on a broader universe of liquid ETFs and futures, add richer options-chain features such as skew and term structure, and compare the current logistic-style model against tree-based classifiers. Any more complex model would still need to beat the current version on walk-forward performance, not just in-sample fit.

Before live deployment, we would paper-trade the strategy with the same execution assumptions used in the backtest, monitor slippage and fills, and pause or retrain the model if rolling expectancy, drawdown, or the baseline-versus-overlay comparison moves outside the expected backtest range.
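One way to make the pause rule operational is a rolling check on realized trade returns. The sketch below is hypothetical. The class name, the window length, and the expectancy floor are placeholders, and real thresholds would need to be derived from the backtest distribution.

```python
from collections import deque


class RollingExpectancyMonitor:
    """Tracks the mean return of the most recent trades.

    window and floor are placeholder values, not settings from the report.
    """

    def __init__(self, window=10, floor=0.0):
        self.returns = deque(maxlen=window)
        self.floor = floor

    def record(self, trade_return):
        self.returns.append(trade_return)

    def rolling_expectancy(self):
        if not self.returns:
            return None
        return sum(self.returns) / len(self.returns)

    def should_pause(self):
        # Only judge once the window is full, to avoid reacting to one trade.
        if len(self.returns) < self.returns.maxlen:
            return False
        return self.rolling_expectancy() < self.floor


# Illustrative paper-trading returns (made up for this sketch).
monitor = RollingExpectancyMonitor(window=3, floor=0.0)
for r in (0.010, -0.020, -0.010):
    monitor.record(r)
print(monitor.rolling_expectancy(), monitor.should_pause())
```

The same pattern could be extended to drawdown or to the baseline-versus-overlay gap by tracking a second series alongside the first.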