
Title: Can an ML System Beat the Stock Market? Here’s What Actually Worked.

By Keshav Khanna · Published April 30, 2026 · 6 min read · Source: Trading Tag

Every investor faces the same question — can you consistently earn more than the market average? The S&P 500 returns about 10% per year. Most professional fund managers fail to beat it after fees. I wanted to know if machine learning could do what most professionals cannot.

I built a system that predicts hourly stock returns across a universe of large-cap US equities and trades them automatically. The result on data the model never saw during training: +22.4% return versus +4.6% for simply holding the same stocks. But the path to that result was full of failures that taught me more than the successes.

The Setup

I started with Google — one of the most liquid stocks in the world with strong analyst coverage and clean historical data. Two years of hourly price bars. 64 engineered features per bar. The prediction target: will this stock be higher or lower 24 hours from now?

That 24-hour target was not chosen arbitrarily. I tested a range of prediction horizons across 31 different signals, and 14 of the 31 peaked in predictive power at 24 hours. The data voted for the target.
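
A horizon scan like this can be sketched as follows, using synthetic prices and a single illustrative momentum signal in place of the author's 31:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic hourly closes: a slow drift plus noise (toy data, not real quotes).
n = 2000
close = 100 + np.cumsum(rng.normal(0.0, 0.1, n)) + rng.normal(0.0, 1.0, n)

# One example signal: simple momentum over the last 24 bars.
momentum = np.concatenate([np.zeros(24), close[24:] - close[:-24]])

def ic_at_horizon(signal, prices, h):
    """Spearman rank correlation between the signal and the h-bar forward return."""
    fwd = prices[h:] / prices[:-h] - 1.0
    ic, _ = spearmanr(signal[:-h], fwd)
    return ic

horizons = [1, 2, 4, 8, 24]
ics = {h: ic_at_horizon(momentum, close, h) for h in horizons}
best = max(ics, key=lambda h: abs(ics[h]))  # horizon where the signal peaks
print({h: round(v, 3) for h, v in ics.items()}, "peak:", best)
```

Running the same loop over every signal and counting where each one peaks is what "the data voted" means here.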


Building Layer by Layer

Rather than throwing everything into one model, I built in layers — adding one category of information at a time and measuring whether it actually helped.

Layer 1 used 31 technical signals computed purely from price and volume — RSI, MACD, momentum, volatility, institutional flow. RandomForest achieved IC +0.234 and correctly predicted direction 58.7% of the time on data it never saw. Real edge from price alone.
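
The two headline metrics, IC and hit rate, can be computed like this (the features and returns below are toy stand-ins with a planted linear edge, not the author's 31 signals):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Toy stand-ins for 31 technical features and the 24h forward return.
X = rng.normal(size=(3000, 31))
y = X[:, 0] * 0.2 + rng.normal(scale=0.5, size=3000)  # real but noisy edge

split = 2400  # chronological split: never shuffle a time series
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])

ic, _ = spearmanr(pred, y[split:])                  # information coefficient
hit = np.mean(np.sign(pred) == np.sign(y[split:]))  # directional accuracy
print(f"IC={ic:+.3f}  hit-rate={hit:.1%}")
```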

Layer 2 added static fundamental data — P/E ratio, return on equity, debt to equity. The result was zero improvement. Not one fundamental variable appeared in the top 15 features. The reason is simple but important: a P/E ratio of 32.3 is the same value at every single hourly bar for an entire quarter. The model cannot learn from a feature that never changes.
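
This failure mode is easy to reproduce: a feature with zero variance can never define a split, so tree models assign it exactly zero importance. A minimal demonstration on synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
X[:, 2] = 32.3  # static P/E: identical at every hourly bar for a whole quarter
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.feature_importances_)  # the constant column contributes nothing
```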

Layer 3 added macro signals — VIX fear gauge, dollar index momentum, SPY market trend. IC came in at +0.186. The model learned that knowing what the overall market is doing tells you more about Google’s next 24 hours than Google’s own recent price patterns.

Layer 4 was the breakthrough. I took the same analyst data that failed in Layer 2 and encoded it differently. Instead of a static target price, I computed analyst upside live — target price minus current close divided by current close — recalculated every hour as price moves. When Google trades at $350 and analysts have a $378 target, the upside is 8%. An hour later at $348 it becomes 8.6%. Same data. Different encoding. It became the single most important feature at 19.4% importance.
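
The re-encoding itself is one line. A minimal sketch (the function name is mine, not from the author's codebase):

```python
def analyst_upside(target_price: float, close: float) -> float:
    """Live analyst upside: recomputed every hour as the price moves."""
    return (target_price - close) / close

print(round(analyst_upside(378, 350), 3))  # 0.08  -> 8.0% upside
print(round(analyst_upside(378, 348), 3))  # 0.086 -> 8.6% upside
```

The static target price barely changes for weeks; the ratio against the live close changes every bar, which is what gives the model something to learn from.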

The lesson generalizes beyond finance — the question is not what data you have. It is how you encode it.

The Deep Learning Attempt

Layer 5 was a Transformer — the same architecture behind ChatGPT, applied to sequences of hourly price bars. The motivation was sound: stock prices are sequential, and Transformers learn which past moments are most relevant through attention. Theoretically the right tool.

It failed. The single-stock Transformer had 20,914 parameters but only 2,381 training sequences, a data-to-parameter ratio of 0.11 when a common rule of thumb calls for at least 5 to 10. The model did not have enough data to learn reliably. IC was positive at +0.127, but the hit rate was 50.3% — essentially random at predicting direction. For trading, direction is what matters.
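
The capacity check is simple arithmetic (the article only gives the single-stock parameter count; reusing it for the multi-stock model is my assumption):

```python
def data_to_param_ratio(n_sequences: int, n_params: int) -> float:
    """Training sequences per model parameter; rule of thumb wants 5-10+."""
    return n_sequences / n_params

single = data_to_param_ratio(2_381, 20_914)    # ~0.11: badly underfed
multi = data_to_param_ratio(188_000, 20_914)   # ~9.0, if model size is unchanged
print(f"single-stock: {single:.2f}  multi-stock: {multi:.1f}")
```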

I then scaled to a multi-stock Transformer trained on 71 stocks simultaneously — 188,000 sequences, enough data. It learned genuinely, reaching validation IC +0.056. But on the test period it produced IC -0.037. The model had trained on a bull market and was tested on a period that included a bear correction. Momentum continuation patterns — which work beautifully in bull markets — reverse completely in corrections.

This is the central lesson of the entire project. IC — the statistical measure of predictive quality — was positive in training. The backtest lost 94% of capital. Positive IC does not guarantee positive returns when the market enters a regime the model has never seen.

The Fix

The solution came from how Jim Simons built the most successful trading fund in history. He did not try to build one model that works on everything. He found instruments where statistical patterns persist and traded only those.

I ran the same GradientBoosting pipeline independently on every stock in our universe. Each stock got its own model. Then one filter — keep only the stocks where the model shows a real edge on out-of-sample data. 18 stocks passed. 51 were rejected.
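
The select-then-trade loop can be sketched like this on a synthetic universe (`IC_MIN` is an assumed threshold; the article only says the model must show a real edge out of sample):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)

def oos_ic(X, y, split=0.8):
    """Train on the early part of the series, score IC on the held-out tail."""
    k = int(len(y) * split)
    model = GradientBoostingRegressor(random_state=0).fit(X[:k], y[:k])
    ic, _ = spearmanr(model.predict(X[k:]), y[k:])
    return ic

# Toy universe: half the tickers have learnable structure, half are pure noise.
universe = {}
for i in range(10):
    X = rng.normal(size=(1500, 8))
    noise = rng.normal(size=1500)
    y = (0.3 * X[:, 0] + noise) if i < 5 else noise
    universe[f"TICKER_{i}"] = (X, y)

IC_MIN = 0.05  # assumed cutoff for "a real edge"
selected = [t for t, (X, y) in universe.items() if oos_ic(X, y) > IC_MIN]
print("traded:", selected)
```

The filter is the whole idea: each stock gets its own model, and only the stocks whose models hold up out of sample make it into the portfolio.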

The rejected stocks were not failures — they were information. Energy stocks consistently failed because they are driven by oil price, which is invisible to our signals. Materials stocks consistently succeeded because they have idiosyncratic patterns the model can learn. The model correctly identified the boundaries of its own predictive ability.


The Results

The 18-stock portfolio on seven months of data the model never saw: total return +22.4% versus +4.6% for buy-and-hold. Maximum drawdown -3.9% versus -12.6%.

The drawdown comparison is the most important number. During the Q1 2026 market correction the passive portfolio fell 12.6%. The ML portfolio fell only 3.9%. The model was never told a correction was coming. VIX and dollar momentum signals reduced exposure automatically — regime awareness working exactly as designed.
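
Maximum drawdown, the metric behind that comparison, is the largest peak-to-trough decline of the equity curve. A small helper (the equity curve here is illustrative, not the project's):

```python
import numpy as np

def max_drawdown(equity: np.ndarray) -> float:
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    peaks = np.maximum.accumulate(equity)  # running high-water mark
    return float(np.min(equity / peaks - 1.0))

curve = np.array([100, 104, 101, 97, 99, 103, 108])
print(f"{max_drawdown(curve):.1%}")  # -6.7% (trough 97 after the 104 peak)
```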

I tested both RandomForest and GradientBoosting on the full stock universe. GradBoost found 18 edge stocks versus 15 for RandomForest, with higher Sharpe and lower drawdown. GradBoost’s sequential error correction is better at detecting the three-way interaction between analyst upside, macro conditions, and volatility that drives the strongest predictions.

What I Learned

Four failures taught me more than the successes. Static fundamentals had zero variance and zero improvement — encoding matters more than data source. The single-stock Transformer had insufficient data — deep learning needs 5 to 10 times as many training samples as parameters. The multi-stock Transformer suffered regime mismatch — statistical edge within one market regime does not transfer to another. The cross-sectional hourly strategy had signal too weak to overcome noise.

The central takeaway in one sentence: IC is a necessary condition for profit, but it is not sufficient. Statistical signal quality and real-world profitability are two different problems. This project solved the first. The per-stock selection approach partially solved the second.

Next steps include live paper trading validation through Alpaca and a multi-agent AI framework where specialist agents can override the statistical model when macro conditions shift.

This article was originally published on Trading Tag and is republished here under RSS syndication for informational purposes. All rights and intellectual property remain with the original author. If you are the author and wish to have this article removed, please contact us at [email protected].
