When Classic ML Beats Deep Learning (And Vice Versa)

Most teams that think they need deep learning don't. They're building tabular prediction systems — predict which leads convert, predict which transactions are fraudulent, predict which customers will churn — and the dominant approach for those problems has been gradient-boosted trees for over a decade. The 2022 NeurIPS paper "Why do tree-based models still outperform deep learning on typical tabular data?" laid out the evidence systematically: across multiple benchmarks, tree-based methods like XGBoost and LightGBM match or beat neural networks on tabular data while training orders of magnitude faster and staying far easier to interpret. That conclusion hasn't aged out.
This article is the practical version of that finding: where classic ML wins, where deep learning genuinely wins, and how to decide for any specific project. The decision is downstream of a single question: what shape is your data?
The category classic ML still owns
Tabular data. Rows and columns. Numbers, categories, dates, IDs. The data sitting in your data warehouse right now, the data your analysts have been querying for ten years.
Tabular data is the format of nearly every business prediction problem worth solving. Will this customer churn. Will this lead convert. Will this transaction be fraudulent. Will this loan default. What will demand for this SKU look like next week. Each of these is a row of features and a label.
Tree-based models — gradient-boosted trees especially, with libraries like XGBoost and LightGBM — have dominated tabular prediction for a decade. That dominance hasn't quietly ended in 2026; it hasn't even come close. Look at any Kaggle competition involving tabular data: gradient-boosted trees win or place near the top consistently, year after year, even as deep learning has eaten image, audio, and language benchmarks.
Three reasons trees still win on tabular data, all of them structural rather than incidental.
Mixed types and scales. Decision trees split on feature values directly — they don't care that one feature is "age in years," another is "income in dollars," and a third is "country code." Neural networks need extensive preprocessing — normalization, one-hot encoding, embedding layers — and a lot of training data to learn what trees handle natively.
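To make that concrete, here is a minimal sketch, assuming xgboost 1.6+ and pandas; the tiny DataFrame and its column names are invented for illustration. The tree model consumes raw mixed-type columns directly, with no scaling, encoding, or embedding step.

```python
import pandas as pd
import xgboost as xgb

# Invented example data: mixed scales and types, as they sit in a warehouse.
df = pd.DataFrame({
    "age_years":    [34, 51, 29, 45, 62, 38],
    "income_usd":   [42_000, 130_000, 58_000, 77_000, 95_000, 51_000],
    "country_code": pd.Categorical(["IT", "US", "DE", "IT", "FR", "US"]),
    "churned":      [0, 1, 0, 1, 1, 0],
})

# Trees split on raw feature values: no normalization, no one-hot encoding,
# no embedding layers. Categorical columns are handled natively.
model = xgb.XGBClassifier(tree_method="hist", enable_categorical=True)
model.fit(df.drop(columns="churned"), df["churned"])
```

A neural network fed the same table would need the country codes encoded and the numeric columns rescaled before training could even start.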
Small-dataset friendliness. Deep learning's advantage scales with data. On 50 million labeled examples, neural networks can find patterns trees can't. On 50,000 examples — which is what most business problems have — trees do just as well or better, and they finish training before lunch.
Native interpretability. A trained gradient-boosted model can tell you exactly which features mattered for any specific prediction. SHAP values, feature importance plots, partial dependence — the toolkit is mature. A neural network is a black box you can probe but not fully explain. For regulated decisions (lending, hiring, fraud), the interpretability difference is enough on its own to settle the choice.
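As a minimal, self-contained sketch of that toolkit, assuming the shap, xgboost, and scikit-learn packages are installed; the bundled dataset is only a stand-in for your own feature matrix and label.

```python
import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

# Stand-in tabular dataset; substitute your own feature matrix and label.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgb.XGBClassifier(n_estimators=100, tree_method="hist").fit(X, y)

# Per-prediction attributions: which features pushed this row's score up or down.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(dict(zip(X.columns, shap_values[0].round(3))))
```

The same per-row attributions aggregate into the global feature importance views mentioned above.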
Where deep learning earns its place
Deep learning has earned its dominance on certain problem shapes. Three categories where it wins decisively.
Unstructured input. Images, audio, video, raw text. These don't fit cleanly into tabular cells. Deep learning is the only practical approach. Convolutional neural networks for images, transformers for language and increasingly for everything else.
Massive datasets where the long tail matters. When you have millions or billions of examples and rare patterns make a real difference (rare disease detection, rare fraud patterns at scale, recommendation systems for very large catalogs), deep learning finds structure that tree models miss.
Generative tasks. Writing, translating, summarizing, image generation, code generation. By definition you need a model that produces new content, and every practical option (transformers for text and code, diffusion models for images) is a deep learning system by construction.
If your input is unstructured, deep learning is probably right. If it isn't, classic ML is probably right. The mistake is letting fashion override the distinction.
A working test before you commit
Before committing to deep learning on a problem that might be tabular, run XGBoost first. It takes an afternoon. Use cross-validation. Tune a few hyperparameters. Note the baseline performance.
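A minimal sketch of that afternoon baseline, assuming xgboost, pandas, and scikit-learn; the CSV path, column names, and hyperparameter values are illustrative assumptions, not prescriptions.

```python
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import cross_val_score

# Hypothetical file and label column; swap in your own table.
df = pd.read_csv("leads.csv")
X, y = df.drop(columns="converted"), df["converted"]

# A handful of knobs is usually enough for a first pass.
model = xgb.XGBClassifier(
    n_estimators=500, learning_rate=0.05, max_depth=6, tree_method="hist",
)

scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"baseline AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Write that number down; it is the score the deep learning version has to beat by a margin that justifies its cost.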
Now estimate the cost of building, training, and serving the deep learning version. GPU compute. Engineering time. Productionization. Monitoring. Compare that with deploying the gradient-boosted model, which probably runs on a single CPU and trains in minutes.
If the deep learning version doesn't beat XGBoost by enough to justify the cost — and on most tabular problems, it won't — ship the boring version. Use the difference in budget for something else. Better data. Better evaluation. A different problem.
The real reason teams overuse deep learning
Three reasons in roughly the order I see them.
Hiring. Teams hire deep learning specialists because that's what's on the recruiting brief. The specialists then bring deep learning to every problem because that's what they were hired to do. The right tool for a problem is whatever solves it; the tool a project actually gets is whatever the team on hand can build.
Vendor incentives. AI vendors sell deep learning. They don't sell XGBoost. The marketing tilts toward neural networks because that's what gets enterprise deals signed. Your team reads the marketing and concludes that deep learning is the answer because the marketing is loud and the alternatives are quiet.
Newness bias. Teams assume that newer must be better. In some cases — generative tasks, vision, language — that bias is correct. In tabular prediction, it's mostly wrong. The newer thing isn't better; it's just newer.
The fix is to evaluate based on results, not on which technology is fashionable. Run the boring baseline. Compare honestly. Pick what wins.
Quick reference: which one for which problem
Problems where classic ML almost always wins:
- Customer churn prediction
- Lead scoring and conversion likelihood
- Credit scoring and default risk
- Fraud detection (often combined with rules)
- Demand forecasting
- Insurance claim risk
- Customer lifetime value prediction
- Marketing attribution
- A/B test analysis
Problems where deep learning earns its keep:
- Reading scanned documents into structured data
- Understanding open-ended customer messages
- Generating any kind of content
- Image classification and search
- Voice transcription and synthesis
- Translation and localization
- Code generation
- Personalization at very large user-base scale
The first list is the workhorses of any business. The second list is the headline-grabbers. Most enterprises spend most of their AI budget on the first list and most of their AI marketing on the second. That gap is fine. Don't let the marketing reorganize where the budget goes.
The bigger principle
Match the technique to the problem. Not to the buzz. Not to the hire. Not to the vendor pitch. The teams that run AI well in 2026 have a stable of techniques and pick whichever fits. Sometimes it's a frontier LLM. Sometimes it's a 10-year-old gradient-boosted tree. Sometimes it's a rules engine somebody wrote in 2014 that still works.
The boring tool that solves your problem is better than the fashionable one that doesn't. Especially when the boring tool is a tenth of the cost.
That's the pitch for classic ML in 2026. Use it where it wins. Use deep learning where it genuinely wins. Stop letting fashion or hiring decisions decide for you.
Frequently Asked Questions
Why do tree-based models still beat deep learning on tabular data?
Tabular data has features with different scales, mixed types, and complex interactions that decision trees handle natively. Neural networks need extensive preprocessing and a lot of training data to match what gradient-boosted trees do out of the box. The 2022 NeurIPS paper "Why do tree-based models still outperform deep learning on typical tabular data?" documents this systematically.
When does deep learning win on tabular data?
When you have very large datasets (millions of rows), heterogeneous data including text or images alongside structured features, or when you genuinely need the same model architecture used elsewhere in your stack. Outside those conditions, classic ML wins on cost, speed, and explainability.
Is XGBoost still the right default in 2026?
For tabular prediction problems, yes. XGBoost and LightGBM remain the practical default for most business prediction tasks — fraud detection, churn prediction, lead scoring, demand forecasting. They train in minutes, run in microseconds, and are interpretable enough to satisfy compliance reviews.
Sources
- arXiv (NeurIPS 2022) — Why do tree-based models still outperform deep learning on typical tabular data?
- Microsoft Learn — Machine learning vs deep learning
- Stanford HAI — AI Index Report 2026
- Google Cloud — Deep learning vs machine learning vs AI
- IBM Think — AI vs. Machine Learning vs. Deep Learning vs. Neural Networks
- Google AI — Introduction to gradient boosting

Founder, Tech10
Doreid Haddad is the founder of Tech10. He has spent over a decade designing AI systems, marketing automation, and digital transformation strategies for global enterprise companies. His work focuses on building systems that actually work in production, not just in demos. Based in Rome.


