Nearly 87% of ML projects never reach production. The failures aren't about models — they're about engineering.
A data scientist at a Series C fintech told me she spent four months building a fraud detection model that hit 96% precision in her Jupyter notebook. The team celebrated. Then an ML engineer took over, and it took another five months to get that same model running in production — handling 12,000 requests per second, retraining weekly on fresh transaction data, and failing gracefully when upstream services went down.
That 96% precision? It dropped to 89% within two weeks of deployment because the production data distribution didn't match the training data. The ML engineer caught the drift, set up automated monitoring, and built a retraining pipeline that kept accuracy above 93%.
The data scientist built the brain. The ML engineer built the body it lives in. Both jobs are hard. But only one of them is responsible for keeping the thing alive at 3 AM when the feature store crashes.
ML engineering is one of those rare fields where the demand numbers are genuinely hard to believe. Let me walk through them.
The Bureau of Labor Statistics projects computer and information research scientist roles — the closest BLS category to ML engineering — to grow 20% from 2024 to 2034, which is "much faster than average" in their language. But that 20% understates what's happening specifically in ML, because BLS categories were defined before the field existed as a distinct discipline.
More telling: AI/ML job postings increased 89% in the first half of 2025 alone, and 150% year-over-year. The US market faces a talent deficit where demand outstrips supply by a 3.2:1 ratio. There are currently over 2,800 open ML engineer positions on LinkedIn at any given time, at companies like Amazon, Netflix, Spotify, TikTok, and Ford.
The World Economic Forum's Future of Jobs Report projects demand for AI and machine learning specialists to rise by 40% — or 1 million jobs — over the next five years. And Gartner estimates that 70% of enterprises will operationalize AI architectures using MLOps by 2026, up from about 20% in 2022.
Here's what the major platforms report for ML engineer compensation in 2026:
| Source | Base Salary | Total Comp | Notes |
|---|---|---|---|
| Glassdoor | $160,347 | ~$190K | Broad sample, includes non-tech |
| Built In | $162,080 | $212,022 | Includes bonuses and equity |
| Levels.fyi | $190,000 | $261,683 | Skews Big Tech, verified data |
| Indeed | $186,447 | N/A | Self-reported |
The spread between these numbers tells a story. If you're an ML engineer at a mid-size company or outside a major tech hub, $155K-$175K base is realistic. At a FAANG or well-funded startup in SF/NYC, you're looking at $185K-$220K base with total comp pushing $300K+ at senior levels. Senior ML roles at Meta reach $325,000+ in total compensation.
The experience breakdown matters too. Entry-level positions start at $120K-$147K, mid-level sits at $145K-$190K, and senior roles command $185K-$230K in base alone. Add equity at a pre-IPO company and the total comp number can get silly.
Most "What is an ML Engineer" articles give you a bullet-point list of responsibilities that could describe half a dozen different roles. Here's what the job actually looks like day-to-day, based on real job descriptions and conversations with people doing the work.
Here's the thing most people don't realize: ML engineering is mostly not about machine learning. It's about building reliable software systems that happen to include a machine learning component. The model itself might be 5% of the code in a production ML system. The other 95% is data pipelines, serving infrastructure, monitoring, logging, error handling, and configuration.
Google published a famous paper about this. They called it "technical debt in machine learning systems," and their key insight was that the ML code is a tiny fraction of a real-world ML system. Everything around it — data collection, feature extraction, configuration, serving infrastructure, monitoring — dwarfs the model itself.
A fintech MLOps engineer's typical day looks something like this: morning starts by checking model monitoring dashboards for anomalies. Did overnight batch predictions look normal? Are latency percentiles still within SLA? Mid-morning, debugging a failing CI/CD pipeline for a new model version — turns out a dependency update broke the container build. Afternoon, working with a data scientist to package their model for deployment — they built it in a notebook, now it needs to handle concurrent requests, input validation, and graceful degradation. Late afternoon, setting up A/B testing infrastructure for a model update.
Notice what's missing? Nobody sat down and trained a model. That's a real week for many ML engineers. The training happens, but it's a small percentage of the actual work.
The tooling has matured significantly, but it's also gotten more complex. Here's what a production ML stack actually looks like in 2026, with honest assessments of each layer.
MLflow remains the most widely adopted open-source platform here. It handles experiment tracking, model versioning, and deployment across environments. MLflow 3.x added native support for LLM tracking and GenAI observability, which matters now that most teams are running both traditional ML and LLM workloads.
Weights & Biases (W&B) is the main alternative for teams that want a more polished UI and better collaboration features. W&B Weave added agent trace visualization in 2025, which is useful if you're debugging multi-step LLM agent workflows alongside traditional model experiments.
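To make the "experiment tracking" layer concrete, here's a minimal file-based sketch of what trackers like MLflow and W&B record per run: a unique run id, hyperparameters, metrics, and a timestamp. All function and field names here are illustrative, not the MLflow API — the real tools add UIs, artifact storage, and model registries on top of this core idea.

```python
import json
import time
import uuid
from pathlib import Path

def log_run(params: dict, metrics: dict, root: str = "runs") -> str:
    """Record one training run's params and metrics as a JSON file."""
    run_id = uuid.uuid4().hex[:8]
    run_dir = Path(root) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "run_id": run_id,
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    (run_dir / "run.json").write_text(json.dumps(record, indent=2))
    return run_id

def best_run(root: str = "runs", metric: str = "val_accuracy") -> dict:
    """Return the run record with the highest value for `metric`."""
    records = [json.loads(p.read_text()) for p in Path(root).glob("*/run.json")]
    return max(records, key=lambda r: r["metrics"].get(metric, float("-inf")))

run_a = log_run({"lr": 1e-3, "layers": 4}, {"val_accuracy": 0.91})
run_b = log_run({"lr": 1e-4, "layers": 8}, {"val_accuracy": 0.94})
print(best_run()["params"])  # the lr=1e-4 run wins
```

Once you've rolled this by hand a few times, the value of a real tracker becomes obvious: it's this, plus concurrency, search, and a registry that your deployment pipeline can pull from.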
```yaml
# A realistic training infrastructure setup in 2026
training:
  frameworks: PyTorch (dominant), JAX (growing for research)
  distributed: PyTorch FSDP, DeepSpeed, Ray Train
  compute: AWS SageMaker / GCP Vertex AI / Azure ML
  data: Spark for batch, Kafka for streaming
  feature_store: Feast (open-source) or Tecton (managed)
  versioning: DVC for data, MLflow for models
serving:
  real_time: Triton Inference Server, BentoML, TorchServe
  batch: Spark, Ray Batch Inference
  optimization: ONNX Runtime, TensorRT, vLLM (for LLMs)
monitoring:
  data_drift: Evidently AI, NannyML
  model_performance: Arize, WhyLabs
  infrastructure: Prometheus + Grafana
```
PyTorch has won. I know that sounds definitive, but the data backs it up. The 2025 Stack Overflow developer survey shows PyTorch well ahead, and the overwhelming majority of new ML research papers use it. TensorFlow still exists in production at companies that adopted it early, but new projects default to PyTorch.
JAX is the interesting wildcard. Google uses it internally, and it's gaining traction in research labs for its composability and JIT compilation. Some teams doing heavy numerical computing or custom hardware acceleration prefer JAX for its functional paradigm and XLA compilation. But for most production ML engineering, PyTorch plus PyTorch Lightning or the built-in FSDP is the standard answer.
One more thing on frameworks: scikit-learn isn't dead. Far from it. For tabular data problems — which still make up the majority of production ML use cases at non-tech companies — scikit-learn plus XGBoost or LightGBM remains the right tool. Not everything needs deep learning. A gradient-boosted tree that trains in 30 seconds and serves in 2 milliseconds beats a transformer that takes 6 hours to train and 200 milliseconds to serve, especially when the accuracy difference is 0.3%. I see too many engineers reaching for deep learning as a default when simpler models would outperform on their actual data.
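A quick sanity check of the "simple models win on tabular data" claim, assuming scikit-learn is installed. On a synthetic tabular dataset, a gradient-boosted model trains in seconds and serves single rows in low single-digit milliseconds — the dataset and hyperparameters here are arbitrary, chosen only to illustrate the speed/accuracy profile.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a typical tabular problem
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier(n_estimators=100, random_state=0)

t0 = time.perf_counter()
model.fit(X_tr, y_tr)
train_s = time.perf_counter() - t0

t0 = time.perf_counter()
model.predict(X_te[:1])  # single-row latency, as in real-time serving
latency_ms = (time.perf_counter() - t0) * 1000

acc = model.score(X_te, y_te)
print(f"accuracy={acc:.3f}  train={train_s:.1f}s  single-row latency={latency_ms:.2f}ms")
```

Run the same comparison against a small neural net on your own tabular data before committing to the heavier architecture; more often than not, the tree wins on all three numbers.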
Feature stores are probably the most underappreciated piece of ML infrastructure. They solve a deceptively hard problem: making sure the features you train on are the same features you serve predictions with.
Without a feature store, you end up with what's called training-serving skew — the most common and most damaging reason ML models fail in production. Your training pipeline computes features one way, your serving pipeline computes them slightly differently, and your model silently degrades.
```python
# The training-serving skew problem, simplified

# Training time (batch, computed in Spark):
user_avg_spend = df.groupBy("user_id").agg(avg("amount"))

# Serving time (real-time, computed in Python):
user_avg_spend = sum(recent_transactions) / len(recent_transactions)

# These look the same but they're NOT:
# - Different time windows
# - Different null handling
# - Different precision (float64 vs float32)
# Result: model accuracy drops 3-7% and nobody knows why
```
Feast is the dominant open-source feature store. Tecton is the managed option that handles the operational complexity. Both solve the same core problem: compute features once, serve them consistently everywhere.
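The principle behind both tools can be shown in a few lines of plain Python: define each feature's logic exactly once and call the same function from the training pipeline and the serving path. This is a minimal sketch of the idea, not the Feast API — real feature stores add registries, point-in-time joins, and online/offline storage around it.

```python
from statistics import mean

def user_avg_spend(amounts: list, window: int = 30) -> float:
    """Single source of truth for this feature: same window,
    same null handling, same math in training AND serving."""
    recent = [a for a in amounts[-window:] if a is not None]
    return mean(recent) if recent else 0.0

# Both pipelines import and call the SAME function, so there is
# no second implementation that can silently drift out of sync.
history = [120.0, None, 80.0, 100.0]
print(user_avg_spend(history))  # 100.0 (None dropped, not counted as zero)
```

Notice that the null-handling decision (drop `None` rather than treat it as zero) is made once, in one place. In the skewed version above, that decision gets made twice, by two different people, in two different codebases.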
The 87% figure from the opening gets thrown around a lot, and it's real: nearly 87% of machine learning projects never make it to production. But the reasons are more nuanced than "ML is hard."
The failures cluster into predictable patterns:
1. The Data Problem (40% of failures)
The model works great on the test set. Then it meets production data. Missing values where there shouldn't be any. Feature distributions that shifted since the training data was collected. Upstream schema changes that nobody communicated.
An ACM Computing Surveys study documented this systematically: data quality issues, data pipeline failures, and training-serving skew account for the largest share of production ML failures. Not model architecture. Not hyperparameter tuning. Data.
2. The Infrastructure Gap (30% of failures)
Many organizations don't have the infrastructure to deploy models. The data scientist built something brilliant in a notebook, but there's no model serving framework, no CI/CD for models, no monitoring, and no way to roll back when something goes wrong.
This is why the DevOps-to-MLOps transition is one of the fastest-growing career paths in 2026. Companies need people who can build the infrastructure that makes deployment possible.
3. The Monitoring Void (20% of failures)
Here's a genuinely alarming stat: half of ML practitioners don't monitor their production models at all. They deploy the model, celebrate, and move on. Nobody watches for data drift, model degradation, or silent failures.
Unlike traditional software, where a bug produces an error, ML failures are silent. The model doesn't crash; it just starts being wrong. Predictions degrade gradually, and unless you're measuring, you won't notice until a customer complains or a quarterly report looks off.
4. The Organizational Problem (10% of failures)
The data scientist reports to the analytics team. The ML engineer reports to platform engineering. The product manager doesn't understand what either of them does. Nobody owns the end-to-end system, so it falls through the cracks.
I've looked at ML engineer job postings from Meta, Apple, Microsoft, and dozens of mid-size companies. Here's what they're actually screening for, ranked by importance.
Here's something that surprises people: ML engineer interviews at top companies are heavily focused on coding and system design, not ML theory. At Meta, the coding bar is closer to a pure software engineer loop. You'll whiteboard algorithms, design distributed systems, and then discuss ML-specific topics.
The reasoning is straightforward. Companies can teach ML-specific skills to a strong engineer. They can't teach a mediocre engineer to write production-quality code. Strong fundamentals first, ML specialization second.
I wrote about AI Engineering recently, and the most common question I got was "how is this different from ML engineering?" Fair question. Here's the honest answer.
| Dimension | ML Engineer | AI Engineer |
|---|---|---|
| Core work | Training, deploying, monitoring models | Building apps around existing models |
| Trains models? | Yes, from scratch or fine-tuned | Rarely — calls APIs |
| Key tools | PyTorch, MLflow, Kubernetes, Spark | LangChain, vector DBs, LLM APIs |
| Data work | Heavy — pipelines, feature stores, drift | Lighter — embeddings, retrieval |
| Math required | Significant — stats, linear algebra, optimization | Minimal — mostly applied |
| Primary concern | "Is this model accurate and reliable at scale?" | "Does this AI feature work for users?" |
| Career origin | Data science or software engineering | Software engineering |
| Avg salary (2026) | ~$160K-$190K base | ~$140K-$185K base |
The fundamental difference: ML engineers build the models and the systems that run them. AI engineers build products that use those models.
An ML engineer might spend a month building a recommendation system from scratch — collecting training data, designing the model architecture, training it across a GPU cluster, deploying it with low-latency serving, and monitoring for quality degradation. An AI engineer might spend that same month building a chatbot that uses an existing LLM API, with RAG for knowledge retrieval and tool calling for actions.
Both are valuable. But they require different skills and different mindsets. ML engineering is deeper and more technical. AI engineering is broader and more product-focused. The market currently pays a slight premium for AI engineering titles, but I think that'll correct as the novelty wears off.
Whether you're transitioning from software engineering, data science, or starting fresh, here's what actually works in 2026.
You already have the hardest part — production engineering skills. You need to add ML knowledge on top.
```python
# Phase 1: ML Foundations (months 1-2)
foundations = {
    "course": "Andrew Ng's ML Specialization (still the best starting point)",
    "framework": "Start with scikit-learn, then PyTorch",
    "practice": "Kaggle competitions (focus on tabular data first)",
    "math": "Linear algebra and probability refresher (3Blue1Brown)",
}

# Phase 2: Production ML (months 3-4)
production = {
    "deploy": "Build an end-to-end ML service with FastAPI + Docker",
    "monitor": "Add data drift detection with Evidently AI",
    "pipeline": "Build a retraining pipeline with Airflow or Prefect",
    "feature_store": "Set up Feast for feature management",
}

# Phase 3: Specialize (months 5-6)
specialize = {
    "nlp": "Fine-tune a transformer model on domain-specific data",
    "systems": "Distributed training with PyTorch FSDP",
    "mlops": "Full CI/CD pipeline for model deployment",
}
```
You know the ML. You need to learn the engineering.
Be realistic about the timeline. ML engineering requires both software engineering skills and ML knowledge. You can't shortcut either one.
Start with Python and software engineering fundamentals. Build several non-ML projects first — a web scraper, a REST API, a CLI tool. Get comfortable with Git, testing, and Docker. Then layer on ML knowledge through courses and hands-on projects. Andrew Ng's Machine Learning Specialization is still the best starting point, followed by fast.ai for practical deep learning.
The fastest path in is through a related role — data engineering, backend engineering, or data analytics — then transitioning internally. This works because you're learning production skills on the job while building ML knowledge on the side. I've seen more successful ML engineer transitions from backend engineering than from any bootcamp or master's program. The engineering instincts transfer; the ML knowledge can be learned.
Here's my honest take on ML engineering in 2026.
ML engineering is the best long-term career bet in tech right now. Not AI engineering, which I think is partially a hype-driven title that'll blur back into software engineering as AI becomes standard. Not data science, which is being squeezed from both sides — ML engineers are taking the model-building work, and BI tools are automating the analysis work.
ML engineering is durable because it sits at the intersection of two things that are both getting more important: software systems and machine learning. Every year, more products include ML components. Every year, those components need to be more reliable, more scalable, and more cost-efficient. That's ML engineering.
The 87% failure rate isn't going away any time soon. It's not a technical problem — the tools exist to deploy models reliably. It's a people problem. Companies need engineers who understand both ML and production systems, and that combination is genuinely rare. The 3.2:1 demand-to-supply ratio isn't closing because training ML engineers takes time.
My controversial opinion: the GenAI boom has actually been bad for ML engineering hiring. Companies are so focused on LLM wrappers and chatbots that they're underinvesting in traditional ML infrastructure — the recommendation systems, fraud detection, pricing models, and search ranking that actually drive most of their revenue. The engineers maintaining those systems are stretched thin while the "AI team" builds the fifth internal ChatGPT clone.
This will correct. When the LLM hype normalizes (and it will — it always does), companies will remember that their recommendation engine drives 35% of revenue and they haven't upgraded the training pipeline in two years. That's when ML engineering demand spikes again, and the engineers who stayed sharp on fundamentals — not just LLM wrapper techniques — will have their pick of roles.
If you're considering this career: learn the fundamentals. Get comfortable with data at scale. Build systems that are boring and reliable. The flashy demo doesn't matter. What matters is whether your model is still making good predictions at 3 AM on a Saturday when nobody's watching. That's what ML engineering is.