AI spending is rising faster than the savings most companies can show for it.

That is the central tension in a new Bain & Company survey of 951 companies: nearly 40% say they are capturing less than 10% in AI cost savings, even though the most common target sits in the 11% to 20% range. Only 14% say they have cleared 21% savings, while 43% have reached at least 10%. At the same time, 9 in 10 companies plan to increase AI investment anyway, especially in AI agents.

For technical teams, the headline is not that AI is failing. It is that the savings curve is being flattened by deployment choices. The gap between expected and realized ROI is not primarily a model capability problem. It is a systems problem: who can access what data, where humans stay in the loop, and how much of the workflow has actually been reengineered rather than merely decorated with AI.

The savings paradox: budgets rise, returns stall

The Bain numbers map a familiar enterprise pattern. Procurement signs off on pilots assuming a certain degree of automation. Engineering lands a tool that looks viable in a demo. Operations then discovers that the surrounding workflow still requires approvals, manual handoffs, and repeated data retrieval from systems that were never designed for machine consumption.

That mismatch matters because the target savings bands are not trivial. If a company budgets for 11% to 20% cost reduction and ends up below 10%, the deployment is not just underperforming slightly; it is failing to clear the threshold that justified the program in the first place. The fact that 14% of companies did exceed 21% savings shows the upside is real. But it also suggests that the spread is driven by implementation discipline rather than by model choice, since most companies can buy access to the same model families.

The most important strategic signal in the Bain survey is that companies are not backing away. Nine in 10 plan to increase AI investment, suggesting they believe the issue is not whether to automate, but how to automate without reintroducing labor in the loop so often that the economics collapse.

What human-in-the-loop actually costs you

Bain’s data points to a direct explanation for why AI budgets are not converting into proportional savings: humans are still sitting in the critical path.

Only 7% of companies say they run fully autonomous agents. Another 32% involve humans only when needed, while the most common setup, at 38%, still requires human approval. That architecture may look conservative from a risk standpoint, but it carries a hidden cost structure. Every approval step adds latency, every exception path adds operational overhead, and every handoff introduces failure modes around context loss, queueing, and inconsistent decision quality.

In other words, a system that is marketed as “AI-powered” can still behave like a traditional workflow with a smart suggestion layer attached.

The biggest blocker is not abstract resistance to automation. It is data access, cited by 41% of respondents as a major hurdle. That is a more technical diagnosis than most enterprise AI discussions admit. If the model cannot reach the systems of record cleanly, then agents cannot take action autonomously. If permissions are fragmented across domains, every useful step becomes a request for human mediation. And if data quality is poor or siloed, the model is forced into a read-only role where it can summarize, classify, or draft, but not complete the work.

That is why the human-in-the-loop issue is not just about governance philosophy. It is about where the architecture stops. A human approval requirement often signals that the workflow has not been redesigned for machine execution. The result is a partial automation stack that preserves the old process, then layers AI on top of it.
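
One way to picture the alternative is a thin service layer that sits between the agent and the systems of record, granting narrow permissions and recording every request. The Python sketch below is purely illustrative and is not something described in the Bain survey: the GovernedDataLayer class, the agent ID, and the resource names are all hypothetical, and a real deployment would sit on top of an existing identity and policy system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class GovernedDataLayer:
    """Hypothetical gatekeeper between agents and systems of record."""

    # Maps an agent ID to the (resource, action) pairs it may perform.
    grants: dict
    audit_log: list = field(default_factory=list)

    def request(self, agent_id: str, resource: str, action: str) -> bool:
        """Check the grant table and record the decision, allowed or not."""
        allowed = (resource, action) in self.grants.get(agent_id, set())
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "resource": resource,
            "action": action,
            "allowed": allowed,
        })
        return allowed


# Assumed example: one agent with narrow rights on one resource.
layer = GovernedDataLayer(grants={
    "invoice-agent": {("invoices", "read"), ("invoices", "update")},
})

can_update = layer.request("invoice-agent", "invoices", "update")
can_read_payroll = layer.request("invoice-agent", "payroll", "read")
```

In this sketch the agent can act on invoices without a person in the middle, the payroll request is refused, and both decisions land in the audit log. The point is the shape, not the implementation: autonomy is granted per resource and per action rather than by opening raw systems.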

Where the gains are real—and how they happened

The companies that did better appear to have made a more consequential change: they narrowed the problem, tightened the workflow, and improved the underlying data path.

Bain’s survey shows 43% of firms achieved at least 10% savings, and 14% exceeded 21%. Those results imply that meaningful ROI is possible when AI is deployed into bounded workflows where the system can reliably do more than recommend.

The common thread is likely not magic model performance. It is operational scope. Deployments tend to succeed when the task is repeatable, the decision logic is structured, and the data pipeline is relatively clean. In those conditions, automation can actually remove labor rather than redistribute it into review queues.

That distinction is especially important for teams evaluating agents. An agent is only economically compelling if it can complete a meaningful portion of the workflow end to end. If it merely drafts a response that an employee must verify, correct, and submit, the savings profile is much thinner than the sales pitch suggests.
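
A back-of-the-envelope model makes the difference concrete. The Python sketch below is a simplification under stated assumptions: it counts only human minutes per transaction, ignores model and infrastructure costs, and uses made-up inputs that do not come from the Bain survey. The function name savings_rate and its parameters are hypothetical.

```python
def savings_rate(baseline_minutes: float,
                 review_minutes: float,
                 review_fraction: float) -> float:
    """Return the fraction of human labor removed per transaction.

    baseline_minutes: human time per transaction before automation.
    review_minutes:   human time when a person must verify or fix the output.
    review_fraction:  share of transactions still routed to a person (0 to 1).
    """
    if baseline_minutes <= 0:
        raise ValueError("baseline_minutes must be positive")
    if not 0.0 <= review_fraction <= 1.0:
        raise ValueError("review_fraction must be between 0 and 1")
    remaining_minutes = review_fraction * review_minutes
    return 1.0 - remaining_minutes / baseline_minutes


# Illustrative inputs only; none of these figures come from the survey.
draft_and_review = savings_rate(10.0, 9.0, 1.0)   # every draft is checked
exception_only = savings_rate(10.0, 9.0, 0.25)    # a quarter are escalated
```

With these invented inputs, the draft-and-review setup removes about 10% of the labor, while routing only a quarter of transactions to a person removes 77.5%. The review itself costs the same in both cases; what changes is how often it happens.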

A technical playbook to unlock ROI

Bain’s recommendation — treat AI rollout as a management issue, not an IT task — is easy to read as generic consulting advice. But the technical implication is sharper: the unit of deployment should be the process, not the model.

That shifts the implementation playbook in several concrete ways:

  1. Redesign the workflow before you automate it. Map every human approval, exception route, and system handoff in the target process. If a step exists only to compensate for legacy fragmentation, it is a candidate for elimination or redesign, not just digitization.
  2. Treat data access as a product. The 41% bottleneck number is a warning that AI programs stall when access is handled as an ad hoc integration problem. Build controlled service layers, permissioning, audit logs, and retrieval paths that let agents operate against governed data without opening broad access to raw systems.
  3. Use autonomy where the control plane can support it. Only 7% of companies are fully autonomous today, but that figure should be read as evidence of immaturity, not a permanent constraint. Autonomous agents make sense where data quality, policy constraints, and action boundaries are well understood. Everywhere else, bounded human review may still be necessary, but it should be a deliberate design choice, not the default failure mode.
  4. Measure operational outcomes, not usage. AI dashboards that count prompts, sessions, or model calls can obscure the actual economics. Tie the rollout to cycle time, cost per transaction, error rate, queue length, and labor hours removed from the workflow. If those metrics do not move, the deployment is not yet producing savings.
  5. Instrument exceptions as first-class signals. Human approvals are often a symptom of unresolved data or policy gaps. Logging why a request escalated, where the model lacked confidence, and which systems blocked execution gives engineering teams a roadmap for deeper automation.
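
The last item in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed design: the EscalationReason categories, the Escalation record, the top_blockers helper, and the sample workflow and system names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EscalationReason(Enum):
    """Assumed categories for why an agent handed work to a person."""
    LOW_CONFIDENCE = "low_confidence"
    DATA_ACCESS_BLOCKED = "data_access_blocked"
    POLICY_REQUIRES_APPROVAL = "policy_requires_approval"


@dataclass(frozen=True)
class Escalation:
    workflow: str
    reason: EscalationReason
    blocking_system: Optional[str] = None  # set when a system refused access


def top_blockers(events, n=3):
    """Rank (reason, system) combinations by how often they cause escalation."""
    counts = Counter(
        f"{e.reason.value}:{e.blocking_system or 'n/a'}" for e in events
    )
    return counts.most_common(n)


# Made-up sample events for a hypothetical refunds workflow.
events = [
    Escalation("refunds", EscalationReason.DATA_ACCESS_BLOCKED, "billing-db"),
    Escalation("refunds", EscalationReason.DATA_ACCESS_BLOCKED, "billing-db"),
    Escalation("refunds", EscalationReason.LOW_CONFIDENCE),
]
ranked = top_blockers(events)
```

Here the ranking puts the blocked billing database first, with two escalations against one for low confidence. Recording a structured reason with every handoff turns the review queue into a prioritized backlog of integration and policy work.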

This is the point where AI deployment becomes less about experimentation and more about process engineering. The firms that realize savings are not simply adding copilots. They are constraining the problem until software can reliably own a larger slice of the work.

Market implications for product and deployment strategy

The Bain survey also hints at where the market is going next. If most companies plan to increase AI investment despite lagging savings, vendor demand will likely tilt toward products that make autonomy more practical rather than merely more impressive in demos.

That means stronger orchestration layers, cleaner end-to-end data pipelines, policy-aware controls, and governance features that can scale with real operational risk. It also suggests that the next phase of enterprise AI competition will be less about raw model access and more about who can collapse the distance between data, decision, and execution.

For product teams, that is a meaningful shift. The winning stack is unlikely to be the one that produces the best-looking output in isolation. It will be the one that can act on enterprise data safely, route exceptions intelligently, and reduce the number of humans needed to complete a workflow.

The Bain survey does not argue that AI cannot save money. It shows something more useful: savings are being lost in the seams between systems, policies, and org charts. For engineering teams, that is good news and bad news. The bad news is that model selection alone will not fix the ROI problem. The good news is that the largest gains may be available not from waiting on a smarter model, but from redesigning the deployment so the model can finally do the work it was supposedly hired to do.