AI agents are expanding software engineering beyond code

The most important change in AI software this spring is not that agents are writing more code. It is that they are forcing engineering teams to treat prompts, workflows, policies, and decision routines as part of the product itself.

That is the argument advanced in a new paper from researchers at Chalmers University of Technology and Volvo Group, surfaced in April 2026 coverage as enterprise teams accelerated agent rollouts. Their core claim is deceptively simple: AI agents are not replacing software engineering. They are expanding it far beyond code.

For technical readers, the implication is more than semantic. It changes what teams need to design, validate, deploy, observe, and govern. It also helps explain why the current wave of AI product work is running into the same hard problems that have always defined enterprise software; only now those problems extend to artifacts that are partly executable, partly probabilistic, and often interpreted by humans at runtime.

Six rings, not a single codebase

The paper introduces a semi-executable stack with six rings, a framing that moves outward from the traditional codebase and into the broader socio-technical environment in which AI systems operate.

At the center sits conventional software code. But the stack does not stop there. It extends through layers that include prompts, workflows, policies, decision routines, organizational practices, and, at the outer edge, social and regulatory factors such as the EU AI Act.
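
Read as an engineering checklist rather than a diagram, the stack amounts to an ordered list of artifact kinds, from fully executable at the center to interpretive at the outer edge. The sketch below is one reading of that ordering, not the paper's own notation; how the layers group into exactly six rings follows the authors' framing.

```python
# Illustrative only: artifact kinds in the semi-executable stack, ordered
# from the fully executable center to the interpretive outer edge. The
# exact grouping into six rings follows the paper; this list just names
# the layers discussed above.
SEMI_EXECUTABLE_STACK = [
    "code",                       # conventional software: compiled and tested as usual
    "prompts",                    # interpreted by models at runtime
    "workflows",                  # orchestration across agents, services, and people
    "policies_decision_routines", # constraints on what agents may decide or do
    "organizational_practices",   # reviews, approvals, escalation paths
    "social_regulatory",          # external factors such as the EU AI Act
]
```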

The value of the model is that it treats these surrounding artifacts as engineering objects rather than informal glue. A prompt is not just a string. A workflow is not just process. A policy is not just documentation. In an AI agent system, each of these can shape behavior, constrain execution, and determine whether the product is safe enough to operate in production.

That matters because agents are not deterministic in the way a conventional API call is. Their outputs depend on model behavior, context construction, orchestration logic, and runtime controls. Once you start shipping systems where those factors matter, the unit of engineering shifts. The question becomes not only whether the code is correct, but whether the surrounding semi-executable artifacts are valid, versioned, testable, and aligned with the intended operating regime.

From prototype to production: prompts and policies become deployment assets

This is where the research becomes operationally relevant for enterprise teams.

A lot of AI product development still starts in the familiar pattern: a prototype works in a notebook, a prompt chain gets assembled, a workflow is wrapped around it, and the result is pushed toward production with a mixture of optimism and monitoring. The paper suggests that approach is inadequate once agents begin to participate in real business processes.

If prompts and decision routines are now part of the software system, then they need the same discipline once reserved for source code and infrastructure. That includes version control, regression testing, environment parity, release management, and rollback procedures. It also means product and platform teams need clearer interfaces between model behavior and downstream systems so that semi-executable logic can be audited and updated without breaking everything around it.
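
As a concrete illustration of what that discipline can look like, a prompt can be stored as a versioned artifact with metadata and a pinned regression suite rather than as an inline string. This is a minimal sketch under assumed names; PromptArtifact and the registry call in the closing comment are placeholders, not an existing tool.

```python
from dataclasses import dataclass
import hashlib

@dataclass
class PromptArtifact:
    """A prompt treated as a release asset rather than an inline string.

    Hypothetical structure; the field names are illustrative, not a tool's schema.
    """
    name: str                # stable identifier, e.g. "claims-triage/system"
    version: str             # bumped and tagged like any other release
    template: str            # prompt text with named placeholders
    model_constraints: dict  # e.g. pinned model name, temperature, max tokens
    approved_by: str         # reviewer of record, for audit purposes
    regression_suite: str    # path to fixed input / expected-behavior cases

    def content_hash(self) -> str:
        """Hash of the template so deployments can verify exactly what shipped."""
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()

# Rollback then looks like any other redeploy of a previous version, e.g.
# (hypothetical registry API): registry.deploy("claims-triage/system", "1.4.2")
```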

In practice, this pushes AI tooling toward a more mature deployment stack. Teams will need prompt registries, workflow orchestration layers, policy engines, approval gates, and runtime observability that can explain not only what the model returned, but why the surrounding system allowed that output to influence a decision.
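
One hedged sketch of what such a gate can look like in practice: a thin policy check that sits between a model's proposed action and the downstream system, and records why the output was (or was not) allowed to influence a decision. The action sets, threshold, and field names are illustrative assumptions, not a reference to any particular policy engine.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("agent.policy_gate")

# Illustrative policy: which actions an agent may take autonomously, and
# which require a human approval step. Values are placeholders.
AUTONOMOUS_ACTIONS = {"draft_reply", "summarize_ticket"}
HUMAN_APPROVAL_ACTIONS = {"issue_refund", "close_account"}

def policy_gate(action: str, confidence: float, context: dict) -> dict:
    """Decide whether a model-proposed action may influence a live decision.

    Returns a decision record so observability tooling can later explain
    not just what the model returned, but why the system allowed it.
    """
    if action in AUTONOMOUS_ACTIONS and confidence >= 0.8:
        verdict = "allow"
    elif action in HUMAN_APPROVAL_ACTIONS:
        verdict = "escalate_to_human"
    else:
        verdict = "block"

    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "confidence": confidence,
        "verdict": verdict,
        "policy_version": context.get("policy_version", "unknown"),
        "prompt_version": context.get("prompt_version", "unknown"),
    }
    logger.info("policy_decision %s", json.dumps(record))
    return record
```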

The technical implication is that AI rollout is increasingly a systems-integration problem. Enterprises are not just adopting a model. They are integrating a new class of semi-executable artifacts into CI/CD pipelines, incident response workflows, and product release processes.

Governance is not a bolt-on

The six-ring model also clarifies why governance keeps resurfacing in enterprise AI discussions.

When an AI agent participates in a decision path, the risk surface expands beyond code quality. Policy interpretation, human escalation, accountability assignment, and regulatory compliance all become part of runtime behavior. That creates a need for explicit oversight mechanisms, especially in environments where the cost of a wrong answer is high.

This is where human-in-the-loop design remains central, not as a temporary workaround but as a structural requirement. The paper’s framing implies that many enterprise systems will continue to require human interpretation at key points because semi-executable artifacts do not have the same guarantees as compiled software or hard-coded business rules. They can be powerful, but they are not self-justifying.

For regulated deployments, this has direct consequences. Teams will need to show how a prompt or policy was authored, who approved it, how it was tested, what fallback behavior exists, and how decisions are logged. In other words, the governance layer becomes part of the engineering architecture rather than an after-the-fact audit function.
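
Taken together with the versioned artifact and the policy gate sketched above, the audit question becomes a join: for any given decision, link the runtime record to the artifacts that governed it. A minimal, hypothetical shape for that trail, with illustrative field names rather than a compliance standard:

```python
# Hypothetical auditor-facing trail for one agent decision. It joins the
# runtime outcome with the artifacts that governed it; the identifiers and
# field names are illustrative only.
audit_trail_entry = {
    "decision_id": "dec-000123",
    "outcome": "escalate_to_human",
    "artifacts": {
        "prompt": {"name": "claims-triage/system", "version": "1.4.2",
                   "approved_by": "lead.reviewer@example.com"},
        "policy": {"name": "refund-limits", "version": "0.9.0",
                   "last_regression_run": "2026-04-15"},
    },
    "fallback": "route to manual claims queue",
    "human_reviewer": "on-call operator",
}
```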

The market shifts toward platforms that can manage the stack

The competitive takeaway is straightforward: vendors and platform teams that support only model access will lose ground to those that support the full semi-executable stack.

Enterprise buyers increasingly need tooling that can manage prompts, workflows, policies, and decision routines as durable assets with lifecycle controls. That includes observability for runtime behavior, guardrails for agent actions, integration with approval and compliance systems, and the ability to trace how a semi-executable artifact affected a live outcome.

This will favor platforms that can make AI behavior legible to engineers, operators, auditors, and business owners at the same time. Reliability and explainability are no longer downstream concerns; they are product features. And as AI agents move from demos into deployed enterprise workflows, the ability to govern these artifacts may become a competitive differentiator as important as model quality.

That shift also changes the economics of adoption. A team that can operationalize the semi-executable stack is better positioned to scale agent use cases across departments. A team that cannot will likely remain stuck in pilot purgatory, with impressive demos that never clear the threshold for enterprise trust.

What teams should do in the next 90 days

For organizations preparing for broader agent deployment, the next step is not to chase more automation in the abstract. It is to inventory the artifacts that already govern behavior.

Start by mapping prompts, workflows, policies, and decision routines across current AI initiatives. Identify which of these artifacts are versioned, tested, reviewed, and monitored—and which exist only in shared documents, notebooks, or ad hoc scripts.
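
A lightweight way to begin is a plain inventory that records, per artifact, where it lives and which lifecycle controls it already has, so the gaps are visible at a glance. The sketch below is one possible shape; the column names and example rows are assumptions, not a standard.

```python
import csv
import io

# Hypothetical inventory of semi-executable artifacts across AI initiatives.
# One row per artifact; the lifecycle columns make the gaps visible.
INVENTORY_CSV = """\
artifact,kind,location,versioned,tested,reviewed,monitored
claims-triage/system,prompt,prompt-registry,yes,yes,yes,yes
invoice-routing,workflow,orchestrator-config,yes,no,yes,yes
refund-limits,policy,shared-doc,no,no,no,no
escalation-routine,decision_routine,notebook,no,no,no,no
"""

CONTROLS = ("versioned", "tested", "reviewed", "monitored")

def lifecycle_gaps(csv_text: str) -> list[dict]:
    """Return artifacts missing at least one lifecycle control."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if any(row[c] != "yes" for c in CONTROLS)]

for row in lifecycle_gaps(INVENTORY_CSV):
    missing = [c for c in CONTROLS if row[c] != "yes"]
    print(f"{row['artifact']}: missing {', '.join(missing)}")
```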

Then add runtime observability that can trace agent actions to the surrounding control logic. If an agent makes a bad call, teams should be able to tell whether the failure came from the model, the prompt, the orchestration layer, or a policy gap.
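
In practice that means each agent action carries enough linkage back to the artifacts that shaped it for a failure to be attributed. Below is a hedged triage sketch: the heuristics and trace fields are assumptions made for illustration, not a diagnostic standard.

```python
# Illustrative triage helper: given a trace record for a bad agent call,
# point at the layer most likely responsible. Field names and rules are
# assumptions for the sketch only.
def attribute_failure(trace: dict) -> str:
    if trace.get("policy_verdict") == "allow" and trace.get("policy_gap_flagged"):
        return "policy gap: no rule covered this action"
    if trace.get("prompt_version") != trace.get("prompt_version_expected"):
        return "prompt: deployed version differs from the tested one"
    if trace.get("retrieved_context_empty", False):
        return "orchestration: context construction returned nothing useful"
    return "model: inputs and controls look correct, output was still wrong"

example_trace = {
    "action": "issue_refund",
    "policy_verdict": "allow",
    "policy_gap_flagged": True,
    "prompt_version": "1.4.2",
    "prompt_version_expected": "1.4.2",
    "retrieved_context_empty": False,
}
print(attribute_failure(example_trace))  # -> policy gap: no rule covered this action
```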

Finally, map the six-ring model onto product and platform roadmaps. That means asking where code ends and semi-executable behavior begins, which rings are under engineering control, and which require cross-functional governance. The goal is not to freeze innovation. It is to make sure the organization can scale it without losing control of how the system behaves.

The broader lesson from the research is that AI agents are not shrinking software engineering. They are enlarging its scope. In enterprise environments, the winners will be the teams that recognize that early and build for the full stack, not just the code at the center.