Cloudflare launches Flue, Pi, and Agents SDK for production AI agents

Cloudflare is making a clear bet: if AI agents are going to graduate from polished demos to production infrastructure, they need more than a prompt and a loop. They need a stack that separates what the agent is meant to do, how it is allowed to act, and where it actually runs.

That is the logic behind the company’s new three-layer agent architecture. At the top is Flue, a declarative framework for defining an agent’s context. In the middle is Pi, Cloudflare’s production harness built from lessons learned in Project Think. At the bottom is the Cloudflare Agents SDK, the runtime and platform layer that brings the system onto Cloudflare’s edge.

The framing matters because it signals a shift in how Cloudflare thinks about agent software. The company is not pitching a generic “AI agents” story. It is describing the operational boundary conditions that have made agents difficult to ship: state management, interruption and resume, secure execution of untrusted code, and control over tool access. Those are distributed-systems problems, not prompt-engineering problems.

Flue defines the agent before it runs

Flue is the most visible starting point because it is intentionally declarative. Rather than asking developers to stitch together ad hoc agent behavior in application code, Flue lets them define the agent’s context: the model it should use, the skills it can draw on, and the sandbox it is allowed to operate within.

That design choice is subtle but important. In production, the hardest part of agent behavior is often not generating a response; it is constraining action. The framework becomes a way to model the agent’s world and its capabilities ahead of time, so execution can be governed rather than improvised.

For technical teams, the payoff is less about convenience than about predictability. A declarative layer creates a narrower contract between the application and the agent runtime. That makes it easier to reason about permissions, tool invocation, and failure modes before an agent is pointed at a real workload.
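To make the idea of a "narrower contract" concrete, here is a minimal sketch of what a declarative agent context could look like, assuming a plain object that names a model, a set of skills, and a sandbox policy. Every type and field name here (`AgentContext`, `Skill`, `SandboxPolicy`, `allowedHosts`, `maxCpuMs`) is hypothetical and illustrative only. This is not Flue's actual API. The point is that a declared context can be checked, and its permitted tools computed, before the agent ever runs.

```typescript
// Hypothetical declarative agent context. Names are illustrative, NOT Flue's API.
interface SandboxPolicy {
  allowNetwork: boolean;
  allowedHosts: string[];
  maxCpuMs: number;
}

interface Skill {
  name: string;
  tools: string[]; // tools this skill is permitted to call
}

interface AgentContext {
  model: string;
  skills: Skill[];
  sandbox: SandboxPolicy;
}

// Collect configuration problems up front instead of discovering them mid-task.
function validateContext(ctx: AgentContext): string[] {
  const problems: string[] = [];
  if (ctx.model.trim() === "") problems.push("model must be set");
  const seen = new Set<string>();
  for (const skill of ctx.skills) {
    if (seen.has(skill.name)) problems.push(`duplicate skill: ${skill.name}`);
    seen.add(skill.name);
    if (skill.tools.length === 0) problems.push(`skill ${skill.name} declares no tools`);
  }
  if (!ctx.sandbox.allowNetwork && ctx.sandbox.allowedHosts.length > 0) {
    problems.push("allowedHosts set but network is disabled");
  }
  if (ctx.sandbox.maxCpuMs <= 0) problems.push("maxCpuMs must be positive");
  return problems;
}

// The agent may only call the union of tools declared by its skills.
function permittedTools(ctx: AgentContext): Set<string> {
  return new Set(ctx.skills.flatMap((s) => s.tools));
}

const supportAgent: AgentContext = {
  model: "example-model", // placeholder identifier, not a real model name
  skills: [
    { name: "lookup-order", tools: ["orders.read"] },
    { name: "draft-reply", tools: ["email.draft"] },
  ],
  sandbox: { allowNetwork: false, allowedHosts: [], maxCpuMs: 50 },
};

console.log(validateContext(supportAgent)); // []
console.log(permittedTools(supportAgent).has("email.send")); // false
```

Because `email.send` is never declared, the agent has no path to it at all, which is the kind of predictability the declarative layer is meant to buy.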

Pi and Project Think turn harnesses into a production discipline

Cloudflare’s second layer is Pi, described as a production-grade harness shaped by the company’s work on Project Think. The terminology matters here. A harness is not the model and not the runtime. It is the control surface that mediates access to tools, storage, and external systems.
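A harness as a "control surface" can be sketched as a single choke point between the model and real tools: every call is checked against an allowlist and recorded in an audit log, whether or not it is permitted. The `Harness` class, `ToolFn` type, and tool names below are hypothetical and are not Pi's interface; they only illustrate the mediation role the text describes.

```typescript
// Hypothetical harness sketch, NOT Pi's actual interface.
type ToolFn = (input: string) => string;

interface AuditEntry {
  tool: string;
  allowed: boolean;
  input: string;
}

class Harness {
  private readonly tools: Map<string, ToolFn>;
  private readonly allowlist: Set<string>;
  readonly audit: AuditEntry[] = [];

  constructor(tools: Map<string, ToolFn>, allowlist: Set<string>) {
    this.tools = tools;
    this.allowlist = allowlist;
  }

  // The only path from the model to a tool: check, record, then execute.
  invoke(tool: string, input: string): string {
    const fn = this.tools.get(tool);
    const allowed = this.allowlist.has(tool) && fn !== undefined;
    this.audit.push({ tool, allowed, input }); // denied calls are logged too
    if (!allowed || fn === undefined) {
      throw new Error(`tool not permitted: ${tool}`);
    }
    return fn(input);
  }
}

const harness = new Harness(
  new Map<string, ToolFn>([
    ["orders.read", (id) => `order ${id}: shipped`],
    ["email.send", (body) => `sent: ${body}`],
  ]),
  new Set(["orders.read"]), // email.send exists but is not allowlisted
);

console.log(harness.invoke("orders.read", "42")); // order 42: shipped
try {
  harness.invoke("email.send", "hi");
} catch (e) {
  console.log((e as Error).message); // tool not permitted: email.send
}
console.log(harness.audit.length); // 2
```

Logging the denied call as well as the permitted one is what keeps the system auditable: an operator can see what the agent tried to do, not just what it managed to do.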

In Cloudflare’s framing, harnesses have reached a maturity level where they can support load-bearing deployments. Thomas Gauvin’s line that “2026 is the year agent harnesses go to production” is less a slogan than a recognition that the industry has been forced to harden what had previously been prototype infrastructure.

The reason is straightforward: agent systems are now being asked to behave like software infrastructure, not like chat interfaces. They need to recover from interruptions without losing context. They need to continue work without replaying expensive steps. They need to execute tasks in environments that may include untrusted code. And they need to do all of that while staying auditable enough for operators to understand what happened.

Cloudflare says its own experience hardening Project Think surfaced the same set of constraints it was seeing with customers. That feedback loop is what makes Pi notable. It is not just a wrapper around model calls; it is an attempt to turn operational experience into a repeatable harness pattern.
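The requirement to "continue work without replaying expensive steps" is essentially step-level checkpointing. Below is a minimal sketch, assuming each step's output is saved under a run ID so a retried run can skip completed steps. `CheckpointStore`, `MemoryStore`, and `runWithResume` are hypothetical names. The in-memory store only stands in for the durable storage a real runtime would have to provide, and nothing here reflects the Agents SDK's actual API.

```typescript
// Hypothetical checkpointing sketch; MemoryStore stands in for durable storage.
interface CheckpointStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

class MemoryStore implements CheckpointStore {
  private data = new Map<string, string>();
  get(key: string): string | undefined {
    return this.data.get(key);
  }
  set(key: string, value: string): void {
    this.data.set(key, value);
  }
}

interface Step {
  name: string;
  run: (prev: string) => string;
}

// Run steps in order, reusing any checkpointed result instead of recomputing it.
function runWithResume(
  runId: string,
  steps: Step[],
  store: CheckpointStore,
  input: string,
): { output: string; executed: string[] } {
  let value = input;
  const executed: string[] = [];
  for (const step of steps) {
    const key = `${runId}:${step.name}`;
    const cached = store.get(key);
    if (cached !== undefined) {
      value = cached; // completed before the interruption: skip it
      continue;
    }
    value = step.run(value);
    store.set(key, value); // checkpoint only after the step succeeds
    executed.push(step.name);
  }
  return { output: value, executed };
}

const store = new MemoryStore();
let failOnce = true;
const steps: Step[] = [
  { name: "fetch", run: (x) => `${x}+fetched` },
  {
    name: "summarize",
    run: (x) => {
      if (failOnce) {
        failOnce = false;
        throw new Error("interrupted"); // simulated interruption
      }
      return `${x}+summarized`;
    },
  },
];

try {
  runWithResume("run-1", steps, store, "doc");
} catch {
  // first attempt dies after "fetch" has been checkpointed
}
const resumed = runWithResume("run-1", steps, store, "doc");
console.log(resumed.output); // doc+fetched+summarized
console.log(resumed.executed); // only "summarize" runs on the retry
```

The sketch also shows why the article argues a harness alone is not enough: the skip logic is trivial, but it is only as reliable as the store behind `CheckpointStore`, which is a runtime and platform concern.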

The runtime layer is where the platform really matters

The third layer, the Cloudflare Agents SDK, is where the architecture stops being abstract and becomes deployable.

Cloudflare is explicit that a harness alone cannot solve the hardest parts of production agents. Resume-from-interruption, secure execution, and stateful orchestration depend on storage, compute, and runtime characteristics. That is why the company is tying the harness story to the SDK and the edge platform beneath it.

This is the architecture’s strongest claim: some of the hardest reliability and security properties for agents are not qualities you can bolt on after the fact. They emerge from how the runtime handles state, isolation, and execution boundaries. If a tool call or code execution step is unsafe, the issue is not only in the harness logic; it is in the platform’s ability to sandbox it correctly.

That is especially relevant for agents that execute untrusted code or interact with tools in ways that are difficult to fully predeclare. A production system needs more than model output filtering. It needs a secure execution envelope and a runtime designed to treat resuming after an interruption as a first-class behavior, not an afterthought.

Why Cloudflare is making this move now

The timing suggests Cloudflare believes the market has crossed a threshold. Harnesses have matured enough to be treated as real infrastructure, but they still need a platform that understands the production constraints they introduce.

That is where Cloudflare’s edge footprint becomes strategically relevant. An edge runtime can reduce some of the friction around distributed execution, while also giving operators a clearer governance model for where workloads run and how they are isolated. For teams deploying agents into customer-facing workflows, that matters as much as raw model quality.

There is also a practical economics story here. Production agents are not cheap to operate if every failure forces recomputation, human intervention, or uncontrolled tool execution. A stack that supports stateful resume, tighter sandboxing, and platform-native controls can change the deployment calculus by reducing waste and limiting operational surprises.

What builders should expect from the rollout

Cloudflare is positioning Flue as the initial declarative entry point, with Pi and future harnesses adjacent to it and the Agents SDK as the runtime foundation. That suggests the company is building upward from a common platform layer rather than shipping one-off agent products.

For developers, the immediate implication is a more opinionated workflow. The stack asks teams to declare the agent’s context up front, treat the harness as a distinct control plane, and rely on the runtime for the state and security primitives that make production deployment feasible.

That should be attractive to teams that have already felt the limits of prototype-era agent tooling. It also raises the bar. A more structured stack brings better governance and fewer undefined behaviors, but it also requires builders to think carefully about skills, sandboxing, and failure recovery before they ship.

A broader shift in how agent systems will be governed

Cloudflare’s launch is important because it reframes agent infrastructure as a governance problem as much as a model problem. If the architecture works as intended, it gives teams a cleaner way to budget for agent workloads, define what those systems may do, and understand where responsibility sits when they fail.

That is a meaningful distinction for enterprises and platform teams. The question is no longer simply whether a model can perform a task. It is whether the full agent system can do so reliably, securely, and repeatably inside an operational boundary that humans can inspect and control.

Cloudflare is betting that the answer depends on stack design. Flue defines the agent, Pi serves as the harness that governs how it is allowed to act, and the Agents SDK provides the runtime that makes the whole system viable at the edge. If production AI agents are going to become infrastructure, this is the kind of architecture they will need.

AI News Desk