When the Agent Has an OS and Production Has No Gate, the Gate Is the Product

On June 1, Microsoft and NVIDIA shipped a source envelope for agents. RTX Spark with the OpenShell runtime gives Windows a native way to govern what agents on the device can do, where their queries route, and how personal information leaves the machine. AWS Bedrock AgentCore Policy shipped the cloud version of the same idea in March. Apple Intelligence will ship the macOS version. Gemini Nano will ship the Android version. Within twelve months, every major substrate will have its own envelope for the agents that run on it. This is real progress. For the substrate each vendor owns, it is the correct architectural pattern. It also leaves the harder half of the problem completely unsolved.

The half that didn't get solved

OpenShell governs an agent inside the Windows machine it was launched on. It does not govern that agent's behavior the moment it crosses a boundary. The MCP server hosted on the open internet that the agent calls. The Lambda function it spawned to do heavier work. The Cursor backend that wrote half the diff before Spark ever saw it. The GitHub Actions runner that completed the agent's commit pipeline. The Anthropic-hosted sandbox running a parallel task. Each of these substrates has its own envelope, or no envelope at all. None of them propagates provenance to the next hop.

By the time the agent has called three MCP servers, run on two compute substrates, and produced a pull request that lands in your CI pipeline, the original OpenShell context is a memory. The substrate vendors are not going to fix this. They cannot. Every substrate has an economic incentive to keep the agent inside its own envelope rather than hand it off cleanly to a competitor's.

Figure: An agent's journey across five substrates. OpenShell governs hop zero. Nothing governs hops one through four. By the time the artifact reaches your merge gate, every substrate's context has been discarded. The merge gate is the first place where governance can be universal.

What every substrate does share, however, is the destination. The agent can run anywhere. The artifact has to land somewhere.

The only universal layer

Every agent's work, regardless of which substrate produced it, eventually deposits its output into a place the enterprise owns and controls. A pull request in Git. A model card in a registry. An infrastructure manifest. A change-management ticket. A deployment plan. This is the convergence point of every substrate, and it is the only layer where every agent's output, no matter how heterogeneous the origin, can be evaluated against the same standard before it changes production reality.

Figure: Four substrate vendors, four governance envelopes, one convergence point. The merge gate is the only layer every substrate's output must cross before it changes production reality. It is the only place where governance can be applied universally, regardless of origin.

The agent has an OS now. Production still does not have a gate.

What Tomosu does

Tomosu is the gate. It sits above your existing GitHub or GitLab and scores every pull request, especially AI-generated code, before it lands on main. The score is the Production Reliability Index, a single trendable number composed of signals you can name to a board.

Change Scope

The blast radius of the diff. What services, what data paths, what downstream systems this change can affect.

Service Criticality

What it costs if the service this change touches goes down. The tier of the system the diff is reaching into.

Compliance Exposure

Whether the change touches regulated paths. PCI on the payments side. HIPAA on the patient-data side. SOC 2 on the audit-bearing side.

AI-Origin Signal

Whether the change was AI-generated, and to what degree. Which substrate produced it. What the provenance trail looks like.

These are four of the seven signals that compose the PRI. The full set rolls up into one number a board can track across quarters. The point is not the signal count. The point is that the score is computed at the moment the merge decision is made, against the system the change is about to enter, with the receipts attached.
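As a rough illustration of how named signals could roll up into one number, here is a minimal sketch. It is not Tomosu's formula: the article does not publish the PRI computation or the other three signals, so the weights, the 0 to 1 normalization, the 0 to 100 scale, and every name below (`Signals`, `WEIGHTS`, `risk_score`) are assumptions made for illustration only. The sketch also treats higher values as higher risk, which is a choice of this example rather than a statement about the real index.

```python
from dataclasses import dataclass

# Hypothetical weights for the four signals named above.
# The real PRI uses seven signals and an unpublished formula.
WEIGHTS = {
    "change_scope": 0.30,
    "service_criticality": 0.30,
    "compliance_exposure": 0.25,
    "ai_origin": 0.15,
}


@dataclass(frozen=True)
class Signals:
    """Each signal is assumed to be normalized to [0, 1]; 1 means highest risk."""
    change_scope: float
    service_criticality: float
    compliance_exposure: float
    ai_origin: float


def risk_score(signals: Signals) -> float:
    """Weighted average of the four signals, scaled to 0-100."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(signals, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return round(100 * total / sum(WEIGHTS.values()), 1)


# Two illustrative pull requests, both fully AI-generated.
docs_change = Signals(0.1, 0.2, 0.0, 1.0)      # small diff, low-tier service
payments_change = Signals(0.8, 0.9, 1.0, 1.0)  # wide diff, PCI-scoped path
```

Under these assumed weights, `risk_score(payments_change)` comes out well above `risk_score(docs_change)` even though both diffs are equally AI-generated, which is the behavior the section describes: origin is one input among several, not the verdict.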

Figure: The Production Reliability Index scores every PR at the moment of the merge decision. Four signals shown here; seven in the full model. The human still decides. The basis for that decision is now on the record alongside the score.

Tomosu is read-only across Git, observability, and ticketing, and it goes live in days. The PR queue keeps moving. Low-risk merges sail through untouched. High-risk merges surface for a deliberate human call. Every governance decision is logged with evidence, audit-ready by default.
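The routing described here can be sketched in a few lines. This is an assumption-laden illustration, not Tomosu's implementation: the threshold value, the action names, the record fields, and the function names (`gate_decision`, `append_to_log`) are all hypothetical. The sketch only recommends an action and records why; consistent with the read-only posture above, it does not merge or block anything itself.

```python
import json
from datetime import datetime, timezone

# Hypothetical cutoff on a 0-100 risk scale; a real policy would be tuned per team.
REVIEW_THRESHOLD = 60.0


def gate_decision(pr_id: str, score: float, evidence: dict,
                  threshold: float = REVIEW_THRESHOLD) -> dict:
    """Return an auditable record recommending 'pass' or 'needs_human_review'."""
    action = "needs_human_review" if score >= threshold else "pass"
    return {
        "pr": pr_id,
        "score": score,
        "threshold": threshold,
        "action": action,
        "evidence": evidence,  # the inputs that justify the score
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_log(record: dict, log: list) -> None:
    """Store the record as one JSON line so the log is easy to export for audit."""
    log.append(json.dumps(record, sort_keys=True))


# Illustrative usage with made-up PR identifiers and evidence.
audit_log: list = []
low = gate_decision("example/repo#1", 24.0, {"paths": ["docs/README.md"]})
high = gate_decision("example/repo#2", 91.0, {"paths": ["payments/charge.py"]})
for record in (low, high):
    append_to_log(record, audit_log)
```

Keeping the threshold and the evidence inside each record means a later reader can see not only what was decided but what the policy was at the time, which matters once the policy is tuned.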

Why this layer is structural, not optional

Watch the Microsoft Build keynote running June 2 and June 3. Expect demos of agents writing code, porting Windows apps to Arm, automating workflows. Expect the word governance used many times. Expect security framing around OpenShell.

Watch carefully for what will be absent. There will be no scoring of whether the agent's diff is safe to merge to production. There will be no production risk index attached to the artifact. There will be no mechanism by which the merge gate at your own GitHub or GitLab tenant verifies that the change crossing it, regardless of which substrate produced it, meets a risk threshold to ship. That layer is missing from the keynote because it is not Redmond's to give and it is not Santa Clara's to give. It is your boundary, between the agent's output and your production reality.

Figure: Source governance is provincial. Each envelope is local to the substrate that owns it and dies at the first hop. Production governance is universal. It lives at the boundary between the agent economy and your production reality, and it is the only layer every substrate's output must cross.

This is the asymmetry the next decade of agentic governance will be built around. Source governance is provincial. It is local to its substrate. It dies at the first hop. Production governance is universal. It is local to your enterprise boundary, and it is the only place where every substrate's output can be evaluated against the same standard.

The substrate vendors will not build this for you. They are correctly focused on the substrate they own. The cross-substrate layer is your own, and it is the layer that determines whether a green CI check actually means the change is safe to ship.

What a 90-day pilot looks like

We are partnering with three to five engineering organizations in regulated industries over the next 90 days. The pilot has measurable milestones, not vibes.

The 90-day pilot is structured around four measurable milestones. Week one: baseline established. Day 30: every PR has a score and a reason. Day 60: closed loop with tuned policy. Day 90: a board-ready PRI trendline the team owns.

If you are watching your engineers review fifteen pull requests in the time they used to review three, if you are watching a green CI check come unmoored from whether the change is actually safe to ship, if you have started to feel the asymmetry between how fast code now arrives and how fast you can verify it is safe, Tomosu was built for this moment.

The substrate vendors are shipping the envelope. The artifact still needs the gate.

Book a 30-minute conversation: calendly.com/manil-tomosu/30min

Questions: contact@tomosu.ai