A growing body of peer-reviewed research, presented at the 2026 International Conference on Mining Software Repositories, has begun to map how autonomous coding agents actually behave inside production codebases. The studies cover the major tools in active enterprise use (Devin, Claude Code, Cursor, GitHub Copilot, OpenAI Codex), and they document something the industry has felt but not formalised: agentic pull requests now constitute a substantial and rapidly growing share of code authoring, while terminal merge approval remains almost exclusively human.
This is a structural split. It is not going to reverse. And the audit infrastructure most enterprises rely on was built before the split existed.
What changed, structurally
Three years ago, a single developer typically performed three functions inside the same workflow: planning the change, writing the code, and approving the merge. The audit trail captured one actor, the developer, and that actor was sufficient to reconstruct the entire decision chain. Who wrote it, who approved it, why it was approved: same person, same context, same record.
That assumption no longer holds. The planning and authoring functions have largely been delegated to agents. The merge approval function has not. What was once a unified actor performing three steps is now two distinct actors, an autonomous agent and an accountable human, performing two distinct functions at two different points in the workflow.
The agent generates. The human decides.
Most enterprise audit frameworks were not designed to distinguish these two roles, because three years ago they did not need to. The result is an audit gap that is not produced by missing logs (the logs are fine) but by an implicit assumption baked into how those logs are interpreted: that the actor who merged the code is the actor who decided to merge it.
Where the audit assumption breaks
CI/CD systems faithfully record what they were built to record: which actor merged the code, when, against which branch, with which checks passing. When the merging actor is a human developer who also wrote and approved the code, that record captures the full decision. When the merging actor is an automated executor acting on a human approval click elsewhere in the system, the record captures the executor and silently drops the decision-maker, the context that informed their decision, and the rationale behind the approval.
The compliance frameworks that audit these logs were written when the executor and the decision-maker were the same person. They have not yet caught up to the architectural change underneath them.
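To make the break concrete, here is a minimal sketch in Python. The record shape (`actor`, `merged_at`, `branch`, `checks_passed`) and the `merge-bot` account are assumptions chosen for illustration, not the schema of any particular CI/CD product. It shows that an auditor reading the log alone can name the decision-maker only when the old assumption happens to hold.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class MergeLogEntry:
    """Hypothetical shape of what a CI/CD system records for one merge."""
    actor: str            # account that executed the merge
    merged_at: datetime
    branch: str
    checks_passed: bool


def recover_decision_maker(entry: MergeLogEntry, automation_accounts: set[str]) -> Optional[str]:
    """Return the human who decided, if the log alone can tell us."""
    if entry.actor in automation_accounts:
        # The executor is recorded; the approving human is not.
        return None
    # Legacy assumption: the merging actor is also the decision-maker.
    return entry.actor


# Illustrative data only.
AUTOMATION = {"merge-bot"}
human_merge = MergeLogEntry("alice", datetime(2023, 5, 1, tzinfo=timezone.utc), "main", True)
bot_merge = MergeLogEntry("merge-bot", datetime(2026, 5, 1, tzinfo=timezone.utc), "main", True)

print(recover_decision_maker(human_merge, AUTOMATION))  # alice
print(recover_decision_maker(bot_merge, AUTOMATION))    # None
```

Both entries are complete and accurate as logs. Only the second one fails to answer the audit question, and nothing in the record signals that anything is missing.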
The consequence is awkward. An organisation can simultaneously hold a documented human-in-the-loop policy, produce CI/CD logs showing every merge was technically executed by an automation, and have no auditable record of which human made the actual approval decision or why. The policy says one thing. The audit trail says something else. The truth, the cognitive act of approval, lives nowhere structured.
This is not a tooling complaint. It is an architectural observation. The audit infrastructure was correct for the world it was built in. The world has changed.
Why this won't fix itself
The instinctive response to this gap is to add a field. Require a comment on every merge. Add a structured form. Capture the human's name alongside the automation's name.
This is not sufficient, for a reason worth being precise about.
The gap is not a missing field. The gap is a missing system of record for governance decisions, parallel to but distinct from the system of record for code changes. A comment field captures a string. It does not capture the actor's identity verified independently of the platform they are clicking inside. It does not capture which production risk signals were visible to them at the moment of decision. It does not capture what they weighed, what they did not weigh, and what they accepted as residual risk. A governance decision is not a comment; it is a structured event with its own actors, its own context, and its own auditability requirements.
What the agent-economy era of code authoring requires is not a richer commit message. It is a separate stream, the governance stream, that runs alongside the code stream, captures the human decision at the merge gate as a first-class event, and produces a record that survives the same audit scrutiny the code itself does.
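One way to picture the two streams is as separate event logs joined on a shared key. The sketch below is an assumption-laden illustration in Python: the `CodeEvent` and `GovernanceEvent` types, their fields, and the choice of commit SHA as the join key are all hypothetical. Independent identity verification is out of scope here; the `decided_by` value is simply taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeEvent:
    """Code stream: what the CI/CD layer already records (hypothetical fields)."""
    commit_sha: str
    executor: str


@dataclass(frozen=True)
class GovernanceEvent:
    """Governance stream: the human decision at the merge gate (hypothetical fields)."""
    commit_sha: str
    decided_by: str
    decision: str  # "approve" or "reject"


def merges_without_decision(code_stream, governance_stream):
    """Return commit SHAs that were merged with no matching approval record."""
    approved = {g.commit_sha for g in governance_stream if g.decision == "approve"}
    return [c.commit_sha for c in code_stream if c.commit_sha not in approved]


# Illustrative data only.
code_stream = [CodeEvent("a1f3", "merge-bot"), CodeEvent("b7c9", "merge-bot")]
governance_stream = [GovernanceEvent("a1f3", "alice", "approve")]

print(merges_without_decision(code_stream, governance_stream))  # ['b7c9']
```

Keeping the streams separate is what makes this query possible: a merge with no governance event shows up as an explicit absence, rather than as a log line that looks complete.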
What that stream needs to capture
A governance record sufficient to close the agency-governance gap needs to answer four questions about every merge, in a way that can be queried, aggregated, and audited:
Who actually decided
Not the executor that performed the merge. The human whose approval click authorised it. Identity verified independently of the platform.
What they were looking at
The production risk context surfaced at the moment of decision: change scope, service criticality, compliance exposure, incident history, author profile, deploy-time risk.
What they accepted
Explicit, not inferred. A governance decision is an accountability act; the record needs to reflect what the human is on the hook for.
Why the decision was made
Not free-text, not optional, not retrofittable. Structured enough that it can be queried later, when the question is no longer "what happened" but "why did we accept this."
These four together constitute what a merge-gate governance record should look like. None of them are captured today by the CI/CD layer. None of them belong inside the CI/CD layer; they are a different kind of event, with different correctness requirements, and they need their own infrastructure.
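The four questions above can be sketched as a single record type. This is a minimal illustration in Python, not a reference schema: the class names, the `ReasonCode` values, and the sample data are all hypothetical, and identity verification is assumed to have happened upstream. The risk context fields mirror the signals listed under "What they were looking at".

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ReasonCode(Enum):
    """Hypothetical closed vocabulary, so that 'why' stays queryable."""
    ROUTINE_LOW_RISK = "routine_low_risk"
    URGENT_FIX = "urgent_fix"
    RISK_ACCEPTED_WITH_MITIGATION = "risk_accepted_with_mitigation"


@dataclass(frozen=True)
class RiskContext:
    """What the approver was looking at at the moment of decision."""
    change_scope: str
    service_criticality: str
    compliance_exposure: str
    incident_history: str
    author_profile: str
    deploy_time_risk: str


@dataclass(frozen=True)
class GovernanceRecord:
    commit_sha: str
    decided_by: str                  # who actually decided
    context: RiskContext             # what they were looking at
    accepted_risks: tuple[str, ...]  # what they accepted, stated explicitly
    reason: ReasonCode               # why, as a structured value

    def __post_init__(self):
        if not self.decided_by:
            raise ValueError("a governance record needs a named human decider")
        if not isinstance(self.reason, ReasonCode):
            raise ValueError("reason must be a ReasonCode, not free text")


def count_by_reason(records):
    """Aggregate decisions by reason code."""
    return Counter(r.reason for r in records)


# Illustrative data only.
ctx = RiskContext("3 files", "high", "in scope", "1 recent incident",
                  "agent-authored", "peak traffic window")
records = [GovernanceRecord("a1f3", "alice", ctx, ("no canary deployment",),
                            ReasonCode.URGENT_FIX)]

print(count_by_reason(records)[ReasonCode.URGENT_FIX])  # 1
```

Making the record immutable and rejecting free-text reasons at construction time reflects the "not optional, not retrofittable" requirement: a record that cannot answer all four questions is never created in the first place.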
Where this goes
The agency-governance split is a permanent feature of how production code will be written from here forward. The economic case for delegating authoring to agents is too strong; the accountability case for keeping approval human is older than software. The two functions have separated, correctly, and they will stay separated.
What is still open is whether the governance layer gets built as deliberate infrastructure or remains an absence that compliance frameworks paper over until an incident makes the gap impossible to ignore.
The first option is harder upfront and stronger long-term. The second is what most organisations are doing today by default, mostly without realising it.
We are working on the first.
Tomosu builds merge-gate governance infrastructure. If your organisation is encountering the agency-governance gap in production, we are opening a small design partner cohort.