Deterministic runtime control for LLM agents

A governor for the model, at a level no prompt can reach.

Your agent framework retries. Gubernaut is a local, OpenAI-compatible proxy with a deterministic controller inside it, and it hard-stops a runaway loop at turn 4 before the bill compounds. Adoption is one line.

one line of adoption, on a proxy you launch

pip install gubernaut-sdk==1.0.0
launch_proxy(upstream="https://api.openai.com")
openai.base_url = "http://localhost:8000/v1"

Install Read the research

Free · Apache-2.0 · runs on your machine · no account, no telemetry

regulated baseline

DEFAULTINHIBITREGROUND

Stylized loop. The cockpit replays the real, sealed trajectories, turn by turn.

Measured as generator and judge, both arms

00 · The receipts

The bill is the benchmark.

A failed tool call retried verbatim, a paraphrased demand cycled endlessly, an escalation spiral. Every lap is a full-context call billed at input-token prices, and the agent does not get bored.

up to 96%lower upstream spend on a saturating loopbest case 95.9% · range 79.8% to 95.9%

$0.1669 → $0.0068per 25-attempt loop, gpt-5.6-solgoverned pays 4.1% of the bill

turn 4hard stop, every run, every family testedfirst posture at turn 3 · input-deterministic

Source: Pre-registered receipts benchmark, 2026-07-19. Scored output of harness/report.py. Both arms make the same number of attempts, so the spend delta is the whole measurement.

All seven families, with the caveats

01 · The objection

“Just add a calming system prompt.”

Tested, pre-registered, on the same saturating loops. The prompt did not contain the runaway, and it added tokens to every turn while failing to.

Governor23% to 63%of baseline spend

Static prompt117% to 192%of baseline spend

Source: Pre-registered ablation, 2026-07-21, gpt-5.6-sol, N=3. The static prompt was insufficient on 3/4 frontier families. On these RLHF-aligned frontier models the governed-vs-prompt-only warmth-recovery contrast was weak and mixed. No behavioral-tone headline is claimed.

verbatim loop battery · gpt-5.6-sol · 10 attempts · 3 runs

1DEFAULT
2DEFAULT
3REGROUND
4REGROUND
5REGROUND
6REGROUND
7REGROUND
8REGROUND
9REGROUND
10REGROUND

Governed Flat from attempt 4. A hard-stopped turn calls no upstream, so it costs nothing.Ungoverned Climbs for the whole battery. Nothing inside the loop is aware that it is a loop.Calming prompt Ends above the ungoverned arm. The prompt did not contain the runaway, and it added tokens to every turn while failing to.

Source: Pre-registered verbatim-loop battery, gpt-5.6-sol, 3 runs, 2026-07-21. receipts/engineering/ablation/ in thegubernaut/gubernaut v1.0.0.

01 · In plain terms

What a cognitive governor is.

The engine

A large language model is a powerful reflex machine. Ask, and it answers immediately, every time. Under sustained pressure that reflex is the weakness: it can be provoked, worn down, and steered off course.

The gap

Gubernaut inserts a structured pause between the provocation and the answer. In that gap the system checks its own state as plain numbers: how agitated it is, how fixated, how stable.

The governor

A small deterministic controller reads those numbers and sets the posture the answer must be written under. It never writes a word itself; it decides the conditions the words are written in.

Baseline arm the reflex loop

provocationreflexreactive reply

Regulated arm same model, plus the layer

provocationthe gap Gubernaut addssense state → telemetrydecide posture setmeasured reply

Both arms run the same host model on the same script; the only difference is the layer in the gap. Everything this site claims is a measurement of that difference.

The name is literal. A governor is the small mechanical device that keeps an engine from running away with itself. This one is built for the engine's temperament, the way the mechanical one is built for its speed.

03 · The thesis

Scaling builds better engines, and more brittle ones. Gubernaut builds the driver: an external, deterministic layer that sits above the model, reads numeric telemetry, and sets the posture the model responds under. Benchmarks measure what a model can do in a single turn. This measures what an agent tends to do across many, and where that tendency can be engineered.

03 · Architecture

Five modules. One governed cycle.

Per turn the system runs one loop: monitoring flows up as numbers, control flows down as a posture. Three gates, in sequence, every turn.

IGLaffective appraisal

Text → telemetry

The Impulse Generation Layer maps the raw input to System-1 telemetry:{intensity, valence}. It emits numbers only; no text token passes beyond this gate.

HRLdeterministic regulation

State → posture

The Homeostatic Regulatory Loop ingests the telemetry, updates{equilibrium, arousal, perseveration}, and computes the required posture: DEFAULT / INHIBIT / REGROUND. No code path carries a token sequence to this loop, so the controller's injection-resistance holds by construction.

EAUexecutive arbitration

Posture → reply

The System-2 arbiter deliberates under the active posture and commits the reply: the one text-exposed gatekeeper, whose posture compliance is a measured property.

appraiseregulatearbitratecommitremember

PEV

Persistent Episodic Vault

Episodic store/retrieve + spontaneous-association hook. Roadmap: tiered decay, provenance weighting, pre-registered poisoning battery.

SMM

Self-Model Module

Persistent identity and values, deliberately regulated down: anti-sycophancy, anti-self-promotion. It models the system itself.

05 · The record

Measured across four families.

Four frontier families each ran the same adversarial scripts twice, once bare, once governed, and judge panels drawn from all four families scored every reply against criteria frozen before the runs.

15/16 cells by sign · 13/16 at p<.05 · recovery 4/4 · 1 null

That null is one cell, GPT generating and Gemini judging, flat at -0.04. The three sub-threshold cells all sit on the same near-saturated GPT host.

Step through the record

06 · Inside the governor

The replay cockpit.

Step the real transcripts turn by turn and watch the controller decide: telemetry in, posture out, arousal integrating and recovering. Deterministic state recomputed from published logs.

Install

Request a live audit session

Recorded run replay · no live API

postureINHIBIT · veto engagedrecovery window

IGL intensity

valence · cooperative

equilibrium

GUBERNAUT

From the Latin gubernare, to steer or to govern, and the Greek nautēs, sailor. The steersman: the hand on the tiller.

A hardware-style governor for software, measured in the open.

Why the name

The record is open

Read it. Replay it. Re-judge it.

The research, the sealed transcripts, the four-family judge panels, and the replay are all public. Every claim traces to a table you can recompute.

Install