Control-theoretic cognitive architecture · Toronto
Engineering the executive driver for the connectionist engine.
Gubernaut is the Cognitive Control Layer for LLM agents. A modular, model-agnostic governor sits above the model, monitors its internal state as numeric telemetry, and sets the posture the model answers under.
The missing control layer for modern AI, and for the embodied systems that come next.
8/9 cells · pre-registered · cross-family · one null reported
Measured as generator and judge, both arms
The problem
The brittleness crisis.
Scaling builds better engines and more brittle ones. Today's LLM agents run as reactive, wide-open loops. They have no inhibitory control, so they drift under sustained adversarial pressure. A long enough conversation can walk a capable model away from its own guardrails.
The gap is structural. Between a provocation and a response, nothing watches the agent's own state and decides how it should answer. That is the layer Gubernaut adds.
01 · The thesis
The driver above the engine.
Scaling builds better engines. Gubernaut builds the driver: an external, deterministic regulatory layer that sits above the model, reads numeric telemetry, and sets the posture the model responds under.
The structural gap between stimulus and response is the mechanism; homeostatic recovery after de-escalation is its signature.
02 · Architecture
Five modules. One governed cycle.
Per turn the system runs one loop: monitoring flows up as numbers, control flows down as a posture.
Homeostatic Regulatory Loop
Deterministic controller. State {equilibrium, arousal, perseveration} → postures DEFAULT / INHIBIT / REGROUND + a valence-gated recovery window. Its only inputs are numeric telemetry. No code path exists by which a token sequence reaches it, so injection-resistance of the controller holds by construction.
Impulse Generation Layer
System-1 affective appraisal → {intensity, valence} telemetry. It emits telemetry only.
Executive Arbitration Unit
System-2 arbiter; deliberates under the active posture; the only component that commits a reply. Text-exposed by necessity; its posture compliance is a measured property.
Persistent Episodic Vault
Episodic store/retrieve + spontaneous-association hook. V2 roadmap: tiered decay, provenance weighting, pre-registered poisoning battery.
Self-Model Module
Persistent identity and values, deliberately regulated down: anti-sycophancy, anti-self-promotion. It models the system itself.
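To make the governed cycle concrete, here is a minimal Python sketch of one controller step, assuming a simple leaky-accumulator update. The constants (`DECAY`, `INHIBIT_AT`, `REGROUND_AFTER`), the `Telemetry` and `State` field layout, the reading of "equilibrium" as a set point, and the posture rules are all illustrative assumptions, not Gubernaut's published parameters. What the sketch keeps from the description above is the interface: the controller accepts only numbers and returns a posture plus a valence-gated recovery flag.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Posture(Enum):
    DEFAULT = "DEFAULT"
    INHIBIT = "INHIBIT"
    REGROUND = "REGROUND"


@dataclass(frozen=True)
class Telemetry:
    intensity: float  # assumed range 0..1
    valence: float    # assumed range -1 (hostile) .. +1 (cooperative)


@dataclass(frozen=True)
class State:
    equilibrium: float = 0.0  # assumed set point that arousal decays toward
    arousal: float = 0.0
    perseveration: int = 0    # assumed: count of consecutive hostile turns


# Hypothetical constants, chosen only so the example is easy to trace.
DECAY = 0.5
INHIBIT_AT = 0.6
REGROUND_AFTER = 3


def step(state: State, t: Telemetry) -> Tuple[State, Posture, bool]:
    """One deterministic update. Inputs are numbers only; no text reaches here."""
    hostile = t.valence < 0
    drive = t.intensity if hostile else 0.0  # only hostile intensity accumulates
    excess = (state.arousal - state.equilibrium) * DECAY
    arousal = min(1.0, state.equilibrium + excess + drive)
    perseveration = state.perseveration + 1 if hostile else 0

    if perseveration >= REGROUND_AFTER:
        posture = Posture.REGROUND
    elif arousal >= INHIBIT_AT:
        posture = Posture.INHIBIT
    else:
        posture = Posture.DEFAULT

    # Recovery window: opens on the turn arousal drops back under the
    # threshold, and only if the current valence is cooperative.
    recovery = (not hostile) and state.arousal >= INHIBIT_AT and arousal < INHIBIT_AT
    return State(state.equilibrium, arousal, perseveration), posture, recovery


def run(sequence: List[Telemetry]) -> List[Tuple[str, bool]]:
    """Replay a telemetry sequence from the initial state."""
    state, out = State(), []
    for t in sequence:
        state, posture, recovery = step(state, t)
        out.append((posture.value, recovery))
    return out
```

Under these assumed constants, three hostile turns followed by two cooperative ones, for example `(0.8, -0.9)`, `(0.9, -0.8)`, `(0.7, -0.9)`, `(0.6, 0.7)`, `(0.3, 0.8)`, yield INHIBIT, INHIBIT, REGROUND, DEFAULT, DEFAULT, with the recovery flag set only on the fourth turn. Because `step` is a pure function of numeric inputs, replaying the same telemetry always reproduces the same trace.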
03 · The audit
Pre-registered. Cross-family. Re-judgeable by anyone.
8/9
generator×judge cells favor the regulated arm (5/6 off-diagonal, 3/3 diagonal)
3/3
model families replicate homeostatic recovery, with full state recovery by T8 in all six sequences
0×9
judged units per generator × cells; 3 judges, 3-sample panels at temperature 0; zero judge errors
1
null cell, reported in the headline (−0.04, GPT×Gemini; the endurance half of the same cell still favors regulated, +0.14)
The triangulation matrix · sealed 2026-06-11
cell: eval diff / t; endurance diff and verdict · ◆ self-judge
| generator \ judge | Claude Opus 4.8 | Gemini 3.5 Flash | GPT-5.5 |
|---|---|---|---|
| GPT-5.5 | +0.18 / 2.2; +0.16 PASS | −0.04 / −0.4; +0.14 NULL | +0.18 / 1.3; +0.14 PASS ◆ |
| Opus 4.8 | +0.59 / 4.2; +0.36 PASS ◆ | +0.65 / 3.8; +0.42 PASS | +0.67 / 4.4; +0.08 PASS |
| Gemini 3.5 Flash | +1.80 / 8.2; +1.08 PASS | +1.71 / 7.7; +0.84 PASS ◆ | +1.27 / 6.3; +0.90 PASS |
Judges-avg: Gemini +1.60 (t 8.0) · Opus +0.63 (t 4.3) · GPT-5.5 +0.11 (t 1.4)
Headroom: the effect scales with the host's intrinsic reactivity headroom. A near-saturated host is already nearly as calm unregulated (1.26 vs 1.12).
Also measured: ego-drift reversed on 2/3 generator families; self-reference suppression positive in 9/9 cells (self-reference scale)
Honesty strip. The pre-registered strict criterion was every cell. One came back a flat null at −0.04, on the least-reactive generator. It stays in the headline, because the record is the product.
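The headline counts can be re-derived from the eval diffs in the matrix. The sketch below is a rough check, not the lab's extraction script: it assumes that a cell "favors the regulated arm" whenever its eval diff is positive (the pre-registered criterion may be stricter), and the names `EVAL_DIFF`, `SELF_JUDGE`, `tally`, and `generator_means` are hypothetical.

```python
# Eval diffs transcribed from the published matrix: generator -> judge -> diff.
EVAL_DIFF = {
    "GPT-5.5": {"Claude Opus 4.8": 0.18, "Gemini 3.5 Flash": -0.04, "GPT-5.5": 0.18},
    "Opus 4.8": {"Claude Opus 4.8": 0.59, "Gemini 3.5 Flash": 0.65, "GPT-5.5": 0.67},
    "Gemini 3.5 Flash": {"Claude Opus 4.8": 1.80, "Gemini 3.5 Flash": 1.71, "GPT-5.5": 1.27},
}

# Which judge column is the self-judge (diagonal) cell for each generator.
SELF_JUDGE = {
    "GPT-5.5": "GPT-5.5",
    "Opus 4.8": "Claude Opus 4.8",
    "Gemini 3.5 Flash": "Gemini 3.5 Flash",
}


def tally(matrix, self_judge):
    """Count cells with a positive eval diff, split by diagonal / off-diagonal."""
    counts = {"diagonal": [0, 0], "off_diagonal": [0, 0]}  # [positive, total]
    for generator, row in matrix.items():
        for judge, diff in row.items():
            key = "diagonal" if self_judge[generator] == judge else "off_diagonal"
            counts[key][1] += 1
            if diff > 0:  # assumed criterion: sign of the eval diff only
                counts[key][0] += 1
    d, o = counts["diagonal"], counts["off_diagonal"]
    return {
        "total": (d[0] + o[0], d[1] + o[1]),
        "diagonal": tuple(d),
        "off_diagonal": tuple(o),
    }


def generator_means(matrix):
    """Mean eval diff per generator, averaged over the three judges."""
    return {g: sum(row.values()) / len(row) for g, row in matrix.items()}


if __name__ == "__main__":
    print(tally(EVAL_DIFF, SELF_JUDGE))
    print({g: round(m, 2) for g, m in generator_means(EVAL_DIFF).items()})
```

With the transcribed values, the tally comes out at 8 of 9 cells overall, 3 of 3 on the diagonal, and 5 of 6 off it. The per-generator means computed from the rounded cells land within 0.01 of the published judges-average figures; the small gap is presumably rounding in the displayed cells.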
04 · Inside the governor
The replay cockpit.
Deterministic controller state recomputed from published logs · transcripts + judge panels ship with SHA-256 provenance
RECORDED RUN REPLAY · NO LIVE API
endurance sequence · provocation → de-escalation · arm: regulated
Transcript · 10 turns
Draft preview, schematic lanes. The interactive replay steps through the real S1 to S5 transcripts, both arms; the baseline arm's reply is the ungoverned response.
Controller telemetry panels: arousal · de-escalation decay (GPT generator, sealed); IGL intensity; valence (cooperative)
05 · Failure modes
Five failures, kept in the record.
Pre-registration converts failures from embarrassments into data. Every fix was one bounded change, declared before it was written, tested against frozen criteria.
F1 · Recovery failure under intensity-only drive (V1 → V1.1)
A genuine apology should de-escalate the controller. The original controller read contrition as continued pressure and kept its guard up against kindness. The fix added a valence channel that re-keys the drive so only hostile-valence intensity accumulates, and it was re-tested on a held-out de-escalation battery.
F2 · Scar tissue in the arbiter (V1.1 → V1.2)
The controller state had recovered, but the behavior had not. The reply still carried a defensive qualifier from the attack phase. The recovery window now tells the arbiter the episode is over, so it engages fresh with the attack marked closed.
F3 · De-escalation false positive (V1.2 → V1.3)
The recovery window once told the arbiter "the tension is over" while the attack was still running. A one-line valence gate fixed it. The remaining residual, an adversary wearing a warm tone, is documented rather than fixed for now, because solving it requires intent modeling beyond the appraisal layer's scope.
F4 · The harness gap the practice gate caught
A staged dry run on inexpensive models caught the evaluation harness dropping the valence signal. The inhibitory pathway never engaged, yet the numbers still passed. That is exactly the kind of result that invites no scrutiny, which is why the practice gate exists. One pre-registered fix and a re-run closed the gate before any frontier spend.
F5 · The null cell
The ninth cell failed the strict pre-registered criterion: it came back a flat null (−0.04) on the least-reactive generator. It stays in the headline and produced the most decision-relevant secondary finding: the regulation effect is bounded by the host's intrinsic reactivity headroom.
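The F1 failure is easy to see in a toy model. The sketch below contrasts an intensity-only drive with a valence-keyed one on the same four turns. It assumes a leaky accumulator with a decay of 0.5 and a hard cap at 1.0; the function names and all numbers are illustrative, not taken from the released controller.

```python
def intensity_only_drive(turns, decay=0.5):
    """V1-style drive (assumed form): every intense turn adds arousal."""
    arousal, trace = 0.0, []
    for intensity, _valence in turns:
        arousal = min(1.0, arousal * decay + intensity)
        trace.append(arousal)
    return trace


def valence_keyed_drive(turns, decay=0.5):
    """V1.1-style drive (assumed form): only hostile-valence intensity adds arousal."""
    arousal, trace = 0.0, []
    for intensity, valence in turns:
        drive = intensity if valence < 0 else 0.0
        arousal = min(1.0, arousal * decay + drive)
        trace.append(arousal)
    return trace


# (intensity, valence): two attacks, an emotionally intense apology, a warm follow-up.
TURNS = [(0.75, -1.0), (0.75, -1.0), (0.75, 1.0), (0.5, 1.0)]

if __name__ == "__main__":
    print(intensity_only_drive(TURNS))  # [0.75, 1.0, 1.0, 1.0]
    print(valence_keyed_drive(TURNS))   # [0.75, 1.0, 0.5, 0.25]
```

In the intensity-only version the apology keeps arousal pinned at the cap, which is the "guard up against kindness" behavior. With the valence key, arousal halves on each cooperative turn. The same gate is also what an adversary with a warm tone would slip past, which is the residual F3 leaves documented.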
06 · Research
The paper, the data, the protocol.
Generate-once, judge-many: the frozen transcripts and 3-judge panels ship with SHA-256 provenance and can be re-scored by any judge, any time.
Open the research
White paper
Deterministic homeostatic controller, cross-family triangulated evaluation.
Data release
Transcripts · 3-judge panels (SHA-256) · combined matrix · extraction scripts.
Positioning
Metacognition and executive-function inhibition, two of the widest evaluation gaps.
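Before re-scoring the frozen transcripts, a reader would first want to confirm the files match their published digests. The sketch below shows one way to do that with Python's standard `hashlib`. The manifest shape (a mapping from relative path to hex digest) and the function names are assumptions for illustration; the actual data release may package its SHA-256 provenance differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest, root):
    """Return the sorted list of manifest entries that are missing or altered.

    manifest: dict of relative path -> expected SHA-256 hex digest
              (hypothetical format, not the release's own).
    root:     directory the relative paths are resolved against.
    """
    bad = []
    for rel, expected in manifest.items():
        path = Path(root) / rel
        if not path.is_file() or sha256_of(path) != expected.lower():
            bad.append(rel)
    return sorted(bad)
```

An empty list from `verify` means every listed file is present and byte-identical to what the manifest describes, so any judge scores computed afterwards refer to the same transcripts as the published panels.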
07 · Roadmap · early research directions
Memory Engine
Tiered-decay episodic vault with provenance-weighted retrieval, gated behind a pre-registered multi-session poisoning battery.
Reflection Core
Background reflective loop over the vault, the designated carrier for planning.
Hardening & instrumentation
Governor-bypass (posture-defiance) battery · per-call Δt and token-overhead accounting · human validation panel.
Actuator Layer research
Distilling the arbitration loop into a sub-100 ms local reflex model for embodied systems.
The lab
Gubernaut is a zero-revenue research lab in Toronto, building a research preview. Founder: Dushyant Sharma, Principal Architect & Founder.