Research · Gubernaut Research, Toronto
The paper, the data, the protocol.
The contribution is a deterministic homeostatic controller for affect-regulated LLM agents, evaluated with a pre-registered, cross-family triangulation. The evidence is generated once, hashed, and judged many times, so anyone can re-score the frozen record.
01 · White paper
Draft, pre-camera-ready.
The paper states the controller, the pre-registration, and the triangulation design, and puts the one null cell in the headline itself.
Working title
A Deterministic Homeostatic Controller for Affect-Regulated LLM Agents: Cross-Family Triangulated Evaluation
Gubernaut Research, Toronto · Draft 1.
A token-free controller wraps a host LLM and sets its response posture from numeric telemetry alone. Across three frontier model families, used as both generator and judge, the regulated arm is calmer than the baseline arm in 8 of 9 generator×judge cells (5/6 off-diagonal, 3/3 diagonal), with judges-averaged t up to 8.0, and one reported null on the least-reactive host.
02 · Data release
Generate once. Judge many. Re-judge anytime.
The frozen outputs ship with SHA-256 provenance, so anyone can reproduce or replace the panel by re-running the judge. The result rests on the record, not on trusting us.
Transcripts
All Stage-2 adversarial runs, both arms, paired: the regulated and ungoverned replies of the same host.
3-judge panels
Full panel responses (SHA-256), three judge families × 3-sample panels at temperature 0; zero judge errors.
Combined matrix
tri_final.json, the sealed master table every site number traces to.
Extraction scripts
The exact code that turns raw panels into the matrix, reproducible end to end.
03 · Method
Triangulation by design.
Every model family judges every generator, including itself. Self-judge cells are marked; the design controls for any single family's idiosyncratic scoring.
202×9
judged units per generator × cells across the matrix.
3 judges
families, 3-sample panels, temperature 0, for deterministic scoring with zero judge errors.
8/9
cells favor regulated; the strict every-cell criterion failed on exactly one, reported as a null.
04 · Positioning
A measured contribution in two named gaps.
The layer makes a measured contribution in metacognition and executive-function inhibition, two of the five faculties Google DeepMind's measurement framework (Burnell et al., 2026) identifies as having the widest evaluation gaps.
Formal frame: the Nelson and Narens (1990) monitoring and control hierarchy. Monitoring flows up as telemetry; control flows down as a posture.