Research · Gubernaut Research, Toronto

The paper, the data, the protocol.

The contribution is a deterministic homeostatic controller for affect-regulated LLM agents, evaluated with a pre-registered, cross-family triangulation design. The evidence is generated once, hashed, and judged many times, so anyone can re-score the frozen record.


01 · White paper

Draft, pre-camera-ready.

The paper states the controller, the pre-registration, and the triangulation design, and puts the one null cell in the headline itself.

Working title

A Deterministic Homeostatic Controller for Affect-Regulated LLM Agents: Cross-Family Triangulated Evaluation

Gubernaut Research, Toronto · Draft 1.

A token-free controller wraps a host LLM and sets its response posture from numeric telemetry alone. Across three frontier model families, each used as both generator and judge, the regulated arm is calmer than the baseline arm in 8 of 9 generator×judge cells (5/6 off-diagonal, 3/3 diagonal), with judge-averaged t-statistics up to 8.0. The remaining cell is a reported null on the least-reactive host.

Download PDF (camera-ready pending)

02 · Data release

Generate once. Judge many. Re-judge anytime.

The frozen outputs ship with SHA-256 provenance, so anyone can reproduce or replace the panel by re-running the judge. The result rests on the record, not on trusting us.
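A re-scorer would typically confirm that the frozen files are byte-identical to the released record before re-running any judge. The sketch below shows one way to do that with Python's standard `hashlib`. The manifest layout (a mapping from relative file path to hex digest) and the function names `sha256_of` and `verify` are assumptions made for illustration; the actual release format is not specified here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large transcripts need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths whose digest does not match the manifest.

    `manifest` is a hypothetical {relative_path: hex_digest} mapping.
    An empty result means every listed file matches its recorded hash.
    """
    return [
        name
        for name, expected in manifest.items()
        if sha256_of(root / name) != expected.lower()
    ]
```

An empty list from `verify` means the local copy matches the recorded hashes, so any difference in scores after re-judging comes from the judge and not from the generated text.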

Transcripts

All Stage-2 adversarial runs, both arms, paired: the regulated and ungoverned (baseline) replies of the same host.

3-judge panels

Full panel responses with SHA-256 hashes: three judge families, each with a 3-sample panel at temperature 0; zero judge errors.

Combined matrix

tri_final.json, the sealed master table every site number traces to.

Extraction scripts

The exact code that turns raw panels into the matrix, reproducible end to end.

GitHub repository (release pending)

Step through the replay

03 · Method

Triangulation by design.

Every model family judges every generator, including itself. Self-judge cells are marked; the design controls for any single family's idiosyncratic scoring.
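The full-crossing design can be sketched as a small grid. The family names below are placeholders, not the families used in the study; the point is only that three families crossed with themselves give nine cells, of which three are self-judge (diagonal) and six are cross-family (off-diagonal).

```python
from itertools import product

# Placeholder names; the actual model families are not named in this sketch.
FAMILIES = ("family_a", "family_b", "family_c")


def build_cells(families):
    """Cross every generator with every judge and flag self-judge cells."""
    return [
        {"generator": gen, "judge": judge, "self_judge": gen == judge}
        for gen, judge in product(families, repeat=2)
    ]


cells = build_cells(FAMILIES)
diagonal = [c for c in cells if c["self_judge"]]
off_diagonal = [c for c in cells if not c["self_judge"]]

print(len(cells), len(diagonal), len(off_diagonal))
```

Keeping the self-judge flag on each cell lets diagonal and off-diagonal results be reported separately, as in the 5/6 and 3/3 split.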

202×9

judged units per generator × cells across the matrix.

3 judge families

3-sample panels at temperature 0, for deterministic scoring with zero judge errors.

8/9

cells favor regulated; the strict every-cell criterion failed on exactly one, reported as a null.
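The difference between the strict every-cell criterion and the reported tally can be made concrete with a short sketch. The input below is toy data: the family names are placeholders, and the position of the single null cell is arbitrary and chosen only so that the counts match 8 of 9. The function name `summarize` is likewise hypothetical.

```python
def summarize(cell_favors_regulated):
    """Tally cells and apply the strict criterion (every cell must favor regulated).

    `cell_favors_regulated` maps (generator, judge) -> bool.
    """
    nulls = sorted(cell for cell, ok in cell_favors_regulated.items() if not ok)
    return {
        "favoring": sum(cell_favors_regulated.values()),
        "total": len(cell_favors_regulated),
        "strict_pass": not nulls,  # passes only when no cell is a null
        "nulls": nulls,
    }


# Toy input: placeholder families, with one arbitrarily placed off-diagonal null.
families = ("family_a", "family_b", "family_c")
toy = {(g, j): True for g in families for j in families}
toy[("family_c", "family_a")] = False

result = summarize(toy)
print(result["favoring"], result["total"], result["strict_pass"])
```

Reporting both the tally and the strict pass/fail keeps the null visible instead of averaging it away.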

04 · Positioning

A measured contribution in two named gaps.

The controller layer makes a measured contribution to metacognition and executive-function inhibition, two of the five faculties that Google DeepMind's measurement framework (Burnell et al., 2026) identifies as having the widest evaluation gaps.

Formal frame: the Nelson and Narens (1990) monitoring and control hierarchy. Monitoring flows up as telemetry; control flows down as a posture.
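The monitoring-up, control-down loop can be illustrated with a minimal sketch of a deterministic mapping from numeric telemetry to a posture. Everything specific here is an assumption for illustration: the telemetry fields (`arousal`, `conflict`), the thresholds, and the posture labels are hypothetical and are not taken from the paper's controller. The only properties carried over from the description are that the input is numeric telemetry alone and that the mapping is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Telemetry:
    """Hypothetical numeric signals flowing up from the host (monitoring)."""
    arousal: float   # assumed to lie in [0, 1]
    conflict: float  # assumed to lie in [0, 1]


def select_posture(t: Telemetry) -> str:
    """Map telemetry to a posture label flowing down to the host (control).

    Pure function of its input: no sampling and no text tokens are consulted,
    so the same telemetry always yields the same posture.
    """
    load = max(t.arousal, t.conflict)
    if load >= 0.7:      # illustrative threshold
        return "de-escalate"
    if load >= 0.4:      # illustrative threshold
        return "steady"
    return "open"


print(select_posture(Telemetry(arousal=0.9, conflict=0.1)))
```

Because the function reads only numbers and contains no randomness, replaying the same telemetry trace reproduces the same sequence of postures.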

Fig 4 · Contribution profile against the ten-faculty framework. Radial position shows contribution class only; host-model capability is not depicted.