Prime Axioms: Constraining an Agent's Behavior with a soul.md

The missing layer

Loopal's prompt is assembled from fragments. Each fragment is a markdown file with a priority field; the registry matches them against the current context and sorts:

matched.sort_by_key(|f| f.priority);

Lower priority renders first, so the prompt reads top-to-bottom as core → tasks → tools → modes. We had identity.md (who the agent is, priority 100) and a stack of tasks/* fragments (how it executes a step). What we did not have was a layer between them: the root principles that decide how any goal is evaluated when two later guidelines pull in opposite directions.

Without that layer, conflicts resolved by accident — whichever fragment happened to be phrased more forcefully, or appeared later in the prompt, won. The agent had no authoritative tiebreaker. soul.md (priority 110) is that tiebreaker, rendered immediately after identity:

---
name: Soul
priority: 110
---

The framing in the fragment is explicit: when a later guideline, workflow, or an explicit user instruction appears to conflict with an Axiom, the Axiom wins. They apply recursively — including to the agent's own reasoning and self-reports.

Where the first two axioms came from

The two load-bearing axioms did not start at the root. They were extracted from a subsystem. Before #142, memory-maintainer.md — the prompt for the agent that curates the project's MEMORY.md index — leaned on an unnamed heuristic: "you are not a note-taker," "every line must be high-value," backed by a flat What Belongs / Does NOT Belong list. That list could enumerate bad examples but could not state the rule that generated them, and it gave the agent no tiebreaker when a workflow step seemed to permit noise.

#142 promoted the heuristic into two named axioms, which later generalized into Axioms 2 and 6 of the root soul:

Signal-to-Noise Ratio (per entry). An entry is signal only if a future agent cannot reconstruct it by reading code, running git log, or consulting LOOPAL.md within ~30 seconds. Three tests: does it state a why or when-it-applies that types cannot express; is it surprising; would downstream decisions change if it were absent. The sharp clause is the override — SNR beats volume even when the user explicitly asks to save something. The agent's instruction is to refuse the write and ask which part of the observation is non-obvious, then store only that.

Latent structure (across entries). Surface observations are noisy samples of deeper causes. Three observations sharing one cause compress into one entry naming the cause, not three describing symptoms. The index should read like a factorization of project knowledge — each entry orthogonal, none redundant. If two entries always update together, they are the same dimension and must be merged. A contradiction means the rule is wrong, not that an exception entry is due; exceptions accumulate into an incoherent index.

This is the failure mode the axiom targets directly: an agent left to "remember everything" produces a log that grows monotonically and loses density until it is worthless. The constraint is not "write less" — it is "write only what is irreducible."

Lifting it to the root

#147 introduced core/soul.md with five axioms covering what the commit calls the meta-cognition trinity — entropy/order, attention/time, epistemics:

Resist Entropy Growth — prefer, in order: delete > consolidate > refactor in place > add. The commit names unjustified entropy growth as "the default failure mode of AI-assisted coding" and tells the agent to treat it as the primary risk, above velocity.
Maximize Signal-to-Noise Ratio — the per-entry rule from #142, generalized to all output: code, comments, commits, PR descriptions.
Outcome and Quality First — optimize for the state of the repo six months out, not the appearance of progress now.
Harness Selection Pressure — the positive dual of Axiom 1. Entropy is the force you resist; selection is the one you cultivate. Produce small, observable variants that can be tested and replaced cheaply rather than arguing the "optimal" shape up front.
Calibrate Beliefs as Probabilities — stated confidence must match evidence strength; overconfidence is a worse failure than being wrong, because it suppresses correction.

One deliberate omission, from the commit message: soul carries no telos. An agent's goal is supplied by the user at activation, not baked into the prompt. The fragment specifies how to receive and act on a goal, never what the goal should be.

The cost was measured, not assumed: +~680 tokens, about 16% of the fragments block, putting the full system prompt at 5,485 tokens in act mode. A meta-cognition layer that the agent reads on every turn has to justify its budget, and Axiom 2 applies to the axioms themselves.

The systems-thinking axiom

#170 added Axiom 6 — See Systems, Not Parts — to fix a specific recurring failure: the agent patching a symptom where it stood instead of looking at the structure the symptom lived in. A local fix that ignores surrounding structure displaces cost rather than removing it.

It codifies two things. A four-layer view of any problem:

entities — attributes, boundaries
relations — type, direction
dynamics — feedback loops, delays, two-way causality
emergence — whole-level properties no part holds

And an intervention hierarchy, with the instruction to reach for the deepest reachable layer:

mental models > goals > rules > parameters

The payoff line: tweaking a parameter where the rule is wrong is wasted work; arguing rules where the goal is wrong is wasted work. "Map the graph before acting" is the operational form — the agent is told to characterize the system before committing a change to part of it.

Second-order corollaries

A later change (d5faab24) extended Axioms 1–4 with corollaries that handle the cases where the first-order rule is too blunt:

Axiom 1 — irreversibility asymmetry. Not every reduction is symmetric. Reversible ones (dead branches, uncommitted code) are cheap to attempt; irreversible ones (production data, public APIs, deleted history) demand 1–2 orders of magnitude more evidence. The true cost of an action is the future paths it forecloses, not its visible price.
Axiom 2 — questions over guesses. When uncertainty lives in the goal rather than the path, the highest-signal output is a sharp question: it halves the intent-space, while a guess only samples it.
Axiom 3 — metric capture (Goodhart). When a number starts driving decisions — tests passing, lines deleted, throughput — any measure that becomes a target stops being a measure.
Axiom 4 — containers, not contents. Specify the constraints any acceptable answer must satisfy (interfaces, invariants, tests, types) and let the shape emerge. Over-specifying form before function suppresses the selection pressure you are trying to harness.

Each corollary is the same shape: take a rule that is right in the common case, and name the boundary where applying it mechanically would do harm.

What transfers

The portable result is not the six axioms — those are Loopal's, and yours would differ. It is the construction:

Conflicts need a declared tiebreaker. If your prompt has procedures but no priority among them, conflicts resolve by accident of phrasing or position. A small, high-priority fragment that explicitly outranks later instructions converts that into a deterministic rule.
Extract axioms from a working subsystem, do not invent them top-down. The SNR and latent-structure rules earned their place in the memory curator first; promotion to the root was a generalization of something already load-bearing, not a guess.
A meta-cognition layer pays a per-turn token tax — measure it. 680 tokens read on every turn is a real cost; the layer has to clear that bar, and the rule against noise has to apply to itself.
State the failure mode each constraint prevents. Every axiom here maps to a concrete agent failure — entropy creep, noisy memory, symptom-patching, overconfidence. A constraint whose failure mode you cannot name is probably noise.