Sub-Agent Concurrency Discipline: Depth Limits, a Hub Budget, and Fork Context

The failure mode

Our agent runtime lets any agent spawn sub-agents through an Agent tool, and those sub-agents can spawn their own. The tree is unbounded by construction. Two production traces made the cost concrete: one task produced 129 agents, and a later "deep research" request produced 38. Most of those agents did very little useful work — each one received a vague slice of its parent's task, decided the slice was still too big, and subdivided again. Cascading delegation is exponential when every node subdivides, and the leaf agents end up exploring from zero with no shared context.

The naive fix is a hard recursion cap: refuse to spawn past depth N. We tried the informational version of that first, then replaced it. What shipped is three mechanisms that attack three different drivers of the cascade.

Why a hard depth cap alone isn't enough

The first attempt (#86) propagated depth through the full IPC chain so each agent knew its position in the tree, rendered as {{ agent_depth }} in the system prompt, and removed the existing hard max_depth gate. Depth became informational, not a wall. The reasoning: a hard cap stops recursion but says nothing about breadth. An agent at depth 0 that spawns 5 blind children the instant it receives a task is still pathological, even though it never exceeds depth 1. The prompt was made depth-conditional instead, pushing deeper agents toward direct tool use.

That alone didn't hold. Guidance changes the tendency but not the capability — the model can still spawn against advice, and under load it does. So #100 layered enforcement under the guidance.

Mechanism 1: depth-degraded tools

Past a configured depth, the spawn-related tools physically disappear from the agent's tool set. This is not a refusal at call time; the tools are filtered out of the schema the model sees.

const SPAWN_TOOLS: &[&str] = &["Agent", "SendMessage", "ListHubs"];

pub fn build_depth_tool_filter(depth: u32, max_depth: u32) -> Option<HashSet<String>> {
    if depth < max_depth {
        return None; // spawn tools remain available
    }
    let mut allowed: HashSet<String> = /* all tool names */;
    for name in SPAWN_TOOLS {
        allowed.remove(*name);
    }
    Some(allowed)
}

With agent_max_depth defaulting to 2, depth 0 and 1 can spawn; depth 2 and beyond cannot. The filter returns None (no restriction) in the common case, so there is no allocation on the fast path. The system prompt at depth ≥ 2 is made to match the capability rather than contradict it: "Your spawn capability has been removed at this depth. Execute your task directly with your tools." Removing the tool rather than instructing against it closes the gap between what the model is told and what it can do.

Mechanism 2: an atomic hub budget

Depth bounds the tree's height. It does nothing about width: a depth-1 agent can still spawn many siblings, and so can each of them, up to the depth limit. The hub — the process that owns every agent connection — enforces a single global cap on total sub-agents.

agent_max_total defaults to 16 and counts only spawned children, excluding the root:

pub fn sub_agent_count(&self) -> usize {
    self.agents.values()
        .filter(|a| a.info.parent.is_some())
        .count()
}

The check has to be correct under concurrency, because multiple Agent calls in one turn run in parallel and arrive at the hub concurrently. Two places guard it. A pre-check runs before forking the OS process, so an over-budget request fails fast without leaving an orphan. The authoritative check then runs inside the same lock that registers the agent, so there is no TOCTOU window between counting and registering:

// Budget check (atomic with registration — no TOCTOU).
if parent.is_some() {
    let sub_count = h.registry.sub_agent_count();
    if sub_count >= h.max_total_agents as usize {
        return Err(format!(
            "Spawn budget exhausted ({sub_count}/{} sub-agents). \
             Complete the task with your own tools.",
            h.max_total_agents
        ));
    }
}

The error is returned to the calling agent as a tool result, not raised as a fault — the agent reads "budget exhausted, complete the task with your own tools" and proceeds. If process spawn succeeds but registration later fails, the orphan process is killed rather than leaked. The budget lives on the hub rather than per-agent on purpose: each agent only knows its own children, so a per-agent limit can't bound the global total. Only the shared registry sees the whole tree.

Mechanism 3: fork context

Depth and budget cap the cascade. Fork context attacks its motivation. A sub-agent spawned with nothing but a one-line task has no choice but to explore from zero, and exploration is exactly what tempts it to delegate again. Forking the parent's conversation into the child removes that pressure: the child starts already knowing what the parent knows.

The parent conversation is compressed before injection — it is background, not the child's own working set:

const FORK_MAX_TOKENS: u32 = 50_000;
const TOOL_RESULT_CAP: usize = 200;

pub fn compress_for_fork(messages: &[Message]) -> Vec<Message> {
    // strip Thinking / Image / ServerTool blocks (ephemeral, not portable)
    // truncate each tool result to TOOL_RESULT_CAP chars + "…[truncated]"
    // drop a trailing incomplete assistant turn (a dangling tool_use)
    // drop oldest messages until under FORK_MAX_TOKENS
    // then ensure the result starts with a User message (API requirement)
}

Thinking, image, and server-tool blocks are stripped — they don't transfer meaningfully to another process. Tool results are capped at 200 characters each, because the child needs the shape of what the parent found, not full dumps. A trailing assistant turn that ends in a tool_use is dropped, since the matching tool result doesn't exist yet. The whole thing is trimmed to 50k tokens, oldest-first, and then the head is advanced to the first User message so the transcript is API-valid.

A boilerplate is prepended to the forked child's first message to set its contract explicitly:

STOP. READ THIS FIRST.

You are a forked worker process. The conversation above is background
context from your parent agent.

RULES:
1. Do NOT spawn sub-agents — execute directly with your tools.
2. Stay within your assigned scope.
3. Use tools silently, then report findings once at the end.
4. Keep your report under 500 words. Be factual and concise.
5. Your response MUST begin with "Scope:" — no preamble.

Three independent constraints reinforce each other here: the prompt says don't spawn, the depth filter eventually removes the ability to spawn, and the inherited context removes the reason to spawn.

Where the controls sit

The three mechanisms deliberately live at different layers, so no single bypass defeats all of them.

Mechanism	Bounds	Layer	Enforcement
Depth-degraded tools	tree height	per-agent tool filter	hard — tool absent from schema
Hub budget (16)	total agents	hub registry, under lock	hard — atomic check at register
Fork context	motivation to recurse	injected conversation	soft — removes the incentive
Depth-conditional prompt	breadth at each depth	system prompt	soft — guidance only

Guidance and enforcement are not redundant. Guidance shapes behavior in the common case so the hard limits are rarely hit; the hard limits exist for when guidance is ignored. The defaults — depth 2, 16 total — are config values (agent_max_depth, agent_max_total), not constants, so an operator can widen them for a genuinely large fan-out without code changes.

Takeaway

When agents can spawn agents, soft guidance bounds the tendency to recurse but not the capability, and under load the model exercises the capability. If a limit must hold, enforce it where the shared state lives — depth in the per-agent tool filter, total count in the hub registry under the same lock that registers — and make the prompt match what the tools actually permit. Then remove the incentive that drove the recursion in the first place: an agent that inherits its parent's context has far less reason to explore from zero, and far less reason to delegate the exploration away.