From Hard Deny to Permission-Gated: Reworking an Agent Sandbox

The sandbox blocked work the user had already approved

The coding agent runs tools under a sandbox: an OS-level layer (Seatbelt on macOS, bubblewrap on Linux) plus an app-level path checker. The original model treated any path outside the working directory as a hard error. A write to /etc/nginx/nginx.conf came back as PermissionDenied and the operation stopped there — even when the user had already granted the tool permission to run.

That is the wrong failure mode for an agent meant to do ops work. When the user approves a Write to a config file outside cwd, that approval is not a policy violation to reject; it is a decision the human is entitled to make. The sandbox was overriding the human, not assisting them.

Two changes fixed this: collapsing the path decision into three states with an explicit "needs approval" branch (#80 / #83), then removing the OS-level write restrictions entirely and moving fine-grained protection to the app layer (#101).

Three states instead of two

The path checker used to return distinct hard-deny variants — DenyWrite and DenyRead — both of which mapped to PermissionDenied and ended the operation. The new decision type has a third branch:

pub enum PathDecision {
    Allow,
    /// Hard deny — cannot be overridden (ReadOnly mode, path resolution failure).
    Deny(String),
    /// Soft deny — outside normal sandbox bounds but approvable through the
    /// permission system: Bypass auto-allows; under AskAnyWrite / AskDangerous
    /// the handler chain (Manual or Auto) decides.
    RequiresApproval(String),
}

The distinction is which violations are negotiable. Writing outside the writable set, hitting a deny_write_globs entry, or hitting a deny_read_globs entry are now soft denies — RequiresApproval. What stays a hard Deny: read-only mode, and path resolution failures (.. traversal, symlink escape). Those are not user decisions; a symlink that escapes the sandbox root is a containment failure regardless of intent.
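A minimal sketch of how a checker might sort a path into the three states. The Policy struct, its field names, and the check function are hypothetical. Deny globs are reduced to exact file-name matches to keep the sketch dependency-free, and symlink-escape detection is left out because it needs the filesystem.

```rust
use std::path::{Component, Path, PathBuf};

#[derive(Debug, PartialEq)]
pub enum PathDecision {
    Allow,
    /// Hard deny: cannot be overridden.
    Deny(String),
    /// Soft deny: routed to the permission system.
    RequiresApproval(String),
}

/// Hypothetical policy; real deny globs are simplified to file names here.
pub struct Policy {
    pub read_only: bool,
    pub writable_roots: Vec<PathBuf>,
    pub deny_write_names: Vec<String>,
    pub deny_read_names: Vec<String>,
}

pub fn check(policy: &Policy, path: &Path, is_write: bool) -> PathDecision {
    // Hard deny: a lexical `..` component is treated as a resolution failure.
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return PathDecision::Deny(format!("path traversal in {}", path.display()));
    }
    // Hard deny: read-only mode blocks every write.
    if is_write && policy.read_only {
        return PathDecision::Deny("sandbox is read-only".to_string());
    }
    // Soft deny: a deny-list hit for this kind of access.
    let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
    let deny_list = if is_write { &policy.deny_write_names } else { &policy.deny_read_names };
    if deny_list.iter().any(|d| d == name) {
        return PathDecision::RequiresApproval(format!("{} matches a deny rule", path.display()));
    }
    // Soft deny: a write outside every writable root.
    if is_write && !policy.writable_roots.iter().any(|root| path.starts_with(root)) {
        return PathDecision::RequiresApproval(format!("{} is outside the writable set", path.display()));
    }
    PathDecision::Allow
}
```

The ordering matters: hard denies are checked first, so an approvable condition can never mask a containment failure.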

A matching RequiresApproval variant was added to the I/O error type so the runtime can tell "approvable" apart from "blocked":

/// Distinguished from PermissionDenied (hard block) so the runtime can route
/// it through the permission system.
#[error("requires approval: {0}")]
RequiresApproval(String),
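One way to keep the two deny kinds separate all the way to the caller, sketched with std only. The Display impl hand-writes what thiserror's #[error(...)] attribute would generate; the PermissionDenied message text and the to_io_result helper are assumptions, not the codebase's API.

```rust
use std::fmt;
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub enum PathDecision { Allow, Deny(String), RequiresApproval(String) }

#[derive(Debug, PartialEq)]
pub enum ToolIoError {
    /// Hard block: never routed to the permission system.
    PermissionDenied(String),
    /// Soft block: the runtime may ask and retry.
    RequiresApproval(String),
}

// Equivalent of the #[error("...")] attributes thiserror generates.
impl fmt::Display for ToolIoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ToolIoError::PermissionDenied(r) => write!(f, "permission denied: {r}"),
            ToolIoError::RequiresApproval(r) => write!(f, "requires approval: {r}"),
        }
    }
}

impl std::error::Error for ToolIoError {}

/// Hypothetical helper: map a checker verdict to an I/O result without
/// collapsing both deny kinds into one error.
pub fn to_io_result(path: PathBuf, decision: PathDecision) -> Result<PathBuf, ToolIoError> {
    match decision {
        PathDecision::Allow => Ok(path),
        PathDecision::Deny(r) => Err(ToolIoError::PermissionDenied(r)),
        PathDecision::RequiresApproval(r) => Err(ToolIoError::RequiresApproval(r)),
    }
}
```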

Routing through the existing permission system

The agent already has a permission model for tools, independent of the sandbox:

PermissionLevel: ReadOnly / Write / Dangerous
PermissionMode: Bypass / AskDangerous / AskAnyWrite
PermissionMode::check(level) -> Allow | Ask | Deny
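The write-up names the levels, modes, and decisions but not the table that connects them. A plausible sketch, assuming Bypass allows everything, AskDangerous asks only for Dangerous, and AskAnyWrite asks for anything above ReadOnly. Because the source does not say when check returns Deny, this sketch never produces it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PermissionLevel { ReadOnly, Write, Dangerous }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PermissionMode { Bypass, AskDangerous, AskAnyWrite }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PermissionDecision { Allow, Ask, Deny }

impl PermissionMode {
    /// Assumed mapping; not taken from the codebase.
    pub fn check(self, level: PermissionLevel) -> PermissionDecision {
        match (self, level) {
            (PermissionMode::Bypass, _) => PermissionDecision::Allow,
            (_, PermissionLevel::ReadOnly) => PermissionDecision::Allow,
            (PermissionMode::AskDangerous, PermissionLevel::Write) => PermissionDecision::Allow,
            (PermissionMode::AskDangerous, PermissionLevel::Dangerous) => PermissionDecision::Ask,
            (PermissionMode::AskAnyWrite, _) => PermissionDecision::Ask,
        }
    }
}
```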

The goal was to reuse that pipeline rather than build a second approval path. Before a tool runs, a precheck phase extracts the paths the tool will touch from its input, keyed by tool name:

pub fn extract_paths(tool_name: &str, input: &Value) -> Vec<(String, bool)> {
    match tool_name {
        "Write" | "Edit" | "MultiEdit" => single(input, "file_path", true),
        "Read"                          => single(input, "file_path", false),
        "Delete"                        => single(input, "path", true),
        "MoveFile"                      => /* src + dst, both writes */,
        "CopyFile"                      => /* src read, dst write */,
        "ApplyPatch"                    => patch_paths(input),
        _ => Vec::new(), // opaque tools handled at execution time
    }
}
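A compilable version of the dispatch, assuming the tool input is a flat string map rather than a serde_json::Value so the sketch needs only std. The MoveFile/CopyFile field names source and destination are hypothetical, and ApplyPatch is left out because extracting its paths depends on the patch format.

```rust
use std::collections::HashMap;

/// Stand-in for the JSON tool input (hypothetical simplification).
pub type Input<'a> = HashMap<&'a str, &'a str>;

/// One (path, is_write) pair if the field is present, else nothing.
fn single(input: &Input<'_>, key: &str, is_write: bool) -> Vec<(String, bool)> {
    input.get(key).map(|p| vec![(p.to_string(), is_write)]).unwrap_or_default()
}

pub fn extract_paths(tool_name: &str, input: &Input<'_>) -> Vec<(String, bool)> {
    match tool_name {
        "Write" | "Edit" | "MultiEdit" => single(input, "file_path", true),
        "Read" => single(input, "file_path", false),
        "Delete" => single(input, "path", true),
        // Hypothetical field names; both ends of a move are writes.
        "MoveFile" => [single(input, "source", true), single(input, "destination", true)].concat(),
        // A copy reads the source and writes the destination.
        "CopyFile" => [single(input, "source", false), single(input, "destination", true)].concat(),
        // Opaque tools (Bash, Glob, Grep, MCP) are handled at execution time.
        _ => Vec::new(),
    }
}
```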

If any extracted path comes back RequiresApproval, the precheck elevates the tool's effective permission to Dangerous for this invocation:

let effective_perm = if sandbox_needs.is_empty() {
    tool_perm
} else {
    Some(PermissionLevel::Dangerous)
};
let decision = effective_perm
    .map(|p| self.params.config.permission_mode.check(p))
    .unwrap_or(PermissionDecision::Allow);
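The elevation step in isolation, as a hypothetical precheck function. It carries its own assumed mode table so it compiles on its own; sandbox_needs is taken to hold one reason per path that came back RequiresApproval, and joining those reasons with "; " for the prompt is an assumption about how sandbox_approval_reason is built.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PermissionLevel { ReadOnly, Write, Dangerous }
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PermissionMode { Bypass, AskDangerous, AskAnyWrite }
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PermissionDecision { Allow, Ask, Deny }

impl PermissionMode {
    // Assumed mapping: Bypass allows all; otherwise ask above the mode's threshold.
    fn check(self, level: PermissionLevel) -> PermissionDecision {
        match (self, level) {
            (PermissionMode::Bypass, _) | (_, PermissionLevel::ReadOnly) => PermissionDecision::Allow,
            (PermissionMode::AskDangerous, PermissionLevel::Write) => PermissionDecision::Allow,
            _ => PermissionDecision::Ask,
        }
    }
}

/// Hypothetical precheck: any sandbox need elevates the call to Dangerous.
pub fn precheck(
    mode: PermissionMode,
    tool_perm: Option<PermissionLevel>,
    sandbox_needs: &[String],
) -> (PermissionDecision, Option<String>) {
    let effective_perm = if sandbox_needs.is_empty() {
        tool_perm
    } else {
        Some(PermissionLevel::Dangerous)
    };
    let decision = effective_perm
        .map(|p| mode.check(p))
        .unwrap_or(PermissionDecision::Allow);
    // Reason shown to the approver, present only when the sandbox asked.
    let reason = (!sandbox_needs.is_empty()).then(|| sandbox_needs.join("; "));
    (decision, reason)
}
```

Under these assumptions a plain Write under AskDangerous passes silently, but the same Write to a path outside the writable set is asked about, with the sandbox's reason attached.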

From there the existing modes decide. Under Bypass the paths are approved without a prompt. Under AskAnyWrite / AskDangerous the decision flows to the human or to the auto-classifier, and the sandbox reason is attached to the prompt (sandbox_approval_reason) so the human sees why approval is being asked, not just that it is.

Caching approvals for the session

Asking once per file is acceptable; asking once per operation on the same file is not. Approved paths are cached for the session in a reader-writer set inside the backend:

pub struct ApprovedPaths {
    inner: RwLock<HashSet<PathBuf>>,
}

RwLock fits the access pattern: writes are rare (the first approval of a path) and reads are frequent (every subsequent check). Its interior mutability also lets the set live inside an Arc<LocalBackend> without requiring &mut self. Once a path is in the set, check_sandbox_path lets it through without asking again.
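A sketch of the rest of the type: both methods take &self, so an Arc-shared backend can update and query the set from any thread. The method name approve is hypothetical; contains mirrors the self.approved.contains(&abs) call in the backend fallback. Each unwrap() panics if another thread panicked while holding the lock.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::RwLock;

#[derive(Default)]
pub struct ApprovedPaths {
    inner: RwLock<HashSet<PathBuf>>,
}

impl ApprovedPaths {
    /// Rare path: take the exclusive write lock to record an approval.
    pub fn approve(&self, path: PathBuf) {
        self.inner.write().unwrap().insert(path);
    }

    /// Frequent path: many readers can hold the shared lock at once.
    pub fn contains(&self, path: &Path) -> bool {
        self.inner.read().unwrap().contains(path)
    }
}
```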

Path extraction can't cover everything. Bash, Glob, Grep, and MCP tools have opaque path semantics: you can't statically determine which files a shell command will touch. For those, extract_paths returns an empty list, and an execution-time fallback in the backend takes over: when an I/O call returns RequiresApproval, the backend checks the approved set before propagating the error.

match path::resolve(self.cwd.as_path(), raw, is_write, self.policy.as_ref()) {
    Ok(p) => Ok(p),
    Err(ToolIoError::RequiresApproval(reason)) => {
        let abs = path::to_absolute(self.cwd.as_path(), raw);
        if self.approved.contains(&abs) {
            Ok(abs.canonicalize().unwrap_or(abs))
        } else {
            Err(ToolIoError::RequiresApproval(reason))
        }
    }
    // ...
}
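A self-contained version of the fallback, with a hypothetical resolve that flags any write outside cwd and a plain HashSet standing in for ApprovedPaths. Path::canonicalize fails for a file that does not exist yet, which is why the approved branch falls back to the lexical absolute path.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub enum ToolIoError { PermissionDenied(String), RequiresApproval(String) }

/// Make `raw` absolute against `cwd` lexically (no filesystem access).
fn to_absolute(cwd: &Path, raw: &str) -> PathBuf {
    let p = Path::new(raw);
    if p.is_absolute() { p.to_path_buf() } else { cwd.join(p) }
}

/// Hypothetical stand-in for path::resolve: writes outside cwd need approval.
fn resolve(cwd: &Path, raw: &str, is_write: bool) -> Result<PathBuf, ToolIoError> {
    let abs = to_absolute(cwd, raw);
    if is_write && !abs.starts_with(cwd) {
        return Err(ToolIoError::RequiresApproval(format!("{} is outside {}", abs.display(), cwd.display())));
    }
    Ok(abs)
}

/// Consult the session's approved set before propagating RequiresApproval.
pub fn resolve_with_approvals(
    cwd: &Path,
    raw: &str,
    is_write: bool,
    approved: &HashSet<PathBuf>,
) -> Result<PathBuf, ToolIoError> {
    match resolve(cwd, raw, is_write) {
        Ok(p) => Ok(p),
        Err(ToolIoError::RequiresApproval(reason)) => {
            let abs = to_absolute(cwd, raw);
            if approved.contains(&abs) {
                // New files can't be canonicalized yet; keep the lexical path.
                Ok(abs.canonicalize().unwrap_or(abs))
            } else {
                Err(ToolIoError::RequiresApproval(reason))
            }
        }
        // Hard errors pass through untouched.
        Err(other) => Err(other),
    }
}
```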

Dropping the OS-level write restriction

The first change made path violations approvable but left the OS sandbox enforcing workspace containment for writes. That layer was doing more harm than good. The reasoning, from #101:

The seatbelt/bwrap file-write restrictions added no real security for Bash commands (process-exec was already unrestricted) but broke every CLI tool that writes to $HOME config dirs (lark-cli, npm, cargo, etc.).

The security argument is the load-bearing part. If process-exec and process-fork are unrestricted — and they are, because restricting them breaks Bazel, the JVM, Nix toolchains — then a write restriction is trivially bypassable: spawn a static helper binary outside the sandbox and have it do the write. The restriction was paying a real compatibility cost (every tool that touches ~/.npm, ~/.cargo, ~/.config failed) for a boundary that did not actually hold.

So WorkspaceWrite became DefaultWrite (with a serde alias for config compatibility). In that mode the OS sandbox appends (allow file-write*) — all writes pass at the OS level. ReadOnly mode still omits the write rules and blocks everything. Protection for File tools moved to the app-level path checker, which now treats $HOME as writable by default but guards specific files via deny_write_globs:

// Added to the default deny-write globs
"**/.bashrc", "**/.bash_profile", "**/.zshrc", "**/.zprofile", "**/.profile",
"**/authorized_keys", "**/LaunchAgents/**",

A write to ~/.npm/_cacheinfo passes silently; a write to ~/.zshrc or ~/.ssh/authorized_keys becomes RequiresApproval and surfaces to the user. The protection is now where it can actually be enforced and where the reason can be explained, instead of in a kernel layer that any subprocess could route around.
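A minimal sketch of that app-level check, assuming a simplified glob dialect where a pattern contains only literal path components and ** (zero or more components); real glob libraries also handle * and ?. The names glob_match and write_needs_approval are hypothetical.

```rust
use std::path::{Component, Path};

/// Match path components against pattern components; `**` spans zero or more.
fn glob_match(pattern: &[&str], path: &[&str]) -> bool {
    match pattern.split_first() {
        None => path.is_empty(),
        Some((&"**", rest)) => (0..=path.len()).any(|i| glob_match(rest, &path[i..])),
        Some((lit, rest)) => path.first() == Some(lit) && glob_match(rest, &path[1..]),
    }
}

/// Normal components only; assumes an absolute, already-normalized path.
fn components(path: &Path) -> Vec<&str> {
    path.components()
        .filter_map(|c| match c {
            Component::Normal(s) => s.to_str(),
            _ => None,
        })
        .collect()
}

/// True if a write to `path` should surface as RequiresApproval.
pub fn write_needs_approval(path: &Path, deny_write_globs: &[&str]) -> bool {
    let parts = components(path);
    deny_write_globs.iter().any(|g| {
        let pattern: Vec<&str> = g.split('/').collect();
        glob_match(&pattern, &parts)
    })
}
```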

What carries over

A few constraints generalize past this codebase.

A sandbox decision is not binary. "Outside the default boundary" and "must never happen" are different claims, and folding them into one hard error means the safe-but-unusual case pays the same price as the genuinely dangerous one. Splitting Deny from RequiresApproval is what let the unusual case proceed under human control.

Enforce a boundary at the layer that can actually hold it. An OS write restriction underneath an unrestricted exec is theater — it blocks the honest path and leaves the dishonest one open, while breaking real tools. Pushing the check up to the application layer lost no security that was real and gained an explanation the user can act on.

Reuse the approval pipeline you already have. The sandbox didn't get its own prompt UX, classifier, or caching. It mapped its one new state onto Dangerous and let the existing permission machinery decide. The change landed in #80/#83 with 35 new tests across the backend, sandbox, and runtime layers — small, because most of the behavior was already there.