Tech · 5/20/2026 · 8 min

Making an Identifier Unbreakable: A 6-Layer Defense Contract

A dotted Google email flowed unchanged into a username, got templated into an org slug, and the API rejected it with 422. The fix wasn't one validation call — it was six layers that each fail closed.

One 422, twenty-five latent bugs

A user signed up through Google OAuth with an email address whose local-part was kudin.private. That local-part flowed unchanged into users.username. The web onboarding page then built a personal workspace slug by string templating:

const slug = `${user.username}-workspace`; // -> "kudin.private-workspace"
await lightCreateOrganization({ name, slug });

The backend slug rule is ^[a-z0-9]+(-[a-z0-9]+)*$. A dot is not in that set, so the create call returned 422 and onboarding dead-ended. The instance was trivial. The class was not: an audit found 25+ identifier fields with no contract enforcement at all. Any column used as a URL param, an @mention key, or a lookup key was one malformed input away from the same failure.

The write path that produced the bug is worth reading, because it looks defensive:

username := providerUsername
if username == "" {
    username = email            // dotted local-part stored verbatim
}
for i := 0; i < 100; i++ {      // dedupes, never sanitizes
    if _, err := s.GetByUsername(ctx, username); err != nil {
        break
    }
    username = fmt.Sprintf("%s_%d", providerUsername, i) // "_" suffix is itself outside the slug rule
}

The loop guarantees uniqueness and guarantees nothing about shape. That is the trap with identifiers: uniqueness and validity are separate properties, and code that handles one tends to imply it handles both.

The shape of the contract

Before the layers, the rule itself. An identifier (slug / username / pod_key / handle) is: lowercase ASCII letters and digits, hyphens only between segments, length 2–100, and not in a reserved set. One regex, one length bound, one word list — defined once in backend/pkg/slugkit:

const (
    MinLen = 2
    MaxLen = 100
)
var pattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

func Validate(s string) error {
    if len(s) == 0       { return ErrEmpty }
    if len(s) < MinLen   { return ErrTooShort }
    if len(s) > MaxLen   { return ErrTooLong }
    if !pattern.MatchString(s) { return ErrInvalidFormat }
    if IsReserved(s)     { return ErrReserved }
    return nil
}
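
To see the rule reject the incident's input, here is a minimal self-contained sketch. It copies the constants, pattern, and check order from above. The error values and the two-word reserved list are assumptions made only so the example compiles, since the real definitions are not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

const (
	MinLen = 2
	MaxLen = 100
)

// Assumed error values: the names come from the article, the messages do not.
var (
	ErrEmpty         = errors.New("identifier is empty")
	ErrTooShort      = errors.New("identifier is too short")
	ErrTooLong       = errors.New("identifier is too long")
	ErrInvalidFormat = errors.New("identifier has invalid format")
	ErrReserved      = errors.New("identifier is reserved")
)

var pattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// reserved is a hypothetical stand-in for the real reserved word list.
var reserved = map[string]bool{"admin": true, "api": true}

func IsReserved(s string) bool { return reserved[s] }

// Validate applies the checks in the same order as the article's version.
func Validate(s string) error {
	if len(s) == 0 {
		return ErrEmpty
	}
	if len(s) < MinLen {
		return ErrTooShort
	}
	if len(s) > MaxLen {
		return ErrTooLong
	}
	if !pattern.MatchString(s) {
		return ErrInvalidFormat
	}
	if IsReserved(s) {
		return ErrReserved
	}
	return nil
}

func main() {
	inputs := []string{
		"kudin.private-workspace", // the slug from the incident
		"kudin-private-workspace",
		"a",
		"Admin",
		"admin",
		"a--b",
	}
	for _, s := range inputs {
		fmt.Printf("%q -> %v\n", s, Validate(s))
	}
}
```

Because the pattern admits only ASCII, len(s) in bytes and the length in characters agree for every string that can pass, so a byte-length bound is sufficient here.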

A single Validate call at the API edge would have stopped this one bug. It would not have stopped the other 25 fields, and it would not survive the next engineer who adds field 26 and forgets the call. So the rule is enforced at six independent layers, each of which fails closed on its own.

Layer 1 — the database CHECK

The authoritative source of truth for stored shape is the column itself:

ALTER TABLE users ADD CONSTRAINT users_username_format
  CHECK (username ~ '^[a-z0-9]+(-[a-z0-9]+)*$'
         AND char_length(username) BETWEEN 2 AND 100)
  NOT VALID;

NOT VALID is load-bearing. The table already held kudin.private-era rows; a validating constraint would reject the migration outright. The pattern is two-phase: NOT VALID enforces the rule on new writes immediately, a backfill rewrites historical rows, then a later migration runs VALIDATE CONSTRAINT to promote it to full enforcement. New columns (channels.slug, api_keys.slug) ship nullable first, get backfilled, then a Phase-4 migration adds NOT NULL + UNIQUE (organization_id, slug).

Because Layer 1 is authoritative for reads, the typed-identifier Scan deliberately does not re-validate inbound DB values — re-running the check at read time would duplicate the DB guarantee and break any window where a constraint is being introduced or relaxed. Reads stay cheap; the boundary that matters is the write.

Layer 2 — the ORM hook, without a DIP violation

A DB CHECK only fires when a write reaches the DB. A db.Create with a bad value still round-trips before failing. Catching it earlier means a GORM hook — but domain models are not allowed to import gorm (that would invert the dependency direction). The resolution is an interface in the domain and a plugin in infra:

type IdentifierValidator interface {
    ValidateIdentifiers() error
}

Each domain model implements it with no gorm import:

func (u *User) ValidateIdentifiers() error {
    return slugkit.ValidateIdentifier("users.username", u.Username)
}

gormvalidate.Plugin registers BeforeCreate / BeforeUpdate callbacks that reflect over the statement, check whether the model (via pointer receiver) implements the interface, and call it. The reflect result is memoised per type in a sync.Map, so join tables and message rows that never carry an identifier pay one lookup, not one per row. Hooks were added to seven models: agent, agentpod, apikey, channel, loop, organization, user.
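
The memoised interface check can be sketched without importing GORM at all. In the sketch below, validateModel stands in for the body of a BeforeCreate / BeforeUpdate callback. The names implementsValidator, implementsCache, slugPattern, and Message, and the simplified body of User.ValidateIdentifiers, are hypothetical and not the actual gormvalidate code.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"sync"
)

type IdentifierValidator interface {
	ValidateIdentifiers() error
}

var (
	validatorType = reflect.TypeOf((*IdentifierValidator)(nil)).Elem()
	// implementsCache maps a pointer type to whether it implements the interface.
	implementsCache sync.Map // reflect.Type -> bool
)

// implementsValidator answers from the cache when it can, so each type is
// inspected by reflection only once.
func implementsValidator(ptrType reflect.Type) bool {
	if cached, ok := implementsCache.Load(ptrType); ok {
		return cached.(bool)
	}
	ok := ptrType.Implements(validatorType)
	implementsCache.Store(ptrType, ok)
	return ok
}

// validateModel is a hypothetical callback body: it validates the model if
// its pointer type implements IdentifierValidator and is a no-op otherwise.
func validateModel(model any) error {
	if model == nil {
		return nil
	}
	v := reflect.ValueOf(model)
	if v.Kind() != reflect.Pointer {
		// Copy the value behind a fresh pointer so that methods declared on
		// a pointer receiver become callable.
		p := reflect.New(v.Type())
		p.Elem().Set(v)
		v = p
	}
	if v.IsNil() || !implementsValidator(v.Type()) {
		return nil
	}
	return v.Interface().(IdentifierValidator).ValidateIdentifiers()
}

var slugPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)

// User carries an identifier, so it implements the interface.
type User struct{ Username string }

func (u *User) ValidateIdentifiers() error {
	if !slugPattern.MatchString(u.Username) {
		return fmt.Errorf("users.username: invalid identifier %q", u.Username)
	}
	return nil
}

// Message carries no identifier and implements nothing.
type Message struct{ Body string }

func main() {
	fmt.Println(validateModel(&User{Username: "kudin.private"}))
	fmt.Println(validateModel(&User{Username: "kudin-private"}))
	fmt.Println(validateModel(&Message{Body: "any.text"}))
}
```

Caching the negative result matters as much as the positive one: it is what lets models without identifiers skip the reflection check on every later write.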

Layer 3 — registries as the only write path

Validation rejects bad input; it does not produce good input. Turning "Kudin Private" into a unique, valid kudin-private is generation, and that lives in per-table *Registry helpers declared as the sole sanctioned write path for identifier columns:

func (s *Service) EnsureUniqueUsername(ctx context.Context, seeds []string) (string, error) {
    check := slugkit.FromExistsCheck(s.repo.UsernameExists)
    if u, ok := slugkit.TrySeeds(ctx, seeds, check); ok {
        return u, nil
    }
    return randomFallbackUsername(ctx, check) // "user-{8hex}"
}

TrySeeds walks a priority-ordered seed list — provider username, then email local-part, then display name — sanitizing each and retrying with -2 / -3 suffixes on collision, before falling back to a random handle. The OAuth bug's open-coded loop was deleted and every ingress path (Google / GitHub / GitLab / Gitee OAuth, plus SAML, OIDC, LDAP) now funnels through this one helper. Channel and API-key slugs get org-scoped equivalents.
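
A minimal sketch of that seed walk follows. It is not the slugkit implementation: sanitize, isValid, trySeeds, and the maxSuffix cap are hypothetical, the context parameter and repository errors are omitted, and the reserved-word check and random fallback are left out to keep the example short.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

const (
	minLen    = 2
	maxLen    = 100
	maxSuffix = 50 // hypothetical cap on -2, -3, ... retries per seed
)

var (
	slugPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`)
	nonSlugRun  = regexp.MustCompile(`[^a-z0-9]+`)
)

// sanitize lowercases the seed, turns every run of characters outside
// [a-z0-9] into a single hyphen, and trims hyphens from both ends.
func sanitize(seed string) string {
	s := nonSlugRun.ReplaceAllString(strings.ToLower(seed), "-")
	return strings.Trim(s, "-")
}

func isValid(s string) bool {
	return len(s) >= minLen && len(s) <= maxLen && slugPattern.MatchString(s)
}

// trySeeds returns the first sanitized seed, or suffixed variant of it,
// that is both valid and unused. Seeds are tried in priority order.
func trySeeds(seeds []string, exists func(string) bool) (string, bool) {
	for _, seed := range seeds {
		base := sanitize(seed)
		if !isValid(base) {
			continue // empty or too short after sanitizing: try the next seed
		}
		if !exists(base) {
			return base, true
		}
		for n := 2; n <= maxSuffix; n++ {
			candidate := fmt.Sprintf("%s-%d", base, n)
			if isValid(candidate) && !exists(candidate) {
				return candidate, true
			}
		}
	}
	return "", false
}

func main() {
	taken := map[string]bool{"kudin-private": true}
	exists := func(s string) bool { return taken[s] }

	// Provider username is empty, so the email local-part is the first usable seed.
	name, ok := trySeeds([]string{"", "kudin.private", "Kudin Private"}, exists)
	fmt.Println(name, ok) // kudin-private-2 true
}
```

Each candidate is validated before the uniqueness check, so shape and uniqueness are established by two separate tests rather than one implying the other.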

Layer 4 — a newtype the compiler checks

For new fields, the goal is to make "raw string in an identifier column" unrepresentable. slugkit.Slug is a string newtype whose only validating ingress is UnmarshalJSON, so a malformed identifier fails at request parsing rather than deep in a handler:

type Slug string

func (s *Slug) UnmarshalJSON(b []byte) error {
    var raw string
    if err := json.Unmarshal(b, &raw); err != nil {
        return err
    }
    sl, err := NewFromTrusted(raw) // runs Validate
    if err != nil {
        return err
    }
    *s = sl
    return nil
}

It also implements database/sql Scanner / Valuer, so it drops into GORM columns directly. New UNIQUE identifier columns should be typed Slug; raw string columns are explicitly tagged as tech debt that leans on Layers 1–3.
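
The database side of the newtype might look like the sketch below. It assumes, per the Layer 1 discussion, that Scan accepts whatever the column holds without re-validating. Passing the value through unchanged in Value and the exact set of source types handled in Scan are assumptions of this sketch, not details confirmed by the real slugkit code.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"fmt"
)

type Slug string

// Compile-time checks that Slug satisfies both interfaces.
var (
	_ driver.Valuer = Slug("")
	_ sql.Scanner   = (*Slug)(nil)
)

// Value hands the slug to the driver as a plain string.
func (s Slug) Value() (driver.Value, error) {
	return string(s), nil
}

// Scan trusts the stored value: the column's CHECK constraint is the
// authority for shape, so nothing is re-validated on the read path.
func (s *Slug) Scan(src any) error {
	switch v := src.(type) {
	case string:
		*s = Slug(v)
	case []byte:
		*s = Slug(v)
	case nil:
		*s = ""
	default:
		return fmt.Errorf("slugkit: cannot scan %T into Slug", src)
	}
	return nil
}

func main() {
	var s Slug
	// A legacy row that predates the constraint still loads without error.
	err := s.Scan([]byte("kudin.private"))
	fmt.Println(s, err)

	v, _ := Slug("team-a").Value()
	fmt.Println(v)
}
```

The asymmetry is deliberate in this reading of the design: JSON input is untrusted and validated, while database output is trusted because the constraint already guards the write path.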

Layer 5 — CI lint for the patterns regex can't type-check

Layers 1–4 stop a bad value. They do not stop an engineer from reintroducing the anti-pattern: strings.Split(email, "@")[0] assigned straight into a column, or ${username}-workspace on the client. tools/identifier-lint/lint.sh is a set of six grep-based rules, each targeting a known shape from this incident, including:

  • splitting an email local-part outside the helper;
  • raw assignment to .Username outside username_registry and the OAuth funnel;
  • concatenating -workspace on the client (Go and TypeScript);
  • a new UNIQUE VARCHAR migration (seq ≥ 000135) without a slug CHECK — with an explicit allowlist for email / hash / token columns that legitimately aren't slugs;
  • interpolating user.username into a route path in the frontend.

CI runs the full scan; a fast mode (IDENT_LINT_FAST=1) scopes to the diff against origin/main for local iteration.
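
The real linter is a shell script built on grep. To keep the examples in one language, the sketch below expresses two of its rules as Go regular expressions scanned line by line. The rule names and the exact patterns are guesses at what such rules could look like, and the allowlisting of the sanctioned helper files is omitted.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a hypothetical rule name with the pattern that triggers it.
type rule struct {
	name string
	re   *regexp.Regexp
}

var rules = []rule{
	// strings.Split(x, "@")[0]: taking an email local-part by hand.
	{"email-local-part-split", regexp.MustCompile(`strings\.Split\(\s*\w+\s*,\s*"@"\s*\)\[0\]`)},
	// ${...username...}-workspace: templating a slug on the client.
	{"workspace-suffix-concat", regexp.MustCompile(`\$\{[^}]*username[^}]*\}-workspace`)},
}

// lint returns one "path:line: rule" finding per rule match in src.
func lint(path, src string) []string {
	var findings []string
	for i, line := range strings.Split(src, "\n") {
		for _, r := range rules {
			if r.re.MatchString(line) {
				findings = append(findings, fmt.Sprintf("%s:%d: %s", path, i+1, r.name))
			}
		}
	}
	return findings
}

func main() {
	goSrc := "name := providerName\nname = strings.Split(email, \"@\")[0]\n"
	tsSrc := "const slug = `${user.username}-workspace`;"

	for _, f := range lint("oauth.go", goSrc) {
		fmt.Println(f)
	}
	for _, f := range lint("onboarding.ts", tsSrc) {
		fmt.Println(f)
	}
}
```

Pattern rules like these are intentionally narrow: they catch the known shapes cheaply and leave everything else to the layers that inspect actual values.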

Layer 6 — sanitize at the edge, and split identity from display

The last layer handles input that is legitimately free-form. Channel names and ticket titles hold arbitrary Unicode, so they are not slugs, but they are still attack surface for zero-width characters, RTL/LTR overrides, and control bytes. displaykit.Sanitize strips Unicode category Cf (format) and C0/C1 control runes, collapses whitespace, and counts length in runes rather than bytes, so a CJK character or a single-code-point emoji counts as one rather than as its three or four UTF-8 bytes. Four MCP gRPC adapters call it at the boundary.
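
A sanitizer with those properties can be sketched with the standard unicode package. This is not the displaykit source: the function names, the choice to treat tabs and newlines as collapsible whitespace rather than as stripped controls, and the trimming of leading and trailing whitespace are all assumptions of the sketch.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// sanitize drops format (Cf) and control (Cc) runes, and collapses every
// run of whitespace into one ASCII space, trimming it at both ends.
func sanitize(s string) string {
	var b strings.Builder
	pendingSpace := false
	for _, r := range s {
		switch {
		case unicode.IsSpace(r):
			// Remember the gap, but only once some text has been written,
			// which discards leading whitespace.
			pendingSpace = b.Len() > 0
		case unicode.Is(unicode.Cf, r) || unicode.IsControl(r):
			// Zero-width characters, bidi overrides, C0/C1 controls: drop.
		default:
			if pendingSpace {
				b.WriteByte(' ')
				pendingSpace = false
			}
			b.WriteRune(r)
		}
	}
	return b.String() // a trailing pendingSpace is never written
}

// runeLen counts code points, not bytes.
func runeLen(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	// U+202E is a right-to-left override, U+200B a zero-width space.
	raw := "dev\u202Eops\u200B  \tchat"
	clean := sanitize(raw)
	fmt.Printf("%q runes=%d bytes=%d\n", clean, runeLen(clean), len(clean))

	name := "日本語"
	fmt.Printf("%q runes=%d bytes=%d\n", name, runeLen(name), len(name))
}
```

Checking whitespace before controls is what lets a tab or newline become a space instead of silently joining two words.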

This forced a property-model cleanup that was the real root cause: channels and api_keys had used name as both the display string and the lookup key. One column cannot be both free-form Unicode and a strict slug. The fix splits them — name is pure display, a new slug column is the identifier — with GET /channels/by-slug/:slug and GET /apikeys/by-slug/:slug for lookup.

Cost and result

The change spans 132 files: roughly +2,800 / −140 lines, 62 new Go test functions, 6 end-to-end cases. Onboarding is covered twice — once through the web UI (Playwright) and once through the Electron NAPI bridge — because the client write path crosses Rust core → UniFFI/wasm/NAPI → electron-adapter, and a unit test would not catch a break in that chain. The slug rule, length bounds, and reserved set are mirrored in TypeScript so the client rejects bad input before a round-trip.

Two constraints generalize past this codebase:

  • Uniqueness and validity are different properties. A dedup loop that never sanitizes will store anything an attacker or an OAuth provider hands it. Treat them as two checks.
  • One validation call is a patch; a contract is layers that each fail closed. The DB CHECK, ORM hook, registry, newtype, CI lint, and edge sanitizer overlap on purpose. Any single layer is removable without re-opening the hole — which is the property that survives the next engineer who forgets one of them.