Driving iOS i18n Hardcode Leaks to Zero and Keeping Them There

The problem

A hardcoded UI string is a string that reaches the screen without going through the localization system. In our codebase the contract is Text(L10n.Onboarding.Welcome.titleLine1); a leak is Text("看懂世界") (roughly "understand the world") shipped verbatim. Leaks do not fail the build. They fail silently for every user whose locale is not the one the literal happens to be written in.

The numbers were not small. A first full pass over MenuLens surfaced 247 such leaks across onboarding, paywall, and result screens. YoloShell had 24. Counting them by hand is not a strategy, and neither is trusting that a code review will catch the next one. We needed a way to enumerate leaks mechanically, drive the count to zero, and then make a regression cost the author a red pipeline instead of costing a user a broken screen.

Why a single check is not enough

Localization breaks in three independent places, so no single detector can cover them all:

A developer writes Text("Start") directly instead of routing through L10n. This is visible in source.
The string is routed through L10n, but the CSV has no translation for some locale, so that locale silently falls back to the base language. This is invisible in source and invisible at runtime unless you switch to that exact locale.
The string is routed and translated, but a layout, a formatter, or a conditional branch means it never actually renders the localized value. This is invisible until something draws pixels.

The toolchain — a Rust crate at DevOps/Rust/i18n-audit/, exposed through a dev i18n <subcommand> CLI — answers all three with three separate paths.

Path 1: static source scan

dev i18n check-source walks a module's Swift files and flags string literals that look like user-facing copy. The classification is deliberately conservative:

A literal containing CJK ideographs or kana is a ChineseLiteral — almost always real UI copy that leaked.
A literal that looks like English UI copy — two or more words, ASCII letters and punctuation only, no format placeholders, not a camelCase identifier — is an EnglishLiteral.
Everything else (identifiers, asset names, URLs, regex, key parameters) is ignored.
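A minimal sketch of the two rules above, written in Rust to match the toolchain. The Unicode ranges, the placeholder test, and the function names are assumptions for illustration, not the i18n-audit crate's actual code. A production scanner would also cover the later CJK extension blocks.

```rust
/// Kinds of leak the source scan reports (names taken from the text above).
#[derive(Debug, PartialEq, Eq)]
pub enum LiteralKind {
    ChineseLiteral,
    EnglishLiteral,
}

/// CJK Unified Ideographs (base block and Extension A) plus Hiragana and Katakana.
fn is_cjk_or_kana(c: char) -> bool {
    matches!(
        c as u32,
        0x3040..=0x309F       // Hiragana
            | 0x30A0..=0x30FF // Katakana
            | 0x3400..=0x4DBF // CJK Extension A
            | 0x4E00..=0x9FFF // CJK Unified Ideographs
    )
}

/// Returns None for anything that does not look like user-facing copy.
pub fn classify(literal: &str) -> Option<LiteralKind> {
    if literal.chars().any(is_cjk_or_kana) {
        return Some(LiteralKind::ChineseLiteral);
    }
    // ASCII letters, spaces and punctuation only (no digits, no non-ASCII).
    let ascii_copy = literal
        .chars()
        .all(|c| c.is_ascii_alphabetic() || c == ' ' || c.is_ascii_punctuation());
    // Printf specs, brace placeholders and Swift interpolation all disqualify.
    let has_placeholder =
        literal.contains('%') || literal.contains('{') || literal.contains("\\(");
    // Requiring two whitespace-separated words also rules out camelCase
    // identifiers such as primaryButtonTitle, which never contain a space.
    let words = literal.split_whitespace().count();
    if ascii_copy && !has_placeholder && words >= 2 {
        Some(LiteralKind::EnglishLiteral)
    } else {
        None
    }
}
```

Under these rules classify("看懂世界") yields ChineseLiteral, while classify("primaryButtonTitle") and classify("%d items left") yield None.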

False positives are suppressed in two ways: by skipping whole files (Tests/, Mocks/, Logger.swift, the generated L10n.swift) and by skipping lines that contain markers such as logger., NSLog(, systemName:, and forKey:. The trade-off is explicit. A wider net would catch more leaks but would bury the signal under sample data and diagnostic strings, so the scanner stays narrow, and the per-product noise knobs (extra_skip_markers, file prefix/suffix excludes) are checked in alongside each app.
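One way the file and line filters could be expressed, assuming paths relative to the module root. ScanConfig and its field names are hypothetical stand-ins for the checked-in prefix/suffix excludes and extra_skip_markers, not the crate's real schema.

```rust
/// Line markers from the text; a line containing any of them is not scanned.
const BUILTIN_LINE_MARKERS: &[&str] = &["logger.", "NSLog(", "systemName:", "forKey:"];

/// Per-product noise knobs. Field names are illustrative, not the crate's schema.
pub struct ScanConfig {
    /// Path prefixes such as "Tests/" or "Mocks/", relative to the module root.
    pub skip_path_prefixes: Vec<String>,
    /// Path suffixes such as "Logger.swift" or the generated "L10n.swift".
    pub skip_path_suffixes: Vec<String>,
    /// Product-specific markers added on top of the built-in ones.
    pub extra_skip_markers: Vec<String>,
}

impl ScanConfig {
    /// True if the whole file should be left out of the scan.
    pub fn skips_file(&self, path: &str) -> bool {
        self.skip_path_prefixes.iter().any(|p| path.starts_with(p.as_str()))
            || self.skip_path_suffixes.iter().any(|s| path.ends_with(s.as_str()))
    }

    /// True if a single line should be left out of the scan.
    pub fn skips_line(&self, line: &str) -> bool {
        BUILTIN_LINE_MARKERS.iter().any(|m| line.contains(*m))
            || self.extra_skip_markers.iter().any(|m| line.contains(m.as_str()))
    }
}
```

Keeping the built-in markers in code and the per-product additions in config mirrors the split described above: the narrow defaults ship with the tool, and each app owns its own noise.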

There is one case the scanner must not flag: a string that is hardcoded on purpose. The brand line Snap · Translate · Recommend is the same in every language. The fix there is not an L10n key — it is Text(verbatim: "Snap · Translate · Recommend"). The verbatim: label is the author asserting "this literal is intentional," and the MenuLens cleanup used it nine times. Intent becomes part of the code rather than a comment the scanner has to guess at.
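A sketch of how a line-level extractor might honor that assertion: it collects double-quoted literals and drops any literal whose argument label is verbatim:. The function name is hypothetical, and to stay short it does not handle multi-line or raw Swift strings specially. Each surviving literal would then go through the ChineseLiteral/EnglishLiteral classification.

```rust
/// Collect double-quoted literals from one Swift line, skipping any literal
/// passed with a `verbatim:` label. Backslash escapes inside literals are
/// honored so that \" does not end the string early.
pub fn candidate_literals(line: &str) -> Vec<String> {
    let bytes = line.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'"' {
            i += 1;
            continue;
        }
        // Scan for the closing quote; '"' and '\\' are ASCII, so byte
        // indices where we stop are always valid UTF-8 boundaries.
        let start = i + 1;
        let mut j = start;
        while j < bytes.len() && bytes[j] != b'"' {
            j += if bytes[j] == b'\\' { 2 } else { 1 };
        }
        if j >= bytes.len() {
            break; // unterminated literal on this line
        }
        // The author's intent marker: Text(verbatim: "...") is never a candidate.
        if !line[..i].trim_end().ends_with("verbatim:") {
            out.push(line[start..j].to_string());
        }
        i = j + 1;
    }
    out
}
```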

Path 2: static CSV coverage

dev i18n check-csv validates the localization CSV without launching anything. It reports three classes of issue per run:

Fill rate per locale — what fraction of keys have a non-empty translation. The default gate is 95 percent; the tool flags any locale below it and renders the worst offenders first so they are fixed first.
Duplicate keys — two rows claiming the same key.
Format specifier consistency — every locale's placeholders must match the en reference. A German string that drops a {pct} the English one declares is surfaced as a mismatch.
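A sketch of the fill-rate and duplicate-key checks over an already-parsed CSV. The row layout (key column first, then one column per locale) and the function signature are assumptions. The 95 percent gate and the worst-first ordering come from the text.

```rust
use std::collections::HashSet;

/// Fill result for one locale column.
#[derive(Debug)]
pub struct LocaleFill {
    pub locale: String,
    pub filled: usize,
    pub total: usize,
}

impl LocaleFill {
    pub fn rate(&self) -> f64 {
        if self.total == 0 { 1.0 } else { self.filled as f64 / self.total as f64 }
    }
}

/// header = ["key", "en", "de", ...]; each row = [key, en value, de value, ...].
/// Returns the locales below `gate` (worst first) and every duplicated key once.
pub fn check_csv(header: &[&str], rows: &[Vec<&str>], gate: f64) -> (Vec<LocaleFill>, Vec<String>) {
    let mut below: Vec<LocaleFill> = header
        .iter()
        .enumerate()
        .skip(1) // column 0 holds the key
        .map(|(col, locale)| LocaleFill {
            locale: locale.to_string(),
            // A cell counts as filled only if it is non-empty after trimming.
            filled: rows
                .iter()
                .filter(|row| row.get(col).map_or(false, |v| !v.trim().is_empty()))
                .count(),
            total: rows.len(),
        })
        .filter(|fill| fill.rate() < gate)
        .collect();
    below.sort_by(|a, b| a.rate().total_cmp(&b.rate()));

    let mut seen = HashSet::new();
    let mut duplicates: Vec<String> = Vec::new();
    for row in rows {
        if let Some(&key) = row.first() {
            if !seen.insert(key) && !duplicates.iter().any(|d| d.as_str() == key) {
                duplicates.push(key.to_string());
            }
        }
    }
    (below, duplicates)
}
```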

The specifier check is style-aware. Printf-shaped fragments (%@, %d) and brace placeholders ({name}) are compared as independent halves, because a literal % followed by a letter in a translation is not a printf spec and should not be treated as one.
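A sketch of the style-aware comparison. The printf grammar here is intentionally small. The rule that the printf half is compared only when the en reference actually contains printf specs is this sketch's reading of the paragraph above, not confirmed behavior of the tool. Specs are compared as sorted multisets so that reordered positional arguments (%1$@, %2$@) still match; the sketch does not check the order of non-positional specs.

```rust
/// Which half of the placeholder check disagreed with the en reference.
#[derive(Debug, PartialEq, Eq)]
pub enum SpecMismatch {
    Brace,
    Printf,
}

/// Printf-style specs such as %@, %d, %1$@, %.2f, %lld; "%%" is an escaped percent.
pub fn printf_specs(s: &str) -> Vec<String> {
    let chars: Vec<char> = s.chars().collect();
    let mut specs: Vec<String> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '%' {
            i += 1;
            continue;
        }
        if chars.get(i + 1) == Some(&'%') {
            i += 2; // "%%" is literal text
            continue;
        }
        let mut j = i + 1;
        // Positional index, width and precision: digits, '$' and '.'.
        while j < chars.len() && (chars[j].is_ascii_digit() || chars[j] == '$' || chars[j] == '.') {
            j += 1;
        }
        // Length modifiers (l, ll, h).
        while j < chars.len() && (chars[j] == 'l' || chars[j] == 'h') {
            j += 1;
        }
        if j < chars.len() && "@dDiuUxXfeEgGcsS".contains(chars[j]) {
            specs.push(chars[i..=j].iter().collect());
            i = j + 1;
        } else {
            i += 1; // a bare '%' not followed by a conversion character
        }
    }
    specs.sort();
    specs
}

/// Brace placeholder names such as {pct} or {user_name}.
pub fn brace_placeholders(s: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = s;
    while let Some(open) = rest.find('{') {
        let after = &rest[open + 1..];
        let Some(close) = after.find('}') else { break };
        let name = &after[..close];
        if !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
            out.push(name.to_string());
        }
        rest = &after[close + 1..];
    }
    out.sort();
    out
}

/// Compare a translation against the en reference, one half at a time.
pub fn spec_mismatches(reference: &str, translation: &str) -> Vec<SpecMismatch> {
    let mut out = Vec::new();
    if brace_placeholders(reference) != brace_placeholders(translation) {
        out.push(SpecMismatch::Brace);
    }
    // Assumption: for a brace-style reference, a "%d" in the translation is
    // literal text, so the printf half is skipped entirely.
    let reference_printf = printf_specs(reference);
    if !reference_printf.is_empty() && reference_printf != printf_specs(translation) {
        out.push(SpecMismatch::Printf);
    }
    out
}
```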

These are measured results, not aspirations. As of the cleanup:

MenuLens — 42 locales × 381 keys, every locale at 100 percent.
SnapSweep — 20 per-page CSV files, 285 key-rows, 42 locales, all at 100 percent.
YoloShell — 11 locales × 164 keys, lowest locale at 99.4 percent, all above the 95 percent gate.

Path 3: pseudo-localization OCR audit

The first two paths are static. They cannot prove that a correctly wired string actually renders localized on screen. Path 3 closes that gap by driving the simulator.

The mechanism is a pseudo-locale. The csv2strings-rs --emit-pseudo build flag generates an xa-XA.lproj where every localized value is replaced with its key path wrapped in double angle brackets: 《Onboarding.Welcome.titleLine1》. dev i18n audit switches the simulator to this locale, installs the build, navigates each page, screenshots it, and runs the screenshot through macOS Vision OCR. Every recognized token is classified:

A dotted identifier like Onboarding.Welcome.title is a Sentinel — proof the localization system reached that label.
Wall-clock times, percentages, SI units, emoji, Apple chrome (Back, Cancel, weekday and month names), and OCR noise are AllowedSystem.
Anything else legible is a SuspectedLeak — a string that rendered as real words instead of a sentinel, which means it bypassed L10n.
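A simplified classifier for a single OCR token, assuming tokens arrive as short phrases. The chrome word list and unit list are a small illustrative subset, and treating any token with fewer than two letters as noise is a crude stand-in for the real noise rules (it would miss a one-character CJK leak).

```rust
/// OCR token classes from the text above.
#[derive(Debug, PartialEq, Eq)]
pub enum TokenClass {
    Sentinel,
    AllowedSystem,
    SuspectedLeak,
}

/// Illustrative subset of Apple chrome words; a real list would be much longer.
const APPLE_CHROME: &[&str] = &["Back", "Cancel", "Done", "Monday", "Sunday", "January", "December"];
/// Illustrative unit suffixes accepted after a number.
const UNITS: &[&str] = &["km", "m", "kg", "g", "ms", "s"];

/// Dotted identifier: two or more segments, each starting with an ASCII letter.
fn is_dotted_identifier(s: &str) -> bool {
    let parts: Vec<&str> = s.split('.').collect();
    parts.len() >= 2
        && parts.iter().all(|p| {
            p.chars().next().map_or(false, |c| c.is_ascii_alphabetic())
                && p.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
        })
}

/// Wall-clock times such as "9:41", "09:41" or "9:41 PM".
fn is_clock_time(s: &str) -> bool {
    let s = s.trim_end_matches(" AM").trim_end_matches(" PM");
    match s.split_once(':') {
        Some((h, m)) => {
            (1..=2).contains(&h.len())
                && m.len() == 2
                && h.chars().all(|c| c.is_ascii_digit())
                && m.chars().all(|c| c.is_ascii_digit())
        }
        None => false,
    }
}

/// A number followed by a known unit, e.g. "12 km" or "250ms".
fn is_measurement(s: &str) -> bool {
    let split = s.find(|c: char| !(c.is_ascii_digit() || c == '.')).unwrap_or(s.len());
    let (number, unit) = s.split_at(split);
    number.parse::<f64>().is_ok() && UNITS.contains(&unit.trim())
}

pub fn classify_token(raw: &str) -> TokenClass {
    // OCR may or may not keep the 《 》 brackets, so strip them if present.
    let token = raw.trim().trim_start_matches('《').trim_end_matches('》');
    if is_dotted_identifier(token) {
        return TokenClass::Sentinel;
    }
    // Emoji, bare numbers, percentages like "42%" and single-glyph OCR noise
    // all contain fewer than two letters.
    let letters = token.chars().filter(|c| c.is_alphabetic()).count();
    if letters < 2 || is_clock_time(token) || is_measurement(token) || APPLE_CHROME.contains(&token) {
        return TokenClass::AllowedSystem;
    }
    TokenClass::SuspectedLeak
}
```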

A page that is fully localized produces only sentinels and allowed-system tokens. A leak stands out because it is the one legible English or Chinese phrase in a sea of 《...》.
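For the sentinels to exist, the pseudo-locale build has to write one value per key. The sketch below shows what a generator in the spirit of csv2strings-rs --emit-pseudo could produce for xa-XA.lproj/Localizable.strings, assuming the CSV key is the same dotted path that L10n exposes. The file name and function names are assumptions.

```rust
/// Pseudo value: the key path wrapped in double angle brackets (U+300A, U+300B).
pub fn pseudo_value(key_path: &str) -> String {
    format!("《{key_path}》")
}

/// Minimal escaping for the .strings format: backslashes and double quotes.
fn escape_strings(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// One `"key" = "value";` line per key, in input order.
pub fn pseudo_strings_file(key_paths: &[&str]) -> String {
    key_paths
        .iter()
        .map(|k| format!("\"{}\" = \"{}\";\n", escape_strings(k), escape_strings(&pseudo_value(k))))
        .collect()
}
```

Using the key path as the value, rather than padded or accented English, means every sentinel that OCR reads back also names the key that produced it.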

Two implementation details matter. First, xa-XA is emitted only in debug builds. It is gated by compilation_mode in localization.bzl and must be absent from release and App Store builds: xa-XA is a private-use pseudo-locale rather than a registered language and region (xa is not an assigned ISO 639 code, and XA falls in ISO 3166-1's user-assigned range), and shipping it triggers an ITMS-90176 rejection. Second, the set of pages to visit is a declarative AuditPages.json per product (slug, route, navigation type, marker), which replaced a 148-line bash script. Adding a page to the audit is now a JSON edit, not a script change.
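The compilation_mode gate is the primary guard. As a belt-and-braces check, a release job could also inspect the built bundle. The sketch below lists pseudo-locale .lproj directories at the top level of an .app directory. This check and its function name are hypothetical additions, not part of the described toolchain, and the sketch does not descend into embedded frameworks.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// List any `<locale>.lproj` directories directly under `bundle` whose locale
/// is in `pseudo_locales`. An empty result is the expected release state.
pub fn pseudo_lproj_dirs(bundle: &Path, pseudo_locales: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(bundle)? {
        let path = entry?.path();
        let is_pseudo = path
            .file_name()
            .and_then(|n| n.to_str())
            .and_then(|n| n.strip_suffix(".lproj"))
            .map_or(false, |locale| pseudo_locales.contains(&locale));
        if is_pseudo && path.is_dir() {
            found.push(path);
        }
    }
    found.sort();
    Ok(found)
}
```

A release pipeline step would fail whenever pseudo_lproj_dirs(app_path, &["xa-XA"]) returns a non-empty list.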

Keeping it at zero

Reaching zero once is the easy half. The harder half is making the toolchain itself trustworthy enough to gate on, because a flaky detector that cries wolf gets disabled within a week. The classifier carries 107 unit tests (39 for the sentinel grammar alone, 21 for the source scanner) covering the OCR residue, the proper-noun exemptions (crypto algorithm names, OSS license identifiers, terminal themes, modifier-key combos), and the format-specifier edge cases. The exemption lists are kept tight and each entry is documented, because a permissive allow-list silently swallows real leaks.
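A sketch of what a documented exemption list can look like. Each entry carries its reason, and matching is exact, so an entry cannot quietly excuse a longer phrase that merely contains it. The entries are invented examples of the four categories named above, not the crate's real list.

```rust
/// One allow-listed proper noun and the reason it may appear untranslated.
pub struct Exemption {
    pub term: &'static str,
    pub reason: &'static str,
}

/// Invented examples, one per category named in the text.
pub const PROPER_NOUN_EXEMPTIONS: &[Exemption] = &[
    Exemption { term: "AES-256-GCM", reason: "crypto algorithm name" },
    Exemption { term: "Apache-2.0", reason: "OSS license identifier (SPDX)" },
    Exemption { term: "Solarized Dark", reason: "terminal theme name" },
    Exemption { term: "⌘⇧K", reason: "modifier-key combo" },
];

/// Exact match only: substring matching would let an entry excuse any
/// longer phrase that happens to contain it.
pub fn is_exempt(token: &str) -> bool {
    PROPER_NOUN_EXEMPTIONS.iter().any(|e| e.term == token)
}
```

A unit test that asserts every reason is non-empty keeps the "each entry is documented" rule from eroding.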

The exit code is the contract. check-csv, check-source, and the aggregate dev i18n status run in the MR pipeline; a fill rate under the gate or a new source candidate fails the job. The leak count is a number the author has to look at before merge, not a number a user discovers after release.
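A sketch of the pass/fail rule stated above, reduced to the two conditions the text names. GateReport and exit_code are hypothetical. A CLI entry point would end with std::process::exit(exit_code(&report)).

```rust
/// The two static results the text says can fail the MR job.
pub struct GateReport {
    /// (locale, fill rate in 0.0..=1.0) for every locale in the CSV.
    pub fill_rates: Vec<(String, f64)>,
    /// Source-scan candidates left after skip rules and verbatim: literals.
    pub source_candidates: usize,
}

/// Default fill-rate gate from the text (95 percent).
pub const FILL_GATE: f64 = 0.95;

/// 0 passes the job, 1 fails it.
pub fn exit_code(report: &GateReport) -> i32 {
    let fill_ok = report.fill_rates.iter().all(|(_, rate)| *rate >= FILL_GATE);
    if fill_ok && report.source_candidates == 0 { 0 } else { 1 }
}
```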

Takeaways

Localization fails in three independent places: hardcoded literals in source, missing CSV entries, and strings that never render localized. One detector cannot cover all three; build a static source scan, a static CSV coverage check, and a runtime pseudo-localization audit, and run all three.
Encode intent in code, not comments. Text(verbatim:) tells the scanner a literal is deliberate, so the gate can be strict without false positives.
A pseudo-locale that replaces values with their key paths turns "is this localized" into a visual yes/no — but gate it by build mode so xa-XA never ships.
A linter you gate CI on must earn that trust with tests. Tight, documented exemptions and a hundred-plus cases are what let the exit code be the contract.