Splitting the Brain to Beat the Clock

How a "brake!" lands in 5 milliseconds while a cloud model thinks for five seconds — in the same app, on the same frame, without ever colliding.

Two posts in, we have a coach that knows who's driving and what to say. This post is about the only thing that lets it say anything useful: structure. Specifically, the decision to give the system not one brain but three, each on its own clock, with an ironclad rule about which one is allowed to make the driver wait.

I call it the Split-Brain engine, and the whole design falls out of one observation. The three jobs a coach does (react, strategize, prepare) have wildly different deadlines. Serving all three from one code path means the fastest job inherits the latency of the slowest. That's the original sin of every cloud-first coaching app. So I refused to let them share a path.

Three lanes, three clocks. The only hard rule: nothing slow is allowed to block the HOT lane.
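Here is a minimal TypeScript sketch of that rule as a per-frame handler. The `Frame` type, the `hot`/`slow` callbacks, and `makeFrameHandler` are hypothetical names invented for illustration, and the sketch collapses the three lanes into two: the HOT lane runs inline, and anything slower is started without being awaited. It assumes the slow work is I/O-bound or runs off the JavaScript thread; CPU-heavy work on the same thread would still block.

```typescript
// Sketch: the HOT lane runs inline on every frame; slower work is started but
// never awaited. Frame, Lanes, and makeFrameHandler are hypothetical names.
type Frame = { t: number; speedKph: number };

type Lanes = {
  hot: (f: Frame) => void;            // synchronous, must return quickly
  slow: (f: Frame) => Promise<void>;  // may take seconds, or fail outright
};

function makeFrameHandler(lanes: Lanes): (f: Frame) => void {
  return (frame: Frame): void => {
    // 1. Safety heuristics first, on this frame, with nothing ahead of them.
    lanes.hot(frame);

    // 2. Hand the frame to the slow lane on a later tick and do not await it:
    //    time spent waiting on network or inference never delays the next
    //    frame's HOT evaluation, and a rejection is swallowed here.
    setTimeout(() => {
      lanes.slow(frame).catch(() => {
        // Slow lane failed; HOT coaching carries on regardless.
      });
    }, 0);
  };
}
```

The design choice is that the handler's return type is `void`, not a promise: there is simply nothing for the caller to wait on.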

The HOT lane: dumb on purpose

The HOT lane is the one that fires "brake!" and "oversteer — ease off!". The temptation, in 2026, is to make it smart. I made it deliberately, aggressively dumb: a hand-written DECISION_MATRIX of heuristics that's evaluated on every fused telemetry frame. No network. No model. No allocation surprises. Just arithmetic and comparisons.

Why throw away the AI exactly where the stakes are highest? Two reasons, and they're both about trust. First, speed: a heuristic check is trivially fast and has practically no tail latency, with no token stream, no cold start, and too little allocation to provoke a GC pause. Second, auditability: I can read the rule that fired a safety alert and tell you precisely why. You cannot do that with a sampled language model, and "the model felt like it" is not an acceptable answer when the message is "brake."
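The post doesn't show DECISION_MATRIX itself, so the following is a hypothetical TypeScript sketch of the idea rather than the project's table: each rule is a pure predicate over one telemetry frame with a stable `id`, which is what makes an alert auditable. The field names (`distToBrakeMarkerM`, `yawRateErrorDegS`, `throttlePct`), the rule IDs, and every threshold are invented for illustration.

```typescript
// Hypothetical sketch of a heuristic decision matrix. Field names and
// thresholds are invented for illustration, not the project's values.
type Telemetry = {
  speedKph: number;
  distToBrakeMarkerM: number; // distance to the next braking point
  yawRateErrorDegS: number;   // measured minus expected yaw rate
  throttlePct: number;
};

type Rule = {
  id: string;           // stable identifier, logged with every cue it fires
  priority: 0 | 1 | 2;  // 0 = safety (P0); higher numbers are less urgent
  action: string;
  when: (t: Telemetry) => boolean;
};

const DECISION_MATRIX: readonly Rule[] = [
  { id: "BRAKE_LATE", priority: 0, action: "brake",
    when: (t) => t.speedKph > 100 && t.distToBrakeMarkerM < 20 },
  { id: "OVERSTEER", priority: 0, action: "ease_off",
    when: (t) => Math.abs(t.yawRateErrorDegS) > 15 },
  { id: "COASTING", priority: 1, action: "back_to_throttle",
    when: (t) => t.throttlePct < 5 && t.distToBrakeMarkerM > 150 },
];

// Comparisons only: no I/O, no model, no await. Returns the most urgent
// matching rule; on a tie, the rule listed first wins.
function evaluate(t: Telemetry): Rule | undefined {
  let best: Rule | undefined;
  for (const rule of DECISION_MATRIX) {
    if (rule.when(t) && (best === undefined || rule.priority < best.priority)) {
      best = rule;
    }
  }
  return best;
}
```

Because `evaluate` returns the whole rule, the caller can log `rule.id` next to the spoken cue, which is the "read the rule that fired" property in practice.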

How fast is "fast"? On a recorded Pixel 10 session, the HOT path's 95th-percentile latency was 5.00 ms against a self-imposed ceiling of 50 ms, and the P0 audio dispatch maxed at 5 ms against a 100 ms budget. That's not a simulation; it's pulled from the on-device validation artifact I'll dig into in Part 5.
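For readers who want to run this kind of check on their own recordings, here is a sketch of a P95-versus-ceiling test using the nearest-rank percentile. The post doesn't say which percentile method the validation artifact uses, so treat `percentile` and `checkBudget` as assumptions; the samples below are synthetic, not the Pixel 10 data.

```typescript
// Sketch: nearest-rank percentile and a budget check over latency samples (ms).
// The method and names are assumptions; the post's artifact may differ.
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

type BudgetCheck = { p95Ms: number; maxMs: number; withinBudget: boolean };

function checkBudget(samples: readonly number[], ceilingMs: number): BudgetCheck {
  const p95Ms = percentile(samples, 95);
  // reduce instead of Math.max(...samples): long sessions can hold more
  // samples than a spread argument list comfortably allows.
  const maxMs = samples.reduce((m, x) => Math.max(m, x), -Infinity);
  return { p95Ms, maxMs, withinBudget: p95Ms <= ceilingMs };
}

const check = checkBudget([4.1, 4.8, 5.0, 3.9], 50); // synthetic samples
// check → { p95Ms: 5, maxMs: 5, withinBudget: true }
```

With only a handful of samples the nearest-rank P95 is simply the maximum, so a meaningful P95 needs a long recording.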

Drawn to scale: the reflex lane and the thinking lane live two orders of magnitude apart.

The traffic controllers: CoachingQueue and TimingGate

Three lanes producing messages would be chaos without two referees. The CoachingQueue is a priority queue where a P0 safety message doesn't politely wait its turn; it calls preempt() and jumps ahead of everything else. This is the structural guarantee behind the trust thesis: a "brake!" can be generated while a chatty COLD-path sentence is mid-flight, and the brake still wins.

// P0 never queues behind tactical or strategic chatter.
if (decision.priority === 0) {
  coachingQueue.preempt(decision);   // jumps the line, flushes lower-priority TTS
} else {
  coachingQueue.enqueue(decision);   // normal cooldown + cadence rules apply
}
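The snippet above shows the dispatch but not what preempt() does internally. One way it could work, sketched in TypeScript under stated assumptions: lower numbers mean higher priority (so P0 wins), `stopSpeech` is a hypothetical hook that cuts off the current TTS utterance, and the cooldown and cadence rules that enqueue() applies in the real app are omitted.

```typescript
// Hypothetical sketch of a preemptive coaching queue. Assumes lower numbers
// mean higher priority (P0 = safety) and a stopSpeech hook into the TTS engine.
type Decision = { priority: number; text: string };

class CoachingQueue {
  private items: Decision[] = [];
  private readonly stopSpeech: () => void;

  constructor(stopSpeech: () => void) {
    this.stopSpeech = stopSpeech;
  }

  // Normal path: stable insert behind everything of equal or higher priority.
  enqueue(d: Decision): void {
    const i = this.items.findIndex((x) => x.priority > d.priority);
    if (i === -1) this.items.push(d);
    else this.items.splice(i, 0, d);
  }

  // Preempt path: cut off the current utterance, drop queued lower-priority
  // chatter, and put this decision at the head of the line.
  preempt(d: Decision): void {
    this.stopSpeech();
    this.items = this.items.filter((x) => x.priority <= d.priority);
    this.items.unshift(d);
  }

  // Next decision to speak, or undefined when the queue is empty.
  next(): Decision | undefined {
    return this.items.shift();
  }
}
```

In this sketch, queued P0 items survive a preempt, so two safety cues in quick succession are both spoken, newest first; the post doesn't say how the real queue orders them.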

The TimingGate solves the opposite problem: advice that's correct but ill-timed. It's a small state machine tracking the car's CornerPhase — braking, turn-in, apex, exit — and it can enforce silence during peak cognitive load. Telling a beginner to "watch your line" at the apex is technically true and actively harmful. The gate knows to shut up.
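A sketch of such a gate in TypeScript. The four phase names come from the post; everything else is an assumption for illustration: the fixed transition order, accepting only the next phase in sequence (a simple debounce against a noisy phase detector), treating turn-in and apex as the quiet phases, and letting P0 bypass the gate.

```typescript
// Hypothetical sketch of a corner-phase timing gate. Phase names follow the
// post; transition order, debounce, and quiet phases are assumptions.
type CornerPhase = "braking" | "turn-in" | "apex" | "exit";

const NEXT: Record<CornerPhase, CornerPhase> = {
  braking: "turn-in",
  "turn-in": "apex",
  apex: "exit",
  exit: "braking",
};

// Phases of peak cognitive load, where non-safety advice must wait.
const QUIET: ReadonlySet<CornerPhase> = new Set<CornerPhase>(["turn-in", "apex"]);

class TimingGate {
  // Start as if just past a corner, so the next expected phase is braking.
  private phase: CornerPhase = "exit";

  // Accept only the next phase in sequence, so a noisy detector cannot
  // bounce the gate back and forth between phases.
  observe(detected: CornerPhase): CornerPhase {
    if (detected === NEXT[this.phase]) this.phase = detected;
    return this.phase;
  }

  canSpeak(priority: number): boolean {
    if (priority === 0) return true; // safety cues are never gated
    return !QUIET.has(this.phase);
  }
}
```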

The two traps I walked into

Trap one: the humanizer. The function that turns a raw decision into a spoken phrase, humanizeAction, lives on the HOT path. The moment it does anything heavy, such as a lookup that touches I/O or a stray async hop, its execution time stops being predictable and the 50 ms budget is no longer guaranteed. I kept it ruthlessly synchronous: string formatting and switch statements, nothing else. Predictable execution time is the feature.
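What "ruthlessly synchronous" can look like, as a hypothetical TypeScript sketch: the `Action` union and its phrasing are invented here, not taken from the project's humanizeAction. Every branch is a constant or a template string, so there is nothing to await and no I/O to stall on.

```typescript
// Hypothetical sketch of a synchronous humanizer; the Action shape and the
// phrasing are invented for illustration.
type Action =
  | { kind: "brake" }
  | { kind: "ease_off"; cause: "oversteer" | "understeer" }
  | { kind: "brake_earlier"; meters: number };

// A switch plus string formatting: no I/O, no await, no external lookups,
// so the cost per call is small and roughly constant.
function humanizeAction(a: Action): string {
  switch (a.kind) {
    case "brake":
      return "Brake!";
    case "ease_off":
      return a.cause === "oversteer" ? "Oversteer, ease off!" : "Understeer, ease off!";
    case "brake_earlier":
      return `Brake ${Math.round(a.meters)} meters earlier.`;
    default: {
      // Compile-time exhaustiveness check: adding an Action kind without a
      // case here becomes a type error.
      const unhandled: never = a;
      return unhandled;
    }
  }
}
```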

Trap two: the on-device model sneaking onto the hot thread. The edge model (more on Gemma in Part 4) runs in a single-flight async queue: one request in flight, ever, and never on the path that produces a safety cue. Even at the OS level, the app asks Android for CONNECTION_PRIORITY_HIGH on the RaceBox Bluetooth link, trading some battery for a low-latency, low-jitter connection on the one sensor stream the whole thing depends on.
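A minimal TypeScript sketch of a one-in-flight queue, assuming jobs are promise-returning functions such as a hypothetical call into the edge model. New requests wait their turn behind a promise chain; the post doesn't say whether the real queue waits, drops, or coalesces requests, so that policy is an assumption. A failed job is isolated so it can't stall the chain, and `maxObserved` exists only so the one-at-a-time property can be checked.

```typescript
// Hypothetical sketch of a single-flight queue: at most one job runs at a
// time, later jobs wait behind a promise chain, and failures don't jam it.
class SingleFlightQueue<T> {
  private tail: Promise<void> = Promise.resolve();
  private inFlight = 0;
  maxObserved = 0; // for illustration/testing only

  run(job: () => Promise<T>): Promise<T> {
    const result = this.tail.then(async () => {
      this.inFlight++;
      this.maxObserved = Math.max(this.maxObserved, this.inFlight);
      try {
        return await job();
      } finally {
        this.inFlight--;
      }
    });
    // Keep the chain alive even if this job rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}
```

Nothing on the HOT path ever calls `run` or awaits its result; the queue only bounds how much model work can pile up in the slow lane.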

The architecture isn't clever. It's disciplined. Every fast thing is forbidden from depending on a slow thing.

That discipline is what makes the next part possible. Because the slow, smart lane is fully decoupled, I can drop a real language model into it — on the phone, no cloud — and if it's slow, or missing, or actively on fire, the car still gets coached. Next: putting Gemma 4 in the cockpit, and the week I lost to a single rejected tensor.

Racecraft · on-device real-time driving coach built around Gemma 4. Code: github.com/rabimba/speedracer-AI.

RK's Rambling
