Splitting the Brain to Beat the Clock

Racecraft · Part 3 of 5

How a "brake!" lands in 5 milliseconds while a cloud model thinks for five seconds — in the same app, on the same frame, without ever colliding.

Two posts in, we have a coach that knows who's driving and what to say. This post is about the only thing that lets it say anything useful: structure. Specifically, the decision to give the system not one brain but three, each on its own clock, with an ironclad rule about which one is allowed to make the driver wait.

I call it the Split-Brain engine, and the whole design falls out of one observation. The three jobs a coach does (react, strategize, prepare) have wildly different deadlines. Trying to serve all three from one code path means the fastest job inherits the latency of the slowest. That's the original sin of every cloud-first coaching app. So I refused to let them share a path.

[Figure: The Split-Brain engine. Every telemetry frame fans out to three lanes against a 300–500 ms budget. The loop never waits on the slow ones.]

Telemetry: RaceBox BLE (GPS · IMU · camera), fused at ~50–100 Hz.
HOT (< 50 ms · deterministic): DECISION_MATRIX heuristics · P0 safety + P1 tactical · no network · no LLM · measured 5 ms p95 on Pixel 10.
COLD (2–5 s · cloud): Gemini 2.5 Flash Lite · "why, not what" strategy · async · the loop does not block on it.
FEEDFORWARD (geofenced): GPS-triggered, velocity-scaled, ~150 m before a corner, e.g. "T7 right: late apex, brake at the 100 m board".
Output: CoachingQueue (priority queue, P0 preempts all) → TimingGate (gates by phase) → spoken audio.
Design principle: "Feedback 800 ms late is worse than silence." So the safety lane is deterministic and never shares a thread with a model.
Three lanes, three clocks. The only hard rule: nothing slow is allowed to block the HOT lane.
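
A minimal sketch of that rule, not the project's code: the names `Frame`, `hotLane`, `coldLane`, and `onFrame`, plus the threshold, are hypothetical. The HOT lane runs synchronously on every frame, while the COLD lane is started but never awaited by the loop, so its latency cannot leak into the safety path.

```typescript
// Hypothetical types, names, and thresholds for illustration only.
interface Frame { speedKph: number; brakePressure: number; }

const spoken: string[] = [];

// HOT lane: synchronous arithmetic and comparisons, returns immediately.
function hotLane(frame: Frame): string | null {
  return frame.speedKph > 180 && frame.brakePressure === 0 ? "brake!" : null;
}

// COLD lane: stands in for a slow cloud call; the delay is injected for the demo.
function coldLane(frame: Frame, delayMs: number): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`strategy note for ${frame.speedKph} kph`), delayMs));
}

// Loop body. The returned promise exists only so callers can observe completion;
// the telemetry loop itself is expected to ignore it.
function onFrame(frame: Frame, coldDelayMs = 2000): Promise<void> {
  const cue = hotLane(frame);
  if (cue !== null) spoken.push(cue);
  return coldLane(frame, coldDelayMs)
    .then((note) => { spoken.push(note); })
    .catch(() => { /* a COLD failure must not affect the loop */ });
}
```

Calling `onFrame` returns control as soon as the HOT check finishes; the strategy note arrives whenever the slow lane gets around to it, or not at all.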

The HOT lane: dumb on purpose

The HOT lane is the one that fires "brake!" and "oversteer — ease off!". The temptation, in 2026, is to make it smart. I made it deliberately, aggressively dumb: a hand-written DECISION_MATRIX of heuristics that's evaluated on every fused telemetry frame. No network. No model. No allocation surprises. Just arithmetic and comparisons.

Why throw away the AI exactly where the stakes are highest? Two reasons, and they're both about trust. First, speed: a heuristic check is trivially fast and has no tail latency — no GC pause, no token stream, no cold start. Second, auditability: I can read the rule that fired a safety alert and tell you precisely why. You cannot do that with a sampled language model, and "the model felt like it" is not an acceptable answer when the message is "brake."
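
To make the "dumb on purpose" idea concrete, here is a sketch of what a rule table of this kind could look like. The post does not show the real DECISION_MATRIX, so the frame fields, thresholds, rule ids, and the `evaluate` helper below are all assumptions. The point it illustrates is the auditability one: every decision carries the id of the rule that fired.

```typescript
// Hypothetical frame shape, thresholds, and rule ids; not the real DECISION_MATRIX.
interface TelemetryFrame {
  speedKph: number;
  distanceToBrakePointM: number; // <= 0 means the brake point has been passed
  brakePressure: number;         // 0..1
  yawRateDegS: number;
  steeringDeg: number;
}

interface Decision { ruleId: string; priority: 0 | 1; action: string; }

interface Rule {
  id: string;
  priority: 0 | 1;
  action: string;
  when: (f: TelemetryFrame) => boolean;
}

// Ordered by priority: P0 safety rules first, then P1 tactical rules.
const DECISION_MATRIX: readonly Rule[] = [
  { id: "late-brake", priority: 0, action: "BRAKE",
    when: (f) => f.distanceToBrakePointM <= 0 && f.brakePressure < 0.1 && f.speedKph > 100 },
  { id: "oversteer", priority: 0, action: "EASE_OFF",
    when: (f) => Math.abs(f.yawRateDegS) > 40 && Math.abs(f.steeringDeg) < 5 },
  { id: "early-brake", priority: 1, action: "BRAKE_LATER",
    when: (f) => f.distanceToBrakePointM > 80 && f.brakePressure > 0.3 },
];

// First matching rule wins. No I/O, no async, only comparisons.
function evaluate(frame: TelemetryFrame): Decision | null {
  for (const rule of DECISION_MATRIX) {
    if (rule.when(frame)) {
      return { ruleId: rule.id, priority: rule.priority, action: rule.action };
    }
  }
  return null;
}
```

Because the result includes `ruleId`, answering "why did it say brake?" is a matter of reading one line of the table.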

How fast is "fast"? On a recorded Pixel 10 session, the HOT path's 95th-percentile latency was 5.00 ms against a self-imposed ceiling of 50 ms, and the P0 audio dispatch maxed at 5 ms against a 100 ms budget. That's not a simulation; it's pulled from the on-device validation artifact I'll dig into in Part 5.
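
For readers who want to reproduce a figure like this from their own recordings, a p95 check can be sketched as below. This uses the nearest-rank definition of a percentile, which is an assumption; the post does not say how the validation artifact computes it, and `percentile` and `withinBudget` are hypothetical helper names.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// True when the 95th-percentile latency fits under the given ceiling.
function withinBudget(samplesMs: readonly number[], ceilingMs: number): boolean {
  return percentile(samplesMs, 95) <= ceilingMs;
}
```

A run passes when `withinBudget(hotLatenciesMs, 50)` holds, where `hotLatenciesMs` would be the per-frame HOT timings captured on the device.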

[Figure: The latency budget, to scale. Where each lane lands against the 300–500 ms perception window. Numbers are real where measured. Axis ticks: 5 ms, 50 ms, 500 ms, 5 s.]

Perception window: 300–500 ms. Feedback must land here.
HOT: ~5 ms p95 (measured, Pixel 10), against a budget ceiling of 50 ms.
COLD: Gemini 2.5 Flash Lite, 2–5 s, runs off the critical path.

The HOT lane finishes ~10× inside its own ceiling (5 ms against 50 ms), ~60–100× inside the perception window, and roughly 400–1000× faster than the cloud lane (5 ms against 2–5 s). That gap is the whole reason safety never shares a code path with a model. The COLD lane is slow because nobody waits for it.
Drawn to scale: the reflex lane and the thinking lane live nearly three orders of magnitude apart.

The traffic controllers: CoachingQueue and TimingGate

Three lanes producing messages would be chaos without two referees. The CoachingQueue is a priority queue where a P0 safety message doesn't politely wait its turn — it preempt()s, jumping everything else. This is the structural guarantee behind the trust thesis: a "brake!" can be generated while a chatty COLD-path sentence is mid-flight, and the brake still wins.

// P0 never queues behind tactical or strategic chatter.
if (decision.priority === 0) {
  coachingQueue.preempt(decision);   // jumps the line, flushes lower-priority TTS
} else {
  coachingQueue.enqueue(decision);   // normal cooldown + cadence rules apply
}
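
The snippet above shows only the call site. As a hedged sketch of what could sit behind it: the class below is not the project's CoachingQueue; the fields, the `next`/`current`/`size` helpers, and the choice to return the interrupted utterance are assumptions, and the cooldown and cadence rules are left out.

```typescript
// Hypothetical sketch; the real CoachingQueue internals are not shown in the post.
interface Decision { priority: 0 | 1 | 2; text: string; }

class CoachingQueue {
  private pending: Decision[] = [];
  private speaking: Decision | null = null;

  // Normal path: insert in priority order, FIFO within the same priority.
  enqueue(d: Decision): void {
    const i = this.pending.findIndex((p) => p.priority > d.priority);
    if (i === -1) this.pending.push(d);
    else this.pending.splice(i, 0, d);
  }

  // P0 path: flush queued lower-priority messages and take over the voice now.
  // Returns whatever was being spoken so the caller can stop its TTS.
  preempt(d: Decision): Decision | null {
    this.pending = this.pending.filter((p) => p.priority === 0);
    const interrupted = this.speaking;
    this.speaking = d;
    return interrupted;
  }

  // Called when the current utterance ends: start the next one, if any.
  next(): Decision | null {
    this.speaking = this.pending.shift() ?? null;
    return this.speaking;
  }

  current(): Decision | null { return this.speaking; }
  size(): number { return this.pending.length; }
}
```

The design choice worth noticing is that `preempt` never looks at queue length or cooldown state, so the cost of a P0 does not grow with how chatty the other lanes have been.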

The TimingGate solves the opposite problem: advice that's correct but ill-timed. It's a small state machine tracking the car's CornerPhase — braking, turn-in, apex, exit — and it can enforce silence during peak cognitive load. Telling a beginner to "watch your line" at the apex is technically true and actively harmful. The gate knows to shut up.
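
A small state machine of this shape might look like the sketch below. The phase names come from the post; the `straight` phase, the input fields, the thresholds, and the policy that P0 always passes the gate are assumptions made for illustration.

```typescript
// Hypothetical sketch: thresholds and the silence policy are assumptions.
type CornerPhase = "straight" | "braking" | "turn-in" | "apex" | "exit";

interface PhaseInput { brakePressure: number; steeringDeg: number; throttle: number; }

class TimingGate {
  phase: CornerPhase = "straight";

  // Advance by one frame; transitions only move forward through a corner.
  update(f: PhaseInput): CornerPhase {
    const steer = Math.abs(f.steeringDeg);
    switch (this.phase) {
      case "straight":
        if (f.brakePressure > 0.2) this.phase = "braking";
        break;
      case "braking":
        if (steer > 10) this.phase = "turn-in";
        break;
      case "turn-in":
        if (f.brakePressure < 0.05 && steer > 20) this.phase = "apex";
        break;
      case "apex":
        if (f.throttle > 0.5) this.phase = "exit";
        break;
      case "exit":
        if (steer < 5) this.phase = "straight";
        break;
    }
    return this.phase;
  }

  // Assumed policy: P0 always passes; everything else waits out the high-load phases.
  allows(priority: number): boolean {
    if (priority === 0) return true;
    return this.phase === "straight" || this.phase === "exit";
  }
}
```

With this policy, a "watch your line" message generated at the apex is simply held back until `allows(1)` turns true on exit.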

The two traps I walked into

Trap one: the humanizer. The function that turns a raw decision into a spoken phrase, humanizeAction, lives on the HOT path. The moment it does anything heavy — a lookup that touches I/O, a stray async hop — the 50 ms budget is gone. I kept it ruthlessly synchronous: string formatting and switch statements, nothing else. Predictable execution time is the feature.
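
In sketch form, a humanizer kept to "string formatting and switch statements" could be as small as this. Only the name `humanizeAction` comes from the post; the `Action` variants and the phrases are hypothetical.

```typescript
// Hypothetical action shapes and phrases for illustration.
type Action =
  | { kind: "BRAKE" }
  | { kind: "EASE_OFF" }
  | { kind: "BRAKE_AT_BOARD"; corner: number; boardM: number };

// Synchronous on purpose: no await, no I/O, nothing beyond a switch and a template.
function humanizeAction(a: Action): string {
  switch (a.kind) {
    case "BRAKE":
      return "Brake!";
    case "EASE_OFF":
      return "Oversteer, ease off!";
    case "BRAKE_AT_BOARD":
      return `Turn ${a.corner}: brake at the ${a.boardM} meter board`;
  }
}
```

Since the return type is `string` rather than `Promise<string>`, an accidental `async` hop would surface as a type error at the call site instead of as a blown budget on track.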

Trap two: the on-device model sneaking onto the hot thread. The edge model (more on Gemma in Part 4) runs in a single-flight async queue — one request in flight, ever, and never on the path that produces a safety cue. Even at the OS level the app asks Android for CONNECTION_PRIORITY_HIGH on the RaceBox Bluetooth link, telling the system to minimize jitter on the one sensor stream the whole thing depends on.
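
One way to get the "one request in flight, ever" property is to chain every task onto the previous one, as in the sketch below. `SingleFlightQueue` and its counters are hypothetical names, and the post does not say whether the real queue serializes extra requests (as here) or drops them while busy.

```typescript
// Hypothetical sketch: tasks run strictly one at a time, in submission order.
class SingleFlightQueue {
  private tail: Promise<unknown> = Promise.resolve();
  inFlight = 0;
  maxInFlight = 0; // kept only so the single-flight property can be checked

  submit<T>(task: () => Promise<T>): Promise<T> {
    const run = async (): Promise<T> => {
      this.inFlight++;
      this.maxInFlight = Math.max(this.maxInFlight, this.inFlight);
      try {
        return await task();
      } finally {
        this.inFlight--;
      }
    };
    // Start after the previous task settles, whether it succeeded or failed.
    const result = this.tail.then(run, run);
    // Keep the chain alive even if this task rejects.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

The caller on the HOT side never awaits `submit`; it hands the work off and moves on, which is what keeps the model off the path that produces a safety cue.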

The architecture isn't clever. It's disciplined. Every fast thing is forbidden from depending on a slow thing.

That discipline is what makes the next part possible. Because the slow, smart lane is fully decoupled, I can drop a real language model into it — on the phone, no cloud — and if it's slow, or missing, or actively on fire, the car still gets coached. Next: putting Gemma 4 in the cockpit, and the week I lost to a single rejected tensor.

Racecraft · on-device real-time driving coach built around Gemma 4. Code: github.com/rabimba/speedracer-AI.
