
The 800-Millisecond Problem

Racecraft · Part 1 of 5

Why I stopped trusting every "AI race coach" I'd ever tried, and what it would take to build one I'd actually listen to at 130 km/h.

The first time an app yelled "Brake!" at me on a track day, I had already braked, and not a split second earlier: a full corner earlier. I was unwinding the wheel and feeding in throttle on the exit of Sonoma's Turn 7, one of the fastest corners on the circuit, when the phone told me, with great confidence, to brake. It wasn't merely a beat behind. For that specific high-speed corner the cue was flat wrong, because braking mid-exit at that speed is how you put the car into a spin. And I couldn't even mute it. I was a beginner on a mandatory-instructor day, both hands on the wheel, eyes up. It was the instructor in my passenger seat (there to call flags and lines, allowed to say anything but never to touch a single control) who reached over and silenced the app, precisely because its feedback was both late and wrong for that corner. That muted phone is the whole reason this project exists.

Reaching for it myself was never an option. At 130 km/h on a shared track, every other driver is trusting me to do the predictable thing in the predictable place. A late, wrong cue isn't merely useless; it's a hand off the wheel and a half-second of attention I do not have. On a spreadsheet that's a bad notification; on a race track it's a safety problem. (It didn't help that the app was our own prototype, the one my team was building, with me strapped in as both the engineer and the crash-test dummy.)

Here's the thing every engineer who has sat in a car with a coach knows in their gut: a coach you can't trust is worse than no coach at all. That instructor beside me is the model for the whole project. A good one earns trust in the first two laps or loses it forever. They say the right thing, at the right moment, in a voice that matches how rattled you are, and crucially they can't grab the wheel; all they have is words, timed well. Get the timing wrong and you don't just give bad advice; you teach the driver to ignore you. The mute button is permanent.

So when I set out to build Racecraft, an on-device, real-time driving coach, the success metric wasn't accuracy. It was trust. And trust, it turns out, is mostly a latency problem wearing a psychology costume.

Every day: the coach that's always a beat behind

Pull apart the typical "AI coaching" app and you find the same architecture every time: stream telemetry to the cloud, ask a large model what it thinks, wait for the answer, speak it. On a spreadsheet that's a pipeline. In a moving car it's a comedy of timing. By the time the round-trip and the token generation are done, the apex you were being coached on is in your mirrors.

I measured a few of these out of morbid curiosity. The good ones landed advice 1.5–3 seconds after the triggering event. The model was often right ("you braked too early for that one") and completely useless, because being told about a mistake two corners later doesn't change your hands; it just makes you feel watched.
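To put those delays in distance terms, here is a back-of-the-envelope sketch (plain arithmetic, not Racecraft code; the function name is made up for illustration): how far the car travels at a constant 130 km/h while a cue is still in flight.

```python
def metres_travelled(speed_kmh: float, latency_s: float) -> float:
    """Distance covered at constant speed while a cue is still in flight.

    Illustrative helper only; assumes the car holds a constant speed.
    """
    speed_mps = speed_kmh * 1000.0 / 3600.0  # km/h -> m/s
    return speed_mps * latency_s


if __name__ == "__main__":
    # The 800 ms threshold, then the 1.5 s and 3 s cloud round-trips measured above.
    for latency in (0.8, 1.5, 3.0):
        print(f"{latency:>4.1f} s late at 130 km/h -> {metres_travelled(130, latency):6.1f} m")
```

At 130 km/h, 800 ms is already about 29 m of track, and a 3 s answer lands more than 100 m after the event that triggered it.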

"Feedback 800 ms late is worse than silence."

That sentence became the constitution of the project. Not a slogan: a constraint that kills features. If a clever idea can't make it into your ears inside the window where you can still act on it, it doesn't ship in the real-time path. It can live somewhere slower. It cannot live on the hot path.
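One way to make "800 ms late is worse than silence" enforceable rather than aspirational is to stamp every cue with the moment its triggering event was detected, then drop it silently if it has already aged past its window by the time it reaches the speaker. This is a hypothetical sketch: the `Cue` type, the `speak_or_drop` function, and treating 800 ms as a hard cutoff are my illustration, not Racecraft's actual code. It uses a monotonic clock so wall-clock adjustments can't make a stale cue look fresh.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical hard budget: a cue older than this is dropped, never spoken.
STALE_AFTER_S = 0.8


@dataclass(frozen=True)
class Cue:
    text: str
    event_time_s: float  # time.monotonic() when the triggering event was detected


def speak_or_drop(
    cue: Cue,
    speak: Callable[[str], None],
    now_s: Optional[float] = None,
    budget_s: float = STALE_AFTER_S,
) -> bool:
    """Speak the cue only if it is still inside its action window.

    Returns True if the cue was spoken, False if it was dropped as stale.
    """
    if now_s is None:
        now_s = time.monotonic()
    age_s = now_s - cue.event_time_s
    if age_s > budget_s:
        return False  # silence beats late advice
    speak(cue.text)
    return True
```

The design choice that matters is that the drop is silent: an expired cue never reaches the driver's ears, so it can't teach them to ignore the next one.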

Until one day: what the cockpit should actually feel like

Before writing a line of inference code, I drew the moment I wanted. Not a dashboard, a moment. You're approaching a corner too hot. A real coach does three different things on three different clocks: an instant, reflexive "brake!" that has to be right now; a calmer, strategic "you keep lifting early here, trust the grip" that can come on the next straight; and a heads-up "this next one's a late apex" that should arrive before you get there. Three jobs, three deadlines. No single model call can serve all three.
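The heads-up job runs on a distance clock rather than a time clock: it has to land before the corner. The mockup's annotation describes it as geofenced and velocity-scaled, firing roughly 150 m before a known corner. The sketch below is one hypothetical way to get that behaviour: the 4 s lead time, the 50 m floor, and both function names are my assumptions, not Racecraft's implementation.

```python
# Hypothetical lead time: 4 s of warning puts the trigger exactly 150 m out
# at 135 km/h (37.5 m/s). Racecraft's real scaling may differ.
LEAD_TIME_S = 4.0
# Hypothetical floor so a slow lap still gets its heads-up in time to act.
MIN_LEAD_M = 50.0


def feedforward_distance_m(speed_kmh: float) -> float:
    """Velocity-scaled distance before a known corner at which to speak."""
    speed_mps = speed_kmh * 1000.0 / 3600.0
    return max(MIN_LEAD_M, speed_mps * LEAD_TIME_S)


def should_fire_heads_up(distance_to_corner_m: float, speed_kmh: float, already_fired: bool) -> bool:
    """Fire once per corner, when the car first enters the velocity-scaled geofence.

    distance_to_corner_m is assumed to come from GPS plus a stored track map;
    negative values mean the corner is already behind the car.
    """
    if already_fired:
        return False
    return 0.0 <= distance_to_corner_m <= feedforward_distance_m(speed_kmh)
```

At 135 km/h the trigger sits at 150 m; at 100 km/h the same 4 s shrinks it to about 111 m. Scaling by velocity keeps the driver's reaction time roughly constant instead of keeping the distance constant.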

[Figure: The cockpit, mid-corner. Design mockup in the Racecraft brand palette: a live session (12:04.7) on the approach to Sonoma Raceway T7 at 128 km/h, with RaceBox BLE GPS · IMU telemetry and Gemma 4 E2B on the edge. The P0 hot-path cue reads "Brake now." (5 ms), while Coach AJ, set to INTERMEDIATE, adds: "You're lifting early into 7; trust the grip through mid-corner." Annotations:
HOT path, deterministic safety: "Brake now" fires from the on-device DECISION_MATRIX in ~5 ms p95 (measured on Pixel 10). No model in the loop.
COLD path, Gemini 2.5 Flash Lite: "why, not what" strategy, 2–5 s, off the critical path.
FEEDFORWARD, geofenced: velocity-scaled, fires ~150 m before a known corner.
DriverModel, adapts the voice: 10 s rolling window, 5 s hysteresis; BEGINNER / INTERMEDIATE / ADVANCED changes language and cadence.
On-device, Gemma 4 E2B (edge): LiteRT-LM enrichment runs locally; the loop never blocks on it.]
The moment I was designing for: three kinds of advice, three different clocks, one calm screen.
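The mockup's DriverModel annotation (10 s rolling window, 5 s hysteresis, three skill levels) describes a smoothing-plus-hysteresis classifier. Part 2 covers the real model; this is only a minimal sketch under stated assumptions. The per-sample "smoothness" score in [0, 1], the thresholds in `classify`, and the `DriverModel` API shown here are hypothetical. The window averages out a single messy corner, and the hysteresis keeps the coach's voice from flip-flopping between levels mid-lap.

```python
from collections import deque
from typing import Deque, Optional, Tuple

WINDOW_S = 10.0      # rolling window length (from the mockup annotation)
HYSTERESIS_S = 5.0   # a new level must persist this long before it sticks


def classify(score: float) -> str:
    """Hypothetical thresholds on a hypothetical 0..1 smoothness score."""
    if score < 0.4:
        return "BEGINNER"
    if score < 0.75:
        return "INTERMEDIATE"
    return "ADVANCED"


class DriverModel:
    def __init__(self, initial_level: str = "BEGINNER") -> None:
        self.level = initial_level
        self._samples: Deque[Tuple[float, float]] = deque()  # (timestamp_s, score)
        self._candidate: Optional[str] = None
        self._candidate_since: float = 0.0

    def update(self, t: float, score: float) -> str:
        # 1. Maintain the rolling window: evict samples older than WINDOW_S.
        self._samples.append((t, score))
        while t - self._samples[0][0] > WINDOW_S:
            self._samples.popleft()
        mean = sum(s for _, s in self._samples) / len(self._samples)
        proposed = classify(mean)

        # 2. Hysteresis: only switch once the same proposal has held for HYSTERESIS_S.
        if proposed == self.level:
            self._candidate = None
        elif proposed != self._candidate:
            self._candidate = proposed
            self._candidate_since = t
        elif t - self._candidate_since >= HYSTERESIS_S:
            self.level = proposed
            self._candidate = None
        return self.level
```

Fed one sample per second of consistently smooth driving, this sketch keeps the BEGINNER voice for the first five seconds and switches to ADVANCED only once the evidence has held for the full hysteresis period.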

That picture forced the first real architectural decision, and it's the one the rest of this series unpacks: the reflexive "brake!" cannot be allowed to wait behind anything (not a network call, not a model, not even a polite sentence-builder). It has to come from code so simple and so fast that I can read it and tell you exactly why it fired. Everything that wants to be smart can take its time, as long as it never gets in the way of the thing that has to be instant.
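Here is a minimal sketch of that split, under loud assumptions. The table name echoes the DECISION_MATRIX label in the mockup, but its keys, rule ids, the 5 km/h margin, and the `speed_state`, `hot_path`, and `request_cold_advice` helpers are all hypothetical. The hot path is a plain table lookup that also reports which rule fired; anything slow is handed to a background executor, and the hot path never waits on its result.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple

# Hypothetical rules: (corner phase, speed state) -> (rule id, spoken cue).
# An empty cue means "stay silent", which is a valid and explainable answer.
DECISION_MATRIX: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("approach", "too_fast"): ("R1_OVERSPEED_ON_APPROACH", "Brake now."),
    ("approach", "on_target"): ("R2_ON_TARGET", ""),
    ("exit", "too_fast"): ("R3_EXIT_SUPPRESS_BRAKE", ""),  # hypothetical: never say "brake" mid-exit
}


def speed_state(speed_kmh: float, target_kmh: float, margin_kmh: float = 5.0) -> str:
    """Bucket the current speed against a per-corner target speed."""
    return "too_fast" if speed_kmh > target_kmh + margin_kmh else "on_target"


def hot_path(phase: str, speed_kmh: float, target_kmh: float) -> Optional[Tuple[str, str]]:
    """Deterministic lookup: no model, no network, no I/O.

    Returns (rule_id, cue) so every cue can be explained after the fact,
    or None when no rule covers the situation.
    """
    return DECISION_MATRIX.get((phase, speed_state(speed_kmh, target_kmh)))


# The cold path gets its own worker thread so slow calls never block the hot path.
_cold_pool = ThreadPoolExecutor(max_workers=1)


def request_cold_advice(ask_model: Callable[[str], str], context: str) -> Future:
    """Submit slow, strategic work and return immediately with a Future."""
    return _cold_pool.submit(ask_model, context)
```

The point of the design is that `hot_path` fits on one screen, returns the rule that fired, and produces the same answer whether or not any cold-path future has resolved.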

Racecraft setup screen on a Pixel 10
The first screen is a pre-drive checklist, not a marketing page: mode, telemetry source, track, coach voice, and whether audio cues fire. A field tool should never hide uncertainty.

What this series covers

This is the first of five posts. I'm going to be honest about what works, what doesn't yet, and the specific walls I hit, because the walls are the interesting part.

  1. The 800-millisecond problem: trust as a latency problem (you're here).
  2. Teaching the coach to read the driver: the DriverModel and why we coach the human, not the car.
  3. Splitting the brain to beat the clock: the HOT / COLD / FEEDFORWARD engine and the 5 ms safety path.
  4. Putting Gemma in the cockpit: on-device Gemma 4 E2B, LiteRT-LM, the NPU, and the model-loading saga that ate a week.
  5. Earning trust at speed: determinism, on-device validation, and what shipped vs. what's still scaffolding.

Strip away the racing specifics and the lesson is blunt: in a high-stakes, real-time environment, latency isn't a performance metric, it's a trust metric. Be even a half-second late and you haven't just given bad advice; you've taught the user to stop listening. And mute, as I learned at 130 km/h, is permanent.

So the rest of this series is one long answer to a single question: how do you make the advice that matters arrive in time, every time? The fix starts by re-engineering the one path that can never afford to wait, the hot path, so a "brake!" is decided and spoken in milliseconds, with no model, no network, and no excuses. That's where we go next.

Racecraft is an on-device, real-time driving coach built around Gemma 4. Code: github.com/rabimba/speedracer-AI.

