
Racecraft (Project Koru) · Prologue — The Origin Story


It Started With a Wine List and a Question About Racing

How a happy-hour conversation in the Bay Area turned into a trustable AI race coach, and then into a second version that runs entirely on a phone, on the NPU. This is the prologue to a five-part series.

Two years ago (1st November, 2024) I was in the Bay Area for a GDE Summit. If you've never been: it's a couple of days of talks among Google Developer Experts, the kind of people who get unreasonably excited about a new on-device runtime, and then, mercifully, a happy hour where everyone stops performing and just eats. We ended up at a restaurant (Puesto Santa Clara), a long table of GDEs, and I was doing the most important engineering of the evening: trying to decide which wine to order.

Across the table was Ajeet Mirwani. I don't even remember how the wine talk turned into racing talk (these things drift), but the moment the word "racing" came up, Ajeet lit up. He leaned in and asked the table: do any of you race? Have you ever raced?

I said no. Disappointingly, embarrassingly no. But, and I said this a little too quickly, I was fascinated by it. Ajeet grinned and told us that he and a couple of other Googlers actually have their own race cars, and that they take them out to Thunderhill and Sonoma for track days. Real cars, real corners, real tenths of a second. I remember thinking that was the coolest sentence anyone had said to me all summit.

That table, and that conversation

And then, like most great restaurant ideas, we completely forgot about it.

Until one day: "can you build an AI that makes me faster?"

Fast-forward to last year. We were running a Gemini sprint with the Google ML team, and Ajeet came back around to a few of us with a question that was equal parts engineering challenge and racing-driver bravado: could we build an AI coach that actually makes him go faster on track?

That question turned out to be much deeper than it sounds. A coach that's right is easy. A coach you'd actually trust in a car at speed is hard, because the moment it tells you something a half-second too late, you stop listening, forever. What we built to answer it became the first public blueprint for trustable AI: "Beyond the Chatbot: A Blueprint for Trustable AI," on the Google Developers Blog, co-written by Ajeet.

If you go read it, you'll see my work in there. You will not see me in the team photo, because the day it was taken I was on a flight back to Houston to accept a distinguished PhD thesis award. I've made peace with it. It is, I maintain, the only acceptable excuse for missing the group picture.

That first version proved the thesis: a "split-brain" design (an instant, deterministic safety reflex separated from a slower, smarter strategic brain) could deliver coaching that earns trust instead of getting muted. But it leaned on the cloud. And that left a question I couldn't stop poking at.

If feedback 800 ms late is worse than silence, why is the smartest part of the coach sitting in a data center, a round-trip away from the car?
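To make the split-brain idea and the 800 ms rule concrete, here is a minimal sketch of how the two could fit together. It is not the Racecraft code: the names (`Advice`, `safety_reflex`, `deliver`, `coach_tick`) and the braking thresholds are hypothetical, chosen only for illustration. The reflex is a plain rule with no model in the loop, and anything from the slower brain is dropped once it is older than 800 ms.

```python
from dataclasses import dataclass
from typing import Optional

STALE_AFTER_S = 0.8  # the 800 ms rule: older than this, say nothing


@dataclass
class Advice:
    """Output of the slow 'strategic brain' (hypothetical type)."""
    text: str
    created_at: float  # timestamp (seconds) of the telemetry it was based on


def safety_reflex(speed_kmh: float, distance_to_corner_m: float) -> Optional[str]:
    """Deterministic rule with no model involved. Thresholds are made up."""
    if speed_kmh > 120 and distance_to_corner_m < 60:
        return "brake!"
    return None


def deliver(advice: Optional[Advice], now: float) -> Optional[str]:
    """Return strategic advice only while it is still fresh; otherwise stay silent."""
    if advice is None or now - advice.created_at > STALE_AFTER_S:
        return None
    return advice.text


def coach_tick(speed_kmh: float, distance_to_corner_m: float,
               pending: Optional[Advice], now: float) -> Optional[str]:
    """One control-loop step: the reflex always wins and never waits on the model."""
    reflex = safety_reflex(speed_kmh, distance_to_corner_m)
    if reflex is not None:
        return reflex
    return deliver(pending, now)
```

Timestamps are passed in rather than read from a clock, which keeps the sketch deterministic; in a real loop `now` would come from something like `time.monotonic()`, and `pending` would be whatever the slower brain last produced in the background.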

Until finally: the whole coach, on the phone

This series is about the second version, the one we call Racecraft, and it answers that question by deleting the data center from the critical path. The reasoning model now runs on the phone itself: Gemma 4 E2B, on a Pixel, through LiteRT-LM. No cell signal required at a track in the hills. And when we finally benchmarked it across the phone's silicon, the result was the one I'd been hoping for since that restaurant: the NPU, the dedicated AI chip, was the fastest lane of all, first word out in 424 ms, while the safety reflex still fires in 5 ms with no model in the loop at all. (Google's own developer blog has been mapping this same on-device LiteRT + NPU territory; it's a real moment for edge AI.)
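"First word out" is a time-to-first-token (TTFT) number. As a rough sketch of how such a figure can be measured, the helper below times any stream of tokens. It assumes the runtime hands back an iterable of text chunks; `fake_model` is a stand-in generator invented for this example, not a LiteRT-LM call, and a real benchmark would also have to decide whether prompt submission counts inside the timed window.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_token(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[str, float]:
    """Return (first_token, elapsed_ms) for any iterable that yields tokens."""
    start = clock()
    first = next(iter(stream))  # blocks until the first token arrives
    return first, (clock() - start) * 1000.0


def fake_model(delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: waits, then emits tokens."""
    time.sleep(delay_s)  # pretend this is prefill and first decode step
    yield "Brake"
    yield " later"


if __name__ == "__main__":
    token, ms = time_to_first_token(fake_model(0.05))
    print(f"first token {token!r} after {ms:.0f} ms")
```

Because `fake_model` is a generator, nothing runs until the first `next()` call, so the simulated delay falls inside the timed window. The clock is injectable so the measurement logic can be checked without real sleeping.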

So that's the arc: a non-racer, a racing-obsessed Googler, a glass of wine, a forgotten idea, a cloud prototype that became a published blueprint, and finally a coach small and fast enough to live in your ear at 130 km/h. The five posts that follow are how we actually built that last part: the architecture, the model, the latency fights, and the honest dead-ends.

The series

Figure: Racecraft, a five-part build log. Parts: (1) The 800 ms problem, (2) Reading the driver, (3) Split-Brain & latency, (4) Gemma on the NPU, (5) Earning trust. Callouts: trust = latency; coaching paradigm; 5 ms safety; 424 ms TTFT; 16/16 on device. "Feedback 800 ms late is worse than silence."

  1. The 800-Millisecond Problem: why I muted every AI coach I ever tried, and the one rule that fixes it.
  2. Teaching the Coach to Read the Driver: coaching the human, not the car, with the DriverModel and skill-adaptive feedback.
  3. Splitting the Brain to Beat the Clock: how a "brake!" lands in 5 ms while a model thinks for seconds, in the same app.
  4. Putting Gemma in the Cockpit: on-device Gemma 4 E2B, the week lost to one tensor, and the NPU winning.
  5. Earning Trust at Speed: determinism, on-device validation, and what generalizes far beyond racing.

Start with Part 1. It begins, appropriately, with me hitting the mute button.

1. The 800 ms Problem

Racecraft · an on-device, real-time driving coach built around Gemma 4. Code & full writeup: github.com/rabimba/speedracer-AI. With thanks to Ajeet Mirwani, who started all of this with one question about racing.


