It Started With a Wine List and a Question About Racing

How a happy-hour conversation in the Bay Area turned into a trustable AI race coach, and then into a second version that runs entirely on a phone, on its NPU. This is the prologue to a five-part series.

Two years ago (November 1, 2024), I was in the Bay Area for a GDE Summit. If you've never been: it's a couple of days of talks among Google Developer Experts, the kind of people who get unreasonably excited about a new on-device runtime, and then, mercifully, a happy hour where everyone stops performing and just eats. We ended up at a restaurant, Puesto Santa Clara, around a long table of GDEs, and I was doing the most important engineering of the evening: trying to decide which wine to order.

Across the table was Ajeet Mirwani. I don't even remember how the wine talk turned into racing talk (these things drift), but the moment the word "racing" came up, Ajeet lit up. He leaned in and asked the table: do any of you race? Have you ever raced?

I said no. Disappointingly, embarrassingly no. But (and I said this a little too quickly) I was fascinated by it. Ajeet grinned and told us that he and a couple of other Googlers actually have their own race cars, and that they take them out to Thunderhill and Sonoma for track days. Real cars, real corners, real tenths of a second. I remember thinking that was the coolest sentence anyone had said to me all summit.

[Photo: that table, and that conversation]

And then, like most great restaurant ideas, we completely forgot about it.

Until one day: "can you build an AI that makes me faster?"

Fast-forward to last year. We were running a Gemini sprint with the Google ML team, and Ajeet came back around to a few of us with a question that was equal parts engineering challenge and racing-driver bravado: could we build an AI coach that actually makes him go faster on track?

That question turned out to be much deeper than it sounds. A coach that's right is easy. A coach you'd actually trust in a car at speed is hard, because the moment it tells you something a half-second too late, you stop listening, forever. What we built to answer it became the first public blueprint for trustable AI: "Beyond the Chatbot: A Blueprint for Trustable AI," on the Google Developers Blog, co-written by Ajeet.

If you go read it, you'll see my work in there. You will not see me in the team photo, because the day it was taken I was on a flight back to Houston to accept a distinguished PhD thesis award. I've made peace with it. It is, I maintain, the only acceptable excuse for missing the group picture.

That first version proved the thesis: a "split-brain" design (an instant, deterministic safety reflex kept separate from a slower, smarter strategic brain) could deliver coaching that earns trust instead of getting muted. But it leaned on the cloud. And that left a question I couldn't stop poking at.

If feedback that arrives 800 ms late is worse than silence, why is the smartest part of the coach sitting in a data center, a network round-trip away from the car?
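To make the split-brain idea concrete, here is a minimal Python sketch under loudly stated assumptions: the `Telemetry` fields, the 80 m threshold, and the `safety_reflex`, `strategic_brain`, `strategic_worker`, and `coach_step` names are all hypothetical illustrations, not Racecraft's actual code. What it shows is the structure: the reflex is a pure rule that returns immediately, while the slow "brain" (here a `time.sleep` standing in for model inference) runs on a background thread, so it can never delay a safety cue.

```python
import queue
import threading
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Telemetry:
    # Hypothetical telemetry fields, chosen for illustration only.
    speed_kmh: float
    distance_to_corner_m: float
    brake_pressure: float  # 0.0 (off) to 1.0 (full)


# Hypothetical threshold; a real coach would calibrate this per corner and driver.
BRAKE_WARNING_DISTANCE_M = 80.0


def safety_reflex(t: Telemetry) -> Optional[str]:
    """Fast path: a deterministic rule with no model in the loop."""
    if (t.speed_kmh > 100.0
            and t.distance_to_corner_m < BRAKE_WARNING_DISTANCE_M
            and t.brake_pressure < 0.1):
        return "brake!"
    return None


def strategic_brain(t: Telemetry) -> str:
    """Slow path: stand-in for model inference; the sleep simulates latency."""
    time.sleep(0.5)
    return f"Next lap, brake earlier: you reached the corner at {t.speed_kmh:.0f} km/h."


def strategic_worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Background thread that consumes telemetry and emits strategic advice."""
    while True:
        t = inbox.get()
        if t is None:  # sentinel: shut down
            break
        outbox.put(strategic_brain(t))


def coach_step(t: Telemetry, inbox: queue.Queue) -> Optional[str]:
    """Run the reflex synchronously, then hand the sample to the slow path."""
    cue = safety_reflex(t)  # returns immediately, never waits on the model
    inbox.put(t)            # non-blocking hand-off (unbounded queue)
    return cue


if __name__ == "__main__":
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    worker = threading.Thread(target=strategic_worker, args=(inbox, outbox), daemon=True)
    worker.start()

    sample = Telemetry(speed_kmh=130.0, distance_to_corner_m=50.0, brake_pressure=0.0)
    start = time.perf_counter()
    cue = coach_step(sample, inbox)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"reflex: {cue} ({elapsed_ms:.3f} ms)")
    print("strategy:", outbox.get(timeout=2.0))

    inbox.put(None)
    worker.join()
```

The design choice worth noticing in this sketch is that the safety cue never depends on the model: if inference is slow or fails, the reflex still fires on the same call.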

Until finally: the whole coach, on the phone

This series is about the second version, the one we call Racecraft. It answers that question by deleting the data center from the critical path. The reasoning model now runs on the phone itself: Gemma 4 E2B, on a Pixel, through LiteRT-LM. No cell signal required at a track in the hills. And when we finally benchmarked it across the phone's silicon, the result was the one I'd been hoping for since that restaurant: the NPU (the phone's dedicated AI chip) was the fastest lane of all, with its first word out in 424 ms, while the safety reflex still fires in 5 ms with no model in the loop at all. (Google's own developer blog has been mapping this same on-device LiteRT + NPU territory; it's a real moment for edge AI.)

So that's the arc: a non-racer, a racing-obsessed Googler, a glass of wine, a forgotten idea, a cloud prototype that became a published blueprint, and finally a coach small and fast enough to live in your ear at 130 km/h. The five posts that follow are how we actually built that last part: the architecture, the model, the latency fights, and the honest dead-ends.

The series

The 800-Millisecond Problem: why I muted every AI coach I ever tried, and the one rule that fixes it.
Teaching the Coach to Read the Driver: coaching the human, not the car, with the DriverModel and skill-adaptive feedback.
Splitting the Brain to Beat the Clock: how a "brake!" lands in 5 ms while a model thinks for seconds, in the same app.
Putting Gemma in the Cockpit: on-device Gemma 4 E2B, the week lost to one tensor, and the NPU winning.
Earning Trust at Speed: determinism, on-device validation, and what generalizes far beyond racing.

Start with Part 1. It begins, appropriately, with me hitting the mute button.

1. The 800 ms Problem

Racecraft · an on-device, real-time driving coach built around Gemma 4. Code & full writeup: github.com/rabimba/speedracer-AI. With thanks to Ajeet Mirwani, who started all of this with one question about racing.

RK's Rambling

Racecraft (Project Koru) · Prologue: The Origin Story
