Racecraft · Prologue: The Origin Story
It Started With a Wine List and a Question About Racing
How a happy-hour conversation in the Bay Area turned into a trustable AI race coach, and then into a second version that runs entirely on a phone, on the NPU. This is the prologue to a five-part series.
Two years ago (1 November 2024) I was in the Bay Area for a GDE Summit. If you've never been: it's a couple of days of talks among Google Developer Experts, the kind of people who get unreasonably excited about a new on-device runtime, and then, mercifully, a happy hour where everyone stops performing and just eats. We ended up at a restaurant (Puesto Santa Clara), a long table of GDEs, and I was doing the most important engineering of the evening: trying to decide which wine to order.
Across the table was Ajeet Mirwani. I don't even remember how the wine talk turned into racing talk (these things drift), but the moment the word "racing" came up, Ajeet lit up. He leaned in and asked the table: do any of you race? Have you ever raced?
I said no. Disappointingly, embarrassingly no. But, and I said this a little too quickly, I was fascinated by it. Ajeet grinned and told us that he and a couple of other Googlers actually have their own race cars, and that they take them out to Thunderhill and Sonoma for track days. Real cars, real corners, real tenths of a second. I remember thinking that was the coolest sentence anyone had said to me all summit.
That table, and that conversation
And then, like most great restaurant ideas, we completely forgot about it.
Until one day: "can you build an AI that makes me faster?"
Fast-forward to last year. We were running a Gemini sprint with the Google ML team, and Ajeet came back around to a few of us with a question that was equal parts engineering challenge and racing-driver bravado: could we build an AI coach that actually makes him go faster on track?
That question turned out to be much deeper than it sounds. A coach that's right is easy. A coach you'd actually trust in a car at speed is hard, because the moment it tells you something a half-second too late, you stop listening, forever. What we built to answer it became the first public blueprint for trustable AI: "Beyond the Chatbot: A Blueprint for Trustable AI," on the Google Developers Blog, co-written by Ajeet.
If you go read it, you'll see my work in there. You will not see me in the team photo, because the day it was taken I was on a flight back to Houston to accept a distinguished PhD thesis award. I've made peace with it. It is, I maintain, the only acceptable excuse for missing the group picture.
That first version proved the thesis: a "split-brain" design (an instant, deterministic safety reflex separated from a slower, smarter strategic brain) could deliver coaching that earns trust instead of getting muted. But it leaned on the cloud. And that left a question I couldn't stop poking at.
If feedback 800 ms late is worse than silence, why is the smartest part of the coach sitting in a data center, a round-trip away from the car?
Until finally: the whole coach, on the phone
This series is about the second version, the one we call Racecraft, and it answers that question by deleting the data center from the critical path. The reasoning model now runs on the phone itself: Gemma 4 E2B, on a Pixel, through LiteRT-LM. No cell signal required at a track in the hills. And when we finally benchmarked it across the phone's silicon, the result was the one I'd been hoping for since that restaurant: the NPU, the dedicated AI chip, was the fastest lane of all, first word out in 424 ms, while the safety reflex still fires in 5 ms with no model in the loop at all. (Google's own developer blog has been mapping this same on-device LiteRT + NPU territory; it's a real moment for edge AI.)
So that's the arc: a non-racer, a racing-obsessed Googler, a glass of wine, a forgotten idea, a cloud prototype that became a published blueprint, and finally a coach small and fast enough to live in your ear at 130 km/h. The five posts that follow are how we actually built that last part: the architecture, the model, the latency fights, and the honest dead-ends.
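For readers who think in code, here is a minimal sketch of the split-brain idea: a fast path that is plain arithmetic with no model in the loop, and a gate that drops slow-path advice once it has gone stale. This is not the Racecraft implementation. The `Telemetry` fields, the function names, the braking-distance rule, and the `MAX_DECEL_MPS2` value are all assumptions made up for illustration; only the 800 ms figure comes from this post, and treating it as a staleness cutoff is my reading of it.

```python
from dataclasses import dataclass
from typing import Optional

# From the post: feedback that arrives 800 ms late is worse than silence.
# Using it as a hard staleness cutoff is an assumption of this sketch.
STALE_AFTER_MS = 800.0

# Hypothetical value, chosen only for illustration.
MAX_DECEL_MPS2 = 12.0  # assumed maximum braking deceleration


@dataclass(frozen=True)
class Telemetry:
    """Hypothetical telemetry sample; field names are illustrative."""
    speed_mps: float         # current speed
    corner_dist_m: float     # distance to the next corner entry
    corner_speed_mps: float  # target speed at corner entry


def safety_reflex(s: Telemetry) -> Optional[str]:
    """Fast path: deterministic arithmetic, no model in the loop."""
    if s.speed_mps <= s.corner_speed_mps:
        return None
    # Distance needed to slow from v to v_target at constant deceleration:
    # d = (v^2 - v_target^2) / (2 * a)
    needed_m = (s.speed_mps**2 - s.corner_speed_mps**2) / (2 * MAX_DECEL_MPS2)
    return "brake!" if needed_m >= s.corner_dist_m else None


def deliver_strategy(advice: str, asked_at_ms: float, now_ms: float) -> Optional[str]:
    """Slow-path gate: stay silent rather than speak stale advice."""
    return advice if (now_ms - asked_at_ms) < STALE_AFTER_MS else None
```

The point of keeping the two paths separate is that the reflex never waits on the model: the same input always produces the same output, and the slower advice is simply discarded when it misses its window.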
The series
- The 800-Millisecond Problem: why I muted every AI coach I ever tried, and the one rule that fixes it.
- Teaching the Coach to Read the Driver: coaching the human, not the car, with the DriverModel and skill-adaptive feedback.
- Splitting the Brain to Beat the Clock: how a "brake!" lands in 5 ms while a model thinks for seconds, in the same app.
- Putting Gemma in the Cockpit: on-device Gemma 4 E2B, the week lost to one tensor, and the NPU winning.
- Earning Trust at Speed: determinism, on-device validation, and what generalizes far beyond racing.
Start with Part 1. It begins, appropriately, with me hitting the mute button.
1. The 800 ms Problem
Racecraft · an on-device, real-time driving coach built around Gemma 4. Code & full writeup: github.com/rabimba/speedracer-AI. With thanks to Ajeet Mirwani, who started all of this with one question about racing.