Racecraft · Part 2 of 5
Teaching the Coach to Read the Driver
The best instructors don't coach the car. They coach you, and they figure out who you are in about two laps. Here's how we taught software to do the same.
In Part 1 I argued that trust is the only metric that matters, and that it's mostly a latency problem. That's true, but there's a second half I glossed over. The same sentence, delivered at the exact same millisecond, can build trust or destroy it depending on who's listening.
Tell a nervous first-timer "brake spike detected, modulate your input" and you've just handed them a stack trace mid-corner. Tell a fast amateur "squeeze the brakes, don't stab" for the tenth time and they'll mute you out of sheer irritation. The words have to match the driver. So before Racecraft can say anything, it has to answer a question a human coach answers instinctively: how good is this person, right now, this lap?
Every day: the model that watches the hands, not the lap time
My first instinct was the obvious one: classify skill by lap time. It's garbage. Lap time conflates the car, the track, the conditions, and the driver into one number, and it tells you nothing until the lap is over. I needed something that read the driver continuously, from inputs, in a way that would be stable enough to act on.
The piece that does this is the DriverModel. It runs on a 10-second rolling window and leans on two signals that turn out to separate skill levels remarkably well:
- Input smoothness: essentially the jerk of brake, throttle and steering. Beginners are jagged; they stab and saw. Smooth, deliberate inputs are the single clearest fingerprint of experience.
- Coasting ratio: the fraction of time with neither pedal applied. Coasting is hesitation made measurable. Beginners coast because they're unsure which pedal they want; fast drivers are almost always on something.
From those it classifies the driver as BEGINNER, INTERMEDIATE, or ADVANCED, and then that one label reaches into nearly every other part of the system.
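A minimal sketch of how those two signals could be computed over a rolling window. Everything named here (`Sample`, `RollingInputs`, `classify`, the deadband, and especially the thresholds) is a hypothetical stand-in for illustration, not Racecraft's actual code. It assumes a steady sample rate and uses a second finite difference as a simple proxy for input jerk.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    t: float         # timestamp in seconds
    brake: float     # 0.0 (released) to 1.0 (full)
    throttle: float  # 0.0 to 1.0
    steering: float  # -1.0 (full left) to 1.0 (full right)


class RollingInputs:
    """Keeps the last `window_s` seconds of driver inputs."""

    def __init__(self, window_s=10.0, deadband=0.05):
        self.window_s = window_s
        self.deadband = deadband  # assumed: below this a pedal counts as released
        self.samples = deque()

    def add(self, sample):
        self.samples.append(sample)
        # Drop samples that have aged out of the window.
        while sample.t - self.samples[0].t > self.window_s:
            self.samples.popleft()

    def coasting_ratio(self):
        """Fraction of samples with neither pedal applied."""
        if not self.samples:
            return 0.0
        coasting = sum(
            1 for s in self.samples
            if s.brake < self.deadband and s.throttle < self.deadband
        )
        return coasting / len(self.samples)

    def roughness(self):
        """Mean absolute second difference of the inputs (a jerk-like proxy).

        0.0 means perfectly steady inputs; larger means more stabbing and sawing.
        """
        s = list(self.samples)
        if len(s) < 3:
            return 0.0
        total = 0.0
        for a, b, c in zip(s, s[1:], s[2:]):
            dt = (c.t - a.t) / 2.0
            if dt <= 0:
                continue
            for ch in ("brake", "throttle", "steering"):
                second_diff = getattr(c, ch) - 2 * getattr(b, ch) + getattr(a, ch)
                total += abs(second_diff) / (dt * dt)
        return total / (3 * (len(s) - 2))


def classify(roughness, coasting_ratio):
    # Thresholds are invented for this sketch; real ones would be tuned on track data.
    if roughness > 50.0 or coasting_ratio > 0.30:
        return "BEGINNER"
    if roughness > 10.0 or coasting_ratio > 0.10:
        return "INTERMEDIATE"
    return "ADVANCED"
```

Feeding it one `Sample` per telemetry tick and calling `classify(w.roughness(), w.coasting_ratio())` gives a raw, per-tick label. The raw label is still noisy at this point, which is why it needs smoothing before anything acts on it.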
This wasn't an abstract spec. Our test fleet was the skill ladder: I was the beginner, learning the line in the Toyota GR86, and Ajeet was the intermediate in the other car. The same coach had to serve both of us without re-coding: talk me through the basics without burying me, and give Ajeet granular, technical feedback without insulting him. One app, two very different drivers; the DriverModel is what let it tell us apart.
Until one day: the model that couldn't make up its mind
The first version flickered. A driver hovering on the line between intermediate and advanced would get reclassified every couple of seconds, and because the classification changes the voice, the coach developed a kind of multiple-personality disorder: feel-based and reassuring one second, clipped and technical the next. It was unsettling in exactly the way that erodes trust.
The fix is boring and essential: 5 seconds of hysteresis. The model has to be confident in a new level for a sustained window before it's allowed to switch. A human coach does the same thing: they form an impression and update it slowly, not on every corner. Stability beat responsiveness, and that was the right call.
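One way to read "5 seconds of hysteresis" is as a time-based hold: a new level only replaces the current one after it has been the raw classification continuously for the whole hold period. The sketch below assumes that interpretation; `LevelGate` and its method names are hypothetical, and the real DriverModel may well use confidence scores rather than a plain timer.

```python
class LevelGate:
    """Only lets a new level through after it has held for `hold_s` seconds."""

    def __init__(self, initial, hold_s=5.0):
        self.current = initial
        self.hold_s = hold_s
        self._candidate = None       # level currently trying to take over
        self._candidate_since = 0.0  # when the candidate first appeared

    def update(self, raw, now):
        """Feed the raw per-tick label; returns the stable label."""
        if raw == self.current:
            # The raw label agrees with the stable one, so cancel any pending switch.
            self._candidate = None
        elif raw != self._candidate:
            # A different challenger appeared; restart the clock.
            self._candidate = raw
            self._candidate_since = now
        elif now - self._candidate_since >= self.hold_s:
            # The challenger has held for the full period; switch.
            self.current = raw
            self._candidate = None
        return self.current
```

With this gate, a driver whose raw label flips every two seconds never changes level at all, while a driver who genuinely settles into a new level is reclassified five seconds later. That delay is the responsiveness being traded away for stability.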
Because of that: four levers, not one
Once the label is stable, it pulls four levers at once, and this is the part I'm proudest of, because it mirrors what good instructors actually do:
- Language. Beginners get feel ("squeeze, don't stab"). Advanced drivers get data ("brake spike detected, modulate").
- Cadence. Beginners get a 3-second cooldown between cues and a full blackout during high-load moments like the apex; silence is a feature. Advanced drivers get a 650 ms cooldown and no blackout, because they can take granular feedback without it costing them the corner.
- Progression. Early in a beginner's session, advanced techniques like trail-braking are suppressed entirely. You don't teach someone to trail-brake on their third-ever lap. Advanced drivers get the full action set immediately.
- The model prompt. Even the cloud model's instructions change: "give simple, feel-based sentences" vs. "reference the specific telemetry numbers." Same model, different driver, different mouth.
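The four levers can be pictured as one policy record per level plus a small gate that decides whether a cue may be spoken. This is a sketch under assumptions: `CuePolicy`, `CueGate`, and the field names are hypothetical, only the beginner and advanced values quoted above are filled in (the intermediate settings are not stated here, so they are left out), and the "early in the session" condition on progression is simplified to a plain flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuePolicy:
    style: str                       # language lever: "feel" or "data"
    cooldown_s: float                # cadence lever: minimum gap between cues
    blackout_in_high_load: bool      # cadence lever: stay silent at the apex
    allow_advanced_techniques: bool  # progression lever (simplified to a flag)
    prompt_hint: str                 # model-prompt lever


POLICIES = {
    "BEGINNER": CuePolicy("feel", 3.0, True, False,
                          "give simple, feel-based sentences"),
    "ADVANCED": CuePolicy("data", 0.65, False, True,
                          "reference the specific telemetry numbers"),
}


class CueGate:
    """Decides whether a cue may be spoken right now under a given policy."""

    def __init__(self, policy):
        self.policy = policy
        self._last_spoken = None  # time of the last cue that got through

    def should_speak(self, now, high_load=False, technique_is_advanced=False):
        p = self.policy
        if high_load and p.blackout_in_high_load:
            return False  # silence is a feature
        if technique_is_advanced and not p.allow_advanced_techniques:
            return False  # e.g. no trail-braking cues yet
        if self._last_spoken is not None and now - self._last_spoken < p.cooldown_s:
            return False  # still cooling down from the previous cue
        self._last_spoken = now
        return True
```

Keeping all four levers in one record means a level change swaps the whole behavior at once, so the language, cadence, progression, and prompt never disagree about who is driving.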
On top of all that sits a persona layer (AJ is terse, Tony is fiery, Rachel is technical), so the same decision comes out in the voice you picked. It's the difference between a tool and a co-driver you want in the car.
The wall I'm still climbing
Here's the honest bit. There's one place where the persona layer and the trust thesis fight each other: the most urgent safety alert. Right now even a P0 "brake!" is still styled by the selected persona: it gets sharper, but Tony is still Tony. A human instructor drops the act entirely when it matters; their voice goes flat and absolute. Racecraft doesn't fully do that yet. A global, persona-overriding "safety voice" for the most critical events is the next thing on this list, and it's a good example of how a trust-first design keeps generating work even after the feature "works."
So we can read the driver and shape the message. But none of that matters if the message arrives late. Next post: the architecture that lets a "brake!" jump the entire queue and land in 5 milliseconds, while a cloud model takes its sweet time in another lane.
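One possible shape for that override, purely as a sketch of the idea and not something Racecraft does today: the renderer checks priority before it ever reaches the persona. The `Priority` levels beyond P0, the `render` function, and the `fiery` stand-in persona are all hypothetical.

```python
from enum import IntEnum


class Priority(IntEnum):
    P0 = 0  # most urgent safety alert
    P1 = 1  # assumed lower tiers, for illustration only
    P2 = 2


def render(text, priority, persona_style):
    """Return the words to speak; P0 bypasses the persona entirely."""
    if priority == Priority.P0:
        return text  # flat and absolute: no styling at all
    return persona_style(text)


def fiery(text):
    # Hypothetical stand-in for a persona's styling step.
    return text.rstrip(".!") + "! Come on!"
```

Here `render("Brake", Priority.P0, fiery)` comes out as plain "Brake" no matter which persona is selected, while lower-priority cues still go through the chosen voice.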
Racecraft · on-device real-time driving coach built around Gemma 4. Code: github.com/rabimba/speedracer-AI.