Teaching the Coach to Read the Driver

Racecraft · Part 2 of 5

The best instructors don't coach the car. They coach you, and they figure out who you are in about two laps. Here's how we taught software to do the same.

In Part 1 I argued that trust is the only metric that matters, and that it's mostly a latency problem. That's true, but there's a second half I glossed over. The same sentence, delivered at the exact same millisecond, can build trust or destroy it depending on who's listening.

Tell a nervous first-timer "brake spike detected, modulate your input" and you've just handed them a stack trace mid-corner. Tell a fast amateur "squeeze the brakes, don't stab" for the tenth time and they'll mute you out of sheer irritation. The words have to match the driver. So before Racecraft can say anything, it has to answer a question a human coach answers instinctively: how good is this person, right now, this lap?

Every day: the model that watches the hands, not the lap time

My first instinct was the obvious one: classify skill by lap time. It's garbage. Lap time conflates the car, the track, the conditions, and the driver into one number, and it tells you nothing until the lap is over. I needed something that read the driver continuously, from inputs, in a way that would be stable enough to act on.

The piece that does this is the DriverModel. It runs on a 10-second rolling window and leans on two signals that turn out to separate skill levels remarkably well:

  • Input smoothness: essentially the jerk of brake, throttle and steering. Beginners are jagged; they stab and saw. Smooth, deliberate inputs are the single clearest fingerprint of experience.
  • Coasting ratio: the fraction of time with neither pedal applied. Coasting is hesitation made measurable. Beginners coast because they're unsure which pedal they want; fast drivers are almost always on something.

From those it classifies the driver as BEGINNER, INTERMEDIATE, or ADVANCED, and then that one label reaches into nearly every other part of the system.
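To make the two signals concrete, here is a minimal Python sketch of a rolling-window reader. It is an illustration, not the Racecraft source: the class and field names, the way "jerk" is approximated (mean absolute change in input rate), the pedal deadzone, and every threshold in `classify` are assumptions chosen for readability. Only the 10-second window and the three labels come from the post.

```python
from collections import deque
from dataclasses import dataclass

WINDOW_S = 10.0  # rolling window length described in the post


@dataclass
class Sample:
    t: float         # timestamp in seconds, assumed strictly increasing
    brake: float     # 0..1
    throttle: float  # 0..1
    steering: float  # -1..1


class SignalWindow:
    """Keeps the last WINDOW_S seconds of driver inputs (hypothetical helper)."""

    def __init__(self, window_s=WINDOW_S):
        self.window_s = window_s
        self.samples = deque()

    def add(self, sample):
        self.samples.append(sample)
        # Drop samples that have fallen out of the rolling window.
        while sample.t - self.samples[0].t > self.window_s:
            self.samples.popleft()

    def coasting_ratio(self, deadzone=0.02):
        """Fraction of samples with neither pedal applied (assumes a uniform sample rate)."""
        if not self.samples:
            return 0.0
        coasting = sum(
            1 for s in self.samples if s.brake <= deadzone and s.throttle <= deadzone
        )
        return coasting / len(self.samples)

    def roughness(self):
        """Jerk proxy: mean absolute change in input rate, averaged over three channels."""
        pts = list(self.samples)
        if len(pts) < 3:
            return 0.0
        total = 0.0
        for channel in ("brake", "throttle", "steering"):
            rates = [
                (getattr(b, channel) - getattr(a, channel)) / (b.t - a.t)
                for a, b in zip(pts, pts[1:])
            ]
            total += sum(abs(r2 - r1) for r1, r2 in zip(rates, rates[1:])) / (len(rates) - 1)
        return total / 3


def classify(roughness, coasting_ratio):
    # Placeholder thresholds for illustration only; not tuned values from Racecraft.
    if roughness > 2.0 or coasting_ratio > 0.30:
        return "BEGINNER"
    if roughness > 0.8 or coasting_ratio > 0.10:
        return "INTERMEDIATE"
    return "ADVANCED"
```

Feeding each telemetry sample to `add` and calling `classify(window.roughness(), window.coasting_ratio())` yields a raw label for the current window. That raw label can still change from one sample to the next.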

This wasn't an abstract spec. Our test fleet was the skill ladder: I was the beginner, learning the line in the Toyota GR86, and Ajeet was the intermediate in the other car. The same coach had to serve both of us without re-coding: talk me through the basics without burying me, and give Ajeet granular, technical feedback without insulting him. One app, two very different drivers; the DriverModel is what let it tell us apart.

[Figure: The coaching paradigm. The DriverModel (10 s rolling window, 5 s hysteresis, no flicker) reads input smoothness (jerk) and coasting ratio, and outputs BEGINNER, INTERMEDIATE, or ADVANCED. That read modulates language, cadence, progression, and the LLM prompt. Same moment through the persona layer: AJ: "Brake. Now." Tony: "Brake! Trust the tires!" Rachel: "Trail brake. Keep front load."]
One label (BEGINNER, INTERMEDIATE, or ADVANCED) reshapes language, cadence, what's unlocked, and how the on-device model is prompted.

Until one day: the model that couldn't make up its mind

The first version flickered. A driver hovering on the line between intermediate and advanced would get reclassified every couple of seconds, and because the classification changes the voice, the coach developed a kind of multiple-personality disorder: feel-based and reassuring one second, clipped and technical the next. It was unsettling in exactly the way that erodes trust.

The fix is boring and essential: 5 seconds of hysteresis. The model has to be confident in a new level for a sustained window before it's allowed to switch. A human coach does the same thing: they form an impression and update it slowly, not on every corner. Stability beat responsiveness, and that was the right call.
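One way to read "5 seconds of hysteresis" is as a dwell-time rule: a new label only takes over after it has been the raw classification continuously for five seconds. The sketch below shows that reading. The class name, the method signature, and the default starting level are assumptions; the post only specifies the 5-second figure.

```python
class StableLevel:
    """Holds the published skill label and only switches after a sustained change."""

    def __init__(self, initial="BEGINNER", hold_s=5.0):
        self.level = initial            # label the rest of the system sees
        self.hold_s = hold_s            # 5 s, per the post
        self._candidate = None          # label currently trying to take over
        self._candidate_since = None    # when the candidate was first seen

    def update(self, raw_level, now):
        """Feed the raw per-window label and a timestamp in seconds."""
        if raw_level == self.level:
            # Raw label agrees with the published one: cancel any pending switch.
            self._candidate = None
            self._candidate_since = None
        elif raw_level != self._candidate:
            # A new challenger appears: start its timer.
            self._candidate = raw_level
            self._candidate_since = now
        elif now - self._candidate_since >= self.hold_s:
            # The challenger has held for the full window: switch.
            self.level = raw_level
            self._candidate = None
            self._candidate_since = None
        return self.level
```

With this rule, a driver who alternates between INTERMEDIATE and ADVANCED every second never changes the published label, while five uninterrupted seconds at the new level does. The cost is that a genuine change in skill takes at least five seconds to register, which is the stability-over-responsiveness trade described above.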

Because of that: four levers, not one

Once the label is stable, it pulls four levers at once, and this is the part I'm proudest of, because it mirrors what good instructors actually do:

  • Language. Beginners get feel ("squeeze, don't stab"). Advanced drivers get data ("brake spike detected, modulate").
  • Cadence. Beginners get a 3-second cooldown between cues and a full blackout during high-load moments like the apex; silence is a feature. Advanced drivers get a 650 ms cooldown and no blackout, because they can take granular feedback without it costing them the corner.
  • Progression. Early in a beginner's session, advanced techniques like trail-braking are suppressed entirely. You don't teach someone to trail-brake on their third-ever lap. Advanced drivers get the full action set immediately.
  • The model prompt. Even the cloud model's instructions change: "give simple, feel-based sentences" vs. "reference the specific telemetry numbers." Same model, different driver, different mouth.
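The four levers can be pictured as one profile record per skill label, plus a small gate that decides whether the coach may speak at all. This is a sketch under assumptions: the type, field, and function names are invented for illustration, and INTERMEDIATE is left out because the post gives no numbers for it. The cooldowns, the apex blackout, the example cues, and the prompt wording are the ones quoted in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoachingProfile:
    example_cue: str                # language lever
    cooldown_ms: int                # cadence lever: minimum gap between cues
    apex_blackout: bool             # cadence lever: stay silent in high-load moments
    trail_braking_from_start: bool  # progression lever
    prompt_style: str               # model-prompt lever


PROFILES = {
    "BEGINNER": CoachingProfile(
        example_cue="Squeeze the brakes, don't stab",
        cooldown_ms=3000,
        apex_blackout=True,
        trail_braking_from_start=False,
        prompt_style="Give simple, feel-based sentences.",
    ),
    "ADVANCED": CoachingProfile(
        example_cue="Brake spike detected, modulate",
        cooldown_ms=650,
        apex_blackout=False,
        trail_braking_from_start=True,
        prompt_style="Reference the specific telemetry numbers.",
    ),
}


def can_speak(profile, now_ms, last_cue_ms, in_high_load):
    """Cadence gate: blackout first, then cooldown since the previous cue."""
    if profile.apex_blackout and in_high_load:
        return False
    return last_cue_ms is None or now_ms - last_cue_ms >= profile.cooldown_ms
```

For example, `can_speak(PROFILES["BEGINNER"], now_ms=10_000, last_cue_ms=9_000, in_high_load=False)` is false because only one second has passed, while the same call with the ADVANCED profile is true. Keeping all four levers in a single record means a label change swaps them together rather than one at a time.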

On top of all that sits a persona layer (AJ is terse, Tony is fiery, Rachel is technical), so the same decision comes out in the voice you picked. It's the difference between a tool and a co-driver you want in the car.
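The key property of a persona layer is that the coaching decision is made once and only the wording varies. A minimal sketch of that separation follows; the `Decision` type and the `"brake_zone"` key are hypothetical, while the three spoken lines are the ones shown for AJ, Tony, and Rachel in the post's figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    moment: str  # hypothetical key naming what the coach decided to address


# Same moment, one line per persona (lines taken from the figure).
PERSONA_LINES = {
    "AJ": {"brake_zone": "Brake. Now."},
    "Tony": {"brake_zone": "Brake! Trust the tires!"},
    "Rachel": {"brake_zone": "Trail brake. Keep front load."},
}


def speak(decision, persona):
    """Render an already-made decision in the selected persona's voice."""
    return PERSONA_LINES[persona][decision.moment]
```

Because `speak` never changes what was decided, the driver can switch personas without affecting when or why the coach intervenes.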

The wall I'm still climbing

Here's the honest bit. There's one place where the persona layer and the trust thesis fight each other: the most urgent safety alert. Right now even a P0 "brake!" is still styled by the selected persona: it gets sharper, but Tony is still Tony. A human instructor drops the act entirely when it matters; their voice goes flat and absolute. Racecraft doesn't fully do that yet. A global, persona-overriding "safety voice" for the most critical events is the next thing on this list, and it's a good example of how a trust-first design keeps generating work even after the feature "works."

So we can read the driver and shape the message. But none of that matters if the message arrives late. Next post: the architecture that lets a "brake!" jump the entire queue and land in 5 milliseconds, while a cloud model takes its sweet time in another lane.


Racecraft · on-device real-time driving coach built around Gemma 4. Code: github.com/rabimba/speedracer-AI.
