Posts

Teaching the Coach to Read the Driver

Racecraft · Part 2 of 5. The best instructors don't coach the car. They coach you, and they figure out who you are in about two laps. Here's how we taught software to do the same. In Part 1 I argued that trust is the only metric that matters, and that it's mostly a latency problem. That's true, but there's a second half I glossed over. The same sentence, delivered at the exact same millisecond, can build trust or destroy it depending on who's listening. Tell a nervous first-timer "brake spike detected, modulate your input" and you've just handed them a stack trace mid-corner. Tell a fast amateur "squeeze the brakes, don't stab" for the tenth time and they'll mute you out of sheer irritation. The words have to match the driver. So before Racecraft can say anything, it has to answer a question a human coach answers instinctively: how good is this person, rig...
Recent posts

The 800-Millisecond Problem

Racecraft · Part 1 of 5. Why I stopped trusting every "AI race coach" I'd ever tried, and what it would take to build one I'd actually listen to at 130 km/h. The first time an app yelled "Brake!" at me on a track day, I'd already braked, not by a hair, but by a full corner. I was unwinding the wheel and feeding in throttle on the exit of Sonoma's Turn 7, one of the fastest corners on the circuit, when the phone told me, with great confidence, to brake. It wasn't merely lagging by a beat; for that specific high-speed corner the cue was flat wrong, because braking mid-exit at that speed is how you put the car into a spin. And I couldn't even mute it. I was a beginner on a mandatory-instructor day, both hands on the wheel, eyes up. It was the instructor in my passenger seat (there to call flags and lines, allowed to say anything but never to touch a single control) who reached...

Racecraft (Project Koru) · Prologue — The Origin Story

Racecraft · Prologue, The Origin Story. It Started With a Wine List and a Question About Racing. How a happy-hour conversation in the Bay Area turned into a trustable AI race coach, and then into a second version that runs entirely on a phone, on the NPU. This is the prologue to a five-part series. Two years ago (1st November, 2024) I was in the Bay Area for a GDE Summit. If you've never been: it's a couple of days of talks among Google Developer Experts, the kind of people who get unreasonably excited about a new on-device runtime, and then, mercifully, a happy hour where everyone stops performing and just eats. We ended up at a restaurant (Puesto Santa Clara), a long table of GDEs, and I was doing the most important engineering of the evening: trying to decide which wine to order. Across the table was Ajeet Mirwani. I don't even remember how the wine talk turned into racing talk (these things drift), but the moment the word "racing" ...

The Blind Spot Horizon: Why Your AI Benchmarks Are Lying to You

"If you measure the wrong thing, you optimize for the wrong thing. And when dealing with frontier, multimodal models, the wrong thing might just break your entire system without throwing a single error." We are remarkably comfortable evaluating the AI models we’ve already built. We have our standard suites, our automated leaderboards, and our hard-coded unit tests. We feel in control. But let me drop a truth bomb that’s been brewing across frontier labs: we are profoundly, staggeringly bad at evaluating the models we are about to build. Most benchmarks, safety evaluations, and red-teaming protocols operate on a comforting but incredibly lazy assumption. They treat the next iteration of an LLM or a large multimodal model like a linear upgrade, like turning a dial from 8 to 10. But if you’ve spent any time hacking away at deep learning architectures or building agentic frameworks, you know that complex networks don’t scale smoothly. They undergo massive, silent phase transi...

My Friend's MRI Didn't Come with a Manual, So I Built One with AI

Gemma 4 Good Hackathon · Impact Track · Health & Sciences. It started with two envelopes. One contained a single sheet of paper, a radiologist's report for my friend. It was a wall of text that might as well have been written in another language. Words like "parenchymal volume," "hyperintensities," and "susceptibility artifact" stared back at us, creating more anxiety than they resolved. The other was a flimsy paper sleeve containing a CD-ROM. This, we were told, held the actual images from her MRI scan. The ground truth. And we couldn't even look at it. Our laptops, like most these days, don't have disc drives. For a moment, this crucial, deeply personal piece of her health information was a coaster. I felt that familiar, hot-wired frustration every engineer knows: the feeling of being locked out by a dumb problem. The powerlessness was infuriating. So, I did what any slightly obsessive software engineer would do...

The Hidden Performance Trap in Causal Ring Attention

When I set out to implement Ring Attention for long-context models, I hit a wall that none of the papers prepared me for. My performance was capping out at half of what it should be, no matter how many accelerators I threw at the problem. It turns out there's a hidden performance trap in the standard recipe for causal attention. Here’s the story of that bug, how to prove it exists without burning a single TPU hour, and how a simple trick called "zigzag sharding" fixed it completely. This post is a walkthrough of ring-flash-jax, a small JAX project that explores this problem and implements the fix. We'll add the four things you actually need before Ring Attention is usable for training a real-world causal language model. TL;DR: ring-flash-jax is a JAX implementation of ring attention with four key additions to the standard pattern. Causal masking: required for any decoder language model. Zigzag (striped) sharding: fixes the critical load...
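To make the load-balance idea in that excerpt concrete, here is a minimal counting sketch in plain Python. It is not taken from ring-flash-jax: the function names are hypothetical, the sequence is assumed to be split into two chunks per device, and the zigzag layout is assumed to pair chunk `d` with its mirror chunk `2N - 1 - d`. It only counts how many attention blocks survive a causal mask on each device, which is a simplified model of the work, not a timing measurement.

```python
def causal_blocks(chunk_ids):
    """Count attention blocks computed under a causal mask.

    Query chunk i attends to key chunks 0..i, so it costs i + 1 blocks.
    """
    return sum(i + 1 for i in chunk_ids)


def contiguous_layout(n_devices):
    # Assumed baseline: device d holds two neighbouring chunks, 2d and 2d + 1.
    return [[2 * d, 2 * d + 1] for d in range(n_devices)]


def zigzag_layout(n_devices):
    # Assumed zigzag layout: device d holds one early chunk and its mirror late chunk.
    last = 2 * n_devices - 1
    return [[d, last - d] for d in range(n_devices)]


def work_per_device(layout):
    # One block count per device, in device order.
    return [causal_blocks(chunks) for chunks in layout]


def imbalance(work):
    # Busiest device relative to the mean; 1.0 means perfectly balanced.
    return max(work) / (sum(work) / len(work))


if __name__ == "__main__":
    for name, make_layout in [("contiguous", contiguous_layout),
                              ("zigzag", zigzag_layout)]:
        work = work_per_device(make_layout(4))
        print(name, work, round(imbalance(work), 2))
```

Under these assumptions the total block count is the same for both layouts; only its distribution across devices changes. Since the devices in a ring advance in lockstep, the busiest one sets the pace, which is why the distribution matters.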