Abstract
Real‑time coaching in motorsport is a safety‑critical learning problem: a system must map noisy, high‑frequency telemetry to short, actionable guidance that remains physically consistent and avoids hazardous recommendations.
This paper proposes a “Split‑Brain” training formulation that separates (i) a semantic coaching target (what action/critique should be expressed) from (ii) a reflexive interface (how actions are represented as compact, verifiable tokens). The approach trains a Small Language Model (SLM) in the Gemma family [1] with QLoRA fine‑tuning [2], and introduces a telemetry tokenizer plus a teacher‑student synthesis pipeline to generate instruction‑action pairs at scale.
Core contribution: a reproducible method to convert “golden lap” differential telemetry into structured instruction‑tuning data, with an explicit safety-aware penalty that discourages physically contradictory cues.
1. Introduction
Driver coaching tools often visualize telemetry and highlight deltas, but they rarely produce reliable, context-aware micro‑instructions that can be executed immediately without distracting the driver.
A natural framing is demonstration learning: treat expert laps as demonstrations and learn a model that maps states (telemetry windows) to corrective actions, with the caution that naive supervised imitation can suffer from covariate shift and compounding errors when the learner encounters unseen states. [3][4]
Telemetry-based studies in sim racing identify performance-relevant signals (e.g., speed and acceleration features), supporting the premise that telemetry contains enough structure to support automated coaching labels and predictive models. [5]
2. Problem Statement
2.1 Inputs and outputs
Let \(x_{t-k:t}\) denote a telemetry window of length \(k\) ending at time \(t\), containing signals such as speed, steering, brake, throttle, lateral/longitudinal acceleration, yaw/yaw‑rate, and track position.
The model outputs a structured response \(y_t\) consisting of (i) a discrete action \(a_t\) from a finite vocabulary \(\mathcal{A}\) and (ii) an optional short rationale string intended for human interpretability.
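For concreteness, a minimal sketch of this output structure in Python; the action names listed here are illustrative placeholders rather than the final vocabulary \(\mathcal{A}\):

from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary; the real A is fixed by the label rules in Section 3.3.
ACTION_VOCAB = {
    "brake_later", "brake_earlier", "more_trail_brake",
    "earlier_throttle", "later_throttle", "tighten_line", "widen_line",
}

@dataclass
class CoachingResponse:
    action: str                    # discrete action a_t, must belong to ACTION_VOCAB
    reason: Optional[str] = None   # optional short rationale for human interpretability

    def __post_init__(self):
        if self.action not in ACTION_VOCAB:
            raise ValueError(f"unknown action: {self.action}")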
2.2 Learning objective
Training minimizes negative log-likelihood of the reference response with a safety-aware regularizer:
\[ \min_{\theta} \; \mathbb{E}\Big[-\log p_{\theta}(y_t^* \mid x_{t-k:t})\Big] + \lambda \, \mathbb{E}\big[\Omega(y_t, x_{t-k:t})\big] \]
The key design goal is that \(\Omega(\cdot)\) penalizes action tokens that contradict a telemetry-defined safe set, while leaving the model free to choose among safe alternatives.
3. Data Methodology
3.1 “Golden lap” differential representation
Rather than learning from raw telemetry alone, supervision is built on differences between a novice lap \(N\) and an expert reference lap \(P\), so the prompt encodes what went wrong relative to a target trajectory.
Align \(N\) to \(P\) by track position \(s\) (preferred over time alignment, since laps of different duration drift apart in time) and compute a differential state:
\[ \Delta(s) = \big[ v_N(s)-v_P(s),\; a^{lat}_N(s)-a^{lat}_P(s),\; a^{long}_N(s)-a^{long}_P(s),\; \psi_N(s)-\psi_P(s),\; \dot{\psi}_N(s)-\dot{\psi}_P(s) \big] \]
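A minimal sketch of this alignment-and-differencing step, assuming each lap is stored as NumPy arrays indexed by lap distance \(s\); the field names are assumptions about the telemetry layout:

import numpy as np

def differential_state(expert, novice, s_grid):
    """Resample both laps onto a common track-position grid and subtract (novice - expert).

    expert, novice: dicts of 1-D arrays with keys 's' (monotonically increasing lap
    distance), 'v', 'a_lat', 'a_long', 'yaw', 'yaw_rate'.
    s_grid: positions at which Delta(s) is evaluated.
    Returns an array of shape (len(s_grid), 5) matching the channels of Delta(s)."""
    def resample(lap, key):
        return np.interp(s_grid, lap["s"], lap[key])

    channels = ["v", "a_lat", "a_long", "yaw", "yaw_rate"]
    return np.stack([resample(novice, c) - resample(expert, c) for c in channels], axis=1)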
3.2 Telemetry tokenizer (feature-to-token mapping)
Because LLMs are trained over discrete token sequences, continuous telemetry is discretized into a compact vocabulary of “physics tokens” with controlled granularity.
Example token schema (a minimal binning sketch in Python follows the list):
- Speed delta: DV=+10mph, DV=-5mph (bin width configurable).
- Lateral delta: DLAT=-0.2g (bin width configurable).
- Longitudinal delta: DLONG=+0.3g (captures braking/throttle mismatch).
- Rotation: DYAW=-3deg or DYAW_RATE=+6deg_s.
- Context: SECTOR=3, CORNER=7, PHASE=ENTRY|MID|EXIT.
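A minimal binning sketch of the feature-to-token mapping, assuming fixed bin widths; the bin sizes and token spellings are configurable design choices rather than fixed by this method:

def speed_token(dv_mph, bin_mph=5.0):
    """Quantize a speed delta in mph, e.g. speed_token(-4.7) -> 'DV=-5mph'."""
    binned = round(dv_mph / bin_mph) * bin_mph
    return f"DV={binned:+.0f}mph"

def g_token(name, dg, bin_g=0.1):
    """Quantize a lateral/longitudinal g delta, e.g. g_token('DLAT', -0.23) -> 'DLAT=-0.2g'."""
    binned = round(dg / bin_g) * bin_g
    return f"{name}={binned:+.1f}g"

def context_tokens(sector, corner, phase):
    """Discrete context tokens; phase is one of ENTRY, MID, EXIT."""
    return [f"SECTOR={sector}", f"CORNER={corner}", f"PHASE={phase}"]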
3.3 Teacher‑student synthesis (scalable labels)
To avoid hand-labeling at scale, a deterministic “teacher” flags divergence events and generates paired labels: a discrete action and a short rationale.
Define an error score:
\[ E_t = \lVert p_P(t) - p_N(t) \rVert_2 + \alpha \, \lvert \psi_P(t)-\psi_N(t)\rvert + \gamma \, \lvert v_P(t)-v_N(t)\rvert \]
For \(E_t > \epsilon\), emit a training pair \((\text{prompt}_t, \text{response}_t)\) where \(\text{prompt}_t\) is tokenized telemetry and \(\text{response}_t\) is a constrained structured output.
Algorithm: Synthesize coaching pairs from differential telemetry
Inputs: expert lap P, novice lap N, threshold ε, tokenizer φ, label rules R
For each aligned position/time t:
    Δt ← compute_differentials(P(t), N(t))
    Et ← error_score(Δt)
    if Et > ε:
        prompt ← φ(Δt, context(t))
        action ← R.classify(Δt, context(t))   # finite action vocabulary
        reason ← R.render_text(action, Δt)    # short template or learned paraphrase
        output ← format("<action>{action}</action> <reason>{reason}</reason>")
        store(prompt, output)
4. Model & Training
4.1 Base model
The coaching model is a small, instruction-tuned language model from the Gemma family, chosen so that domain adaptation remains feasible on limited compute resources. [1]
4.2 Parameter‑efficient fine‑tuning (QLoRA)
Fine‑tuning uses QLoRA, which trains low‑rank adapters while keeping a quantized base model frozen, enabling efficient adaptation with reduced memory usage. [2]
Report the following for reproducibility (a configuration sketch follows the list):
- Checkpoint: the exact Gemma variant and whether it is pretrained or instruction-tuned. [1]
- Quantization: the 4-bit quantization configuration used during QLoRA training. [2]
- Adapters: \(r\), \(\alpha\), dropout, and the targeted projection modules.
- Data: window length \(k\), bin sizes, number of pairs, and the final action vocabulary.
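A configuration sketch using the Hugging Face transformers, bitsandbytes, and peft libraries; the checkpoint name and hyperparameter values below are illustrative assumptions, not the settings reported for this work:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(          # 4-bit quantization of the frozen base model
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",                 # assumed instruction-tuned Gemma variant
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(                 # low-rank adapters; r/alpha/dropout are placeholders
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the adapter weights are trainable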
4.3 Safety-aware penalty
Define a telemetry-derived safe action set \(\mathcal{A}_{safe}(x_{t-k:t}) \subseteq \mathcal{A}\) computed by deterministic constraints (e.g., prohibit throttle-up cues during heavy deceleration windows).
A simple instantiation is:
\[ L(\theta) = - \sum_{t} \log p_\theta(y_t^{*}\mid y_{<t}^{*}, x_{t-k:t}) + \lambda \sum_{t} \mathbb{I}\!\left[\hat{a}_t \notin \mathcal{A}_{safe}(x_{t-k:t})\right] \]
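Because the indicator over the decoded action \(\hat{a}_t\) is not differentiable, a practical relaxation is to penalize the probability mass the model assigns to unsafe action tokens. A sketch of that relaxation, together with an illustrative deterministic safe-set rule (the \(-0.8\,g\) threshold and the action names are assumptions):

import torch
import torch.nn.functional as F

def safe_action_set(window, all_actions):
    """Deterministic A_safe(x): drop throttle-up cues while the window shows heavy braking."""
    heavy_braking = float(min(window["a_long"])) < -0.8   # threshold in g, assumed
    unsafe = {"earlier_throttle"} if heavy_braking else set()
    return [a for a in all_actions if a not in unsafe]

def safety_penalty(action_logits, safe_action_ids):
    """Probability mass on actions outside A_safe(x); a differentiable stand-in for the indicator.

    action_logits: (batch, |A|) scores at the position where <action> is generated.
    safe_action_ids: per-example lists of indices of safe actions."""
    probs = F.softmax(action_logits, dim=-1)
    penalties = []
    for p, safe in zip(probs, safe_action_ids):
        unsafe_mask = torch.ones_like(p, dtype=torch.bool)
        unsafe_mask[safe] = False            # mark safe actions; penalize the remaining mass
        penalties.append(p[unsafe_mask].sum())
    return torch.stack(penalties).mean()

# total loss = token-level cross-entropy on y_t* + lambda * safety_penalty(...)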
4.4 Structured response format
Responses are constrained to a strict schema so that evaluation can be performed with exact matching and rule-based checks.
<action>brake_later</action>
<reason>You are releasing the brake too early on entry; carry trail brake slightly longer.</reason>
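Because the schema is strict, responses can be validated by rule before they reach the driver; a minimal parsing-and-filtering sketch (the regular expression and the handling of unknown or unsafe actions are illustrative):

import re

RESPONSE_RE = re.compile(
    r"^<action>(?P<action>[a-z_]+)</action>\s*"
    r"<reason>(?P<reason>[^<]{1,200})</reason>$"
)

def parse_response(text, action_vocab, safe_actions):
    """Return (action, reason) if the output is well-formed, known, and safe; else None."""
    match = RESPONSE_RE.match(text.strip())
    if match is None:
        return None                                   # malformed output is discarded
    action = match.group("action")
    reason = match.group("reason").strip()
    if action not in action_vocab or action not in safe_actions:
        return None                                   # unknown or unsafe cue is suppressed
    return action, reason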
5. Evaluation (Training‑Focused)
5.1 Offline metrics
- Action accuracy: exact match of <action> on held-out examples (scored as in the sketch after this list).
- Safety violation rate: \(\Pr(\hat{a}_t \notin \mathcal{A}_{safe}(x_{t-k:t}))\).
- Phase confusion: misclassifications across ENTRY/MID/EXIT contexts.
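These metrics reduce to simple counts over the held-out pairs; a scoring sketch, assuming each evaluation record already carries the parsed prediction, the reference label, the safe set, and the phase context:

def offline_metrics(records):
    """records: iterable of dicts with keys
    'pred_action', 'ref_action', 'safe_set', 'pred_phase', 'ref_phase'."""
    n = exact = violations = phase_errors = 0
    for r in records:
        n += 1
        exact += r["pred_action"] == r["ref_action"]
        violations += r["pred_action"] not in r["safe_set"]
        phase_errors += r["pred_phase"] != r["ref_phase"]
    return {
        "action_accuracy": exact / n,
        "safety_violation_rate": violations / n,
        "phase_confusion_rate": phase_errors / n,
    }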
5.2 Split strategy
In addition to random splits, evaluate by holding out entire laps/sessions (and ideally entire drivers) to test generalization under distribution shift, a known challenge in imitation-style learning setups. [3][4]
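A grouped split sketch using scikit-learn's GroupShuffleSplit, which keeps every lap, session, or driver entirely on one side of the split; the group key is an assumption about how the synthesized pairs are stored:

from sklearn.model_selection import GroupShuffleSplit

def grouped_split(pairs, group_key="session_id", test_size=0.2, seed=0):
    """Split instruction-action pairs so no group (lap, session, or driver) spans both sets."""
    groups = [p[group_key] for p in pairs]
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(pairs, groups=groups))
    return [pairs[i] for i in train_idx], [pairs[i] for i in test_idx]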
References
[1] Gemma Team. Gemma: Open Models Based on Gemini Research and Technology. arXiv:2403.08295, 2024. https://arxiv.org/abs/2403.08295
[2] Dettmers, T., et al. QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314, 2023. https://arxiv.org/abs/2305.14314
[3] Correia, A., and Alexandre, L. A. A Survey of Demonstration Learning. arXiv:2303.11191, 2023. https://arxiv.org/abs/2303.11191
[4] Codevilla, F., et al. Exploring the Limitations of Behavior Cloning for Autonomous Driving. In ICCV, 2019.
[5] AI-enabled prediction of sim racing performance using telemetry data. ScienceDirect, 2024.
Cite this work
@misc{karanjai2025splitbrain,
title={A Split-Brain Neuro-Symbolic Training Method for High-Velocity Autonomous Coaching from Telemetry},
author={Rabimba Karanjai and Austin Bennett and Ajeet Mirwani and Alvaro Huanca Mamani and Hemanth and Jesse Nowlin and Jigyasa Grover and Lynn Langit and Margaret M. and Sebastian Gomez and Vikram Tiwari},
year={2025},
howpublished={Blog Post / Working Paper}
}