Abstract
Real‑time coaching in motorsport is a safety‑critical learning problem: a system must map noisy, high‑frequency telemetry to short, actionable guidance that remains physically consistent and avoids hazardous recommendations.
This paper proposes a “Split‑Brain” training formulation that separates (i) a semantic coaching target (what action/critique should be expressed) from (ii) a reflexive interface (how actions are represented as compact, verifiable tokens). The approach trains a Small Language Model (SLM) in the Gemma family [1] with QLoRA fine‑tuning [2], and introduces a telemetry tokenizer plus a teacher‑student synthesis pipeline to generate instruction‑action pairs at scale.
Core contribution: a reproducible method to convert “golden lap” differential telemetry into structured instruction‑tuning data, with an explicit safety-aware penalty that discourages physically contradictory cues.
1. Introduction
Driver coaching tools often visualize telemetry and highlight deltas, but they rarely produce reliable, context-aware micro‑instructions that can be executed immediately without distracting the driver.
A natural framing is demonstration learning: treat expert laps as demonstrations and learn a model that maps states (telemetry windows) to corrective actions, with the caution that naive supervised imitation can suffer from covariate shift and compounding errors when the learner encounters unseen states. [3][4]
Telemetry-based studies in sim racing identify performance-relevant signals (e.g., speed and acceleration features), supporting the premise that telemetry contains enough structure to support automated coaching labels and predictive models. [5]
2. Problem Statement
2.1 Inputs and outputs
Let \(x_{t-k:t}\) denote a telemetry window of length \(k\) ending at time \(t\), containing signals such as speed, steering, brake, throttle, lateral/longitudinal acceleration, yaw/yaw‑rate, and track position.
The model outputs a structured response \(y_t\) consisting of (i) a discrete action \(a_t\) from a finite vocabulary \(\mathcal{A}\) and (ii) an optional short rationale string intended for human interpretability.
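For concreteness, a minimal sketch of this output structure in Python; the action names listed here are illustrative placeholders rather than the final vocabulary \(\mathcal{A}\):

from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary; the real A is fixed by the label rules in Section 3.3.
ACTION_VOCAB = {
    "brake_later", "brake_earlier", "more_trail_brake",
    "earlier_throttle", "later_throttle", "tighten_line", "widen_line",
}

@dataclass
class CoachingResponse:
    action: str                    # discrete action a_t, must belong to ACTION_VOCAB
    reason: Optional[str] = None   # optional short rationale for human interpretability

    def __post_init__(self):
        if self.action not in ACTION_VOCAB:
            raise ValueError(f"unknown action: {self.action}")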
2.2 Learning objective
Training minimizes negative log-likelihood of the reference response with a safety-aware regularizer:
\[ \min_{\theta} \; \mathbb{E}\Big[-\log p_{\theta}(y_t^* \mid x_{t-k:t})\Big] + \lambda \, \mathbb{E}\big[\Omega(y_t, x_{t-k:t})\big] \]
The key design goal is that \(\Omega(\cdot)\) penalizes action tokens that contradict a telemetry-defined safe set, while leaving the model free to choose among safe alternatives.
3. Data Methodology
3.1 “Golden lap” differential representation
Rather than learning from raw telemetry alone, supervision is built on differences between a novice lap \(N\) and an expert reference lap \(P\), so the prompt encodes what went wrong relative to a target trajectory.
Align \(N\) to \(P\) by track position \(s\) (preferred over time alignment, since laps of different duration drift apart in time) and compute a differential state:
\[ \Delta(s) = \big[ v_N(s)-v_P(s),\; a^{lat}_N(s)-a^{lat}_P(s),\; a^{long}_N(s)-a^{long}_P(s),\; \psi_N(s)-\psi_P(s),\; \dot{\psi}_N(s)-\dot{\psi}_P(s) \big] \]
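A minimal sketch of this alignment-and-differencing step, assuming each lap is stored as NumPy arrays indexed by lap distance \(s\); the field names are assumptions about the telemetry layout:

import numpy as np

def differential_state(expert, novice, s_grid):
    """Resample both laps onto a common track-position grid and subtract (novice - expert).

    expert, novice: dicts of 1-D arrays with keys 's' (monotonically increasing lap
    distance), 'v', 'a_lat', 'a_long', 'yaw', 'yaw_rate'.
    s_grid: positions at which Delta(s) is evaluated.
    Returns an array of shape (len(s_grid), 5) matching the channels of Delta(s)."""
    def resample(lap, key):
        return np.interp(s_grid, lap["s"], lap[key])

    channels = ["v", "a_lat", "a_long", "yaw", "yaw_rate"]
    return np.stack([resample(novice, c) - resample(expert, c) for c in channels], axis=1)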
3.2 Telemetry tokenizer (feature-to-token mapping)
Because LLMs are trained over discrete token sequences, continuous telemetry is discretized into a compact vocabulary of “physics tokens” with controlled granularity.
Example token schema (a minimal binning sketch in Python follows the list):
- Speed delta: DV=+10mph, DV=-5mph (bin width configurable).
- Lateral delta: DLAT=-0.2g (bin width configurable).
- Longitudinal delta: DLONG=+0.3g (captures braking/throttle mismatch).
- Rotation: DYAW=-3deg or DYAW_RATE=+6deg_s.
- Context: SECTOR=3, CORNER=7, PHASE=ENTRY|MID|EXIT.
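A minimal binning sketch of the feature-to-token mapping, assuming fixed bin widths; the bin sizes and token spellings are configurable design choices rather than fixed by this method:

def speed_token(dv_mph, bin_mph=5.0):
    """Quantize a speed delta in mph, e.g. speed_token(-4.7) -> 'DV=-5mph'."""
    binned = round(dv_mph / bin_mph) * bin_mph
    return f"DV={binned:+.0f}mph"

def g_token(name, dg, bin_g=0.1):
    """Quantize a lateral/longitudinal g delta, e.g. g_token('DLAT', -0.23) -> 'DLAT=-0.2g'."""
    binned = round(dg / bin_g) * bin_g
    return f"{name}={binned:+.1f}g"

def context_tokens(sector, corner, phase):
    """Discrete context tokens; phase is one of ENTRY, MID, EXIT."""
    return [f"SECTOR={sector}", f"CORNER={corner}", f"PHASE={phase}"]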
3.3 Teacher‑student synthesis (scalable labels)
To avoid hand-labeling at scale, a deterministic “teacher” flags divergence events and generates paired labels: a discrete action and a short rationale.
Define an error score:
\[ E_t = \lVert p_P(t) - p_N(t) \rVert_2 + \alpha \, \lvert \psi_P(t)-\psi_N(t)\rvert + \gamma \, \lvert v_P(t)-v_N(t)\rvert \]
For \(E_t > \epsilon\), emit a training pair \((\text{prompt}_t, \text{response}_t)\) where \(\text{prompt}_t\) is tokenized telemetry and \(\text{response}_t\) is a constrained structured output.
Algorithm: Synthesize coaching pairs from differential telemetry
Inputs: expert lap P, novice lap N, threshold ε, tokenizer φ, label rules R
For each aligned position/time t:
    Δt ← compute_differentials(P(t), N(t))
    Et ← error_score(Δt)
    if Et > ε:
        prompt ← φ(Δt, context(t))
        action ← R.classify(Δt, context(t))   # finite action vocabulary
        reason ← R.render_text(action, Δt)    # short template or learned paraphrase
        output ← format("<action>{action}</action> <reason>{reason}</reason>")
        store(prompt, output)
4. Model & Training
4.1 Base model
The coaching model is a small, instruction-tuned language model from the Gemma family, chosen so that domain adaptation remains feasible on limited compute resources. [1]
4.2 Parameter‑efficient fine‑tuning (QLoRA)
Fine‑tuning uses QLoRA, which trains low‑rank adapters while keeping a quantized base model frozen, enabling efficient adaptation with reduced memory usage. [2]
Report the following for reproducibility (a configuration sketch follows the list):
- Checkpoint: the exact Gemma variant and whether it is pretrained or instruction-tuned. [1]
- Quantization: the 4-bit quantization configuration used during QLoRA training. [2]
- Adapters: \(r\), \(\alpha\), dropout, and the targeted projection modules.
- Data: window length \(k\), bin sizes, number of pairs, and the final action vocabulary.
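A configuration sketch using the Hugging Face transformers, bitsandbytes, and peft libraries; the checkpoint name and hyperparameter values below are illustrative assumptions, not the settings reported for this work:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(          # 4-bit quantization of the frozen base model
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",                 # assumed instruction-tuned Gemma variant
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(                 # low-rank adapters; r/alpha/dropout are placeholders
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the adapter weights are trainable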
4.3 Safety-aware penalty
Define a telemetry-derived safe action set \(\mathcal{A}_{safe}(x_{t-k:t}) \subseteq \mathcal{A}\) computed by deterministic constraints (e.g., prohibit throttle-up cues during heavy deceleration windows).
A simple instantiation is:
\[ L(\theta) = - \sum_{t} \log p_\theta(y_t^{*}\mid y_{<t}^{*}, x_{t-k:t}) + \lambda \sum_{t} \mathbb{I}\!\left[\hat{a}_t \notin \mathcal{A}_{safe}(x_{t-k:t})\right] \]
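Because the indicator over the decoded action \(\hat{a}_t\) is not differentiable, a practical relaxation is to penalize the probability mass the model assigns to unsafe action tokens. A sketch of that relaxation, together with an illustrative deterministic safe-set rule (the \(-0.8\,g\) threshold and the action names are assumptions):

import torch
import torch.nn.functional as F

def safe_action_set(window, all_actions):
    """Deterministic A_safe(x): drop throttle-up cues while the window shows heavy braking."""
    heavy_braking = float(min(window["a_long"])) < -0.8   # threshold in g, assumed
    unsafe = {"earlier_throttle"} if heavy_braking else set()
    return [a for a in all_actions if a not in unsafe]

def safety_penalty(action_logits, safe_action_ids):
    """Probability mass on actions outside A_safe(x); a differentiable stand-in for the indicator.

    action_logits: (batch, |A|) scores at the position where <action> is generated.
    safe_action_ids: per-example lists of indices of safe actions."""
    probs = F.softmax(action_logits, dim=-1)
    penalties = []
    for p, safe in zip(probs, safe_action_ids):
        unsafe_mask = torch.ones_like(p, dtype=torch.bool)
        unsafe_mask[safe] = False            # mark safe actions; penalize the remaining mass
        penalties.append(p[unsafe_mask].sum())
    return torch.stack(penalties).mean()

# total loss = token-level cross-entropy on y_t* + lambda * safety_penalty(...)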
4.4 Structured response format
Responses are constrained to a strict schema so that evaluation can be performed with exact matching and rule-based checks.
<action>brake_later</action>
<reason>You are releasing the brake too early on entry; carry trail brake slightly longer.</reason>
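Because the schema is strict, responses can be validated by rule before they reach the driver; a minimal parsing-and-filtering sketch (the regular expression and the handling of unknown or unsafe actions are illustrative):

import re

RESPONSE_RE = re.compile(
    r"^<action>(?P<action>[a-z_]+)</action>\s*"
    r"<reason>(?P<reason>[^<]{1,200})</reason>$"
)

def parse_response(text, action_vocab, safe_actions):
    """Return (action, reason) if the output is well-formed, known, and safe; else None."""
    match = RESPONSE_RE.match(text.strip())
    if match is None:
        return None                                   # malformed output is discarded
    action = match.group("action")
    reason = match.group("reason").strip()
    if action not in action_vocab or action not in safe_actions:
        return None                                   # unknown or unsafe cue is suppressed
    return action, reason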
5. Evaluation (Training‑Focused)
5.1 Offline metrics
- Action accuracy: exact match of <action> on held-out examples (scored as in the sketch after this list).
- Safety violation rate: \(\Pr(\hat{a}_t \notin \mathcal{A}_{safe}(x_{t-k:t}))\).
- Phase confusion: misclassifications across ENTRY/MID/EXIT contexts.
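These metrics reduce to simple counts over the held-out pairs; a scoring sketch, assuming each evaluation record already carries the parsed prediction, the reference label, the safe set, and the phase context:

def offline_metrics(records):
    """records: iterable of dicts with keys
    'pred_action', 'ref_action', 'safe_set', 'pred_phase', 'ref_phase'."""
    n = exact = violations = phase_errors = 0
    for r in records:
        n += 1
        exact += r["pred_action"] == r["ref_action"]
        violations += r["pred_action"] not in r["safe_set"]
        phase_errors += r["pred_phase"] != r["ref_phase"]
    return {
        "action_accuracy": exact / n,
        "safety_violation_rate": violations / n,
        "phase_confusion_rate": phase_errors / n,
    }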
5.2 Split strategy
In addition to random splits, evaluate by holding out entire laps/sessions (and ideally entire drivers) to test generalization under distribution shift, a known challenge in imitation-style learning setups. [3][4]
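A grouped split sketch using scikit-learn's GroupShuffleSplit, which keeps every lap, session, or driver entirely on one side of the split; the group key is an assumption about how the synthesized pairs are stored:

from sklearn.model_selection import GroupShuffleSplit

def grouped_split(pairs, group_key="session_id", test_size=0.2, seed=0):
    """Split instruction-action pairs so no group (lap, session, or driver) spans both sets."""
    groups = [p[group_key] for p in pairs]
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(pairs, groups=groups))
    return [pairs[i] for i in train_idx], [pairs[i] for i in test_idx]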
References
[1] Gemma Team. Gemma: Open Models Based on Gemini Research and Technology. arXiv:2403.08295, 2024. https://arxiv.org/abs/2403.08295
[2] Dettmers, T., et al. QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314, 2023. https://arxiv.org/abs/2305.14314
[3] Correia, A., and Alexandre, L. A. A Survey of Demonstration Learning. arXiv:2303.11191, 2023. https://arxiv.org/abs/2303.11191
[4] Codevilla, F., et al. Exploring the Limitations of Behavior Cloning for Autonomous Driving. In ICCV, 2019.
[5] AI-enabled prediction of sim racing performance using telemetry data. ScienceDirect, 2024.
Cite this work
@misc{karanjai2025splitbrain,
title={A Split-Brain Neuro-Symbolic Training Method for High-Velocity Autonomous Coaching from Telemetry},
author={Rabimba Karanjai and Austin Bennett and Ajeet Mirwani and Alvaro Huanca Mamani and Hemanth and Jesse Nowlin and Jigyasa Grover and Lynn Langit and Margaret M. and Sebastian Gomez and Vikram Tiwari},
year={2025},
howpublished={Blog Post / Working Paper}
}