We are currently living through the “Post-Reasoning” phase of the AI hype cycle. By now, models like Gemini 2.0 and its latest iterations have normalized the idea that machines can “think”, or at least simulate a chain of thought that feels indistinguishable from reasoning. But as we push these architectures to their absolute limits, we are starting to see a plateau. It isn’t a plateau of competence; the models are brilliant. It is a plateau of *certainty*.

In building applications on top of these models, I’ve noticed a recurring pattern. Developers (myself included) often assume that if a model fails to predict the right outcome, it’s a failure of intelligence. We assume we need a larger parameter count, a longer context window, or better fine-tuning. But there is a ghost in the machine that scaling laws cannot exorcise. It is the fundamental difference between *not knowing* and *not seeing*.

## The Architecture of Doubt

To understand why our models, even state...
## 🚀 The Google Colab VS Code Extension: Enterprise AI Without Enterprise Costs

If you've been following my work on LLM-assisted code generation and AI reasoning, you know I'm always looking for ways to democratize AI development. During my recent work on cross-chain smart contract generation, I needed to rapidly prototype different transformer architectures for code translation. Previously, this meant juggling between local development, cloud instances, and Colab notebooks. Now? I'm running everything from VS Code with zero context switching.

Here's what changed my workflow:

**Python: Loading Large Models on Free GPU**

```python
# Running this on a free T4 GPU in VS Code - no setup required
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load a 7B parameter model - impossible on most local machines.
# The excerpt truncates mid-identifier; "codellama/CodeLlama-7b-hf" is assumed here.
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,  # half precision so the weights fit the T4's 16 GB
    device_map="auto",          # place layers on the GPU automatically
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
```
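To make the sketch end-to-end, here is a minimal generation call against the model loaded above. The prompt and decoding parameters are illustrative assumptions of mine, not taken from the original post.

```python
# A minimal usage sketch (assumed, not from the original post):
# tokenize a prompt, generate on the T4, and decode the result.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,  # keep generation short on the free-tier GPU
        do_sample=False,    # greedy decoding for a reproducible completion
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Greedy decoding is the conservative choice here; for more varied completions you would enable sampling and set a temperature instead.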