
The Context Trap: Why Scaling Laws Can’t Break the Ceiling of Uncertainty



We are currently living through the Post-Reasoning phase of the AI hype cycle.

By now, models like Gemini 2.0 and its successors have normalized the idea that machines can “think”—or at least simulate a chain of thought that feels indistinguishable from reasoning.

But as we push these architectures to their absolute limits, we are starting to see a plateau. It isn’t a plateau of competence; the models are brilliant. It is a plateau of certainty.

In building applications on top of these models, I’ve noticed a recurring pattern. Developers (myself included) often assume that if a model fails to predict the right outcome, it’s a failure of intelligence. We assume we need a larger parameter count, a longer context window, or better fine-tuning.

But there is a ghost in the machine that scaling laws cannot exorcise. It is the fundamental difference between not knowing and not seeing.

The Architecture of Doubt

To understand why our models, even state-of-the-art ones, hit a wall, we have to look at what they are actually doing. Despite the “Reasoning” labels on the box, modern LLMs are fundamentally probabilistic engines. They estimate a conditional probability distribution:

P(Y | X)

Given a context X (your prompt, a code snippet, a video), what is the most likely target Y?

In the early 2020s, we spent all our energy optimizing the function that maps X to Y. We assumed that if we just made the neural network large and deep enough, the error rate would drop to zero. But this ignores the statistical reality that error comes from two distinct places:

  • Epistemic Uncertainty: The model doesn’t know the answer because it hasn’t seen enough training data or lacks the computational depth to find the pattern. This is the reducible part; it is exactly what “scaling up” addresses.
  • Aleatoric Uncertainty: The answer cannot be derived from the input. The data X simply does not contain the information required to resolve Y.

This second category is the silent killer of AI reliability.
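
To make the split concrete, here is a minimal sketch (numpy only; not tied to any particular framework, and the ensemble view is just one common way to operationalize the distinction). For a single input, the total predictive entropy decomposes into the average entropy of the individual predictors (aleatoric) plus their disagreement (epistemic):

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy of a categorical distribution, in nats."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainty_decomposition(member_probs):
    """Split predictive uncertainty of an ensemble into two parts.

    member_probs: array of shape (n_members, n_classes), each row a
    softmax output for the same input x from a different ensemble member.
    """
    mean_probs = member_probs.mean(axis=0)
    total = entropy(mean_probs)                  # predictive entropy
    aleatoric = entropy(member_probs).mean()     # irreducible data noise
    epistemic = total - aleatoric                # disagreement between members
    return total, aleatoric, epistemic

# Members agree the outcome is a 50/50 toss: uncertainty is almost all aleatoric.
agree = np.array([[0.5, 0.5], [0.52, 0.48], [0.49, 0.51]])
# Members confidently disagree with each other: uncertainty is mostly epistemic.
disagree = np.array([[0.95, 0.05], [0.05, 0.95], [0.9, 0.1]])

print(uncertainty_decomposition(agree))
print(uncertainty_decomposition(disagree))
```

Scaling shrinks the second term. Once the members already agree that the outcome is a coin flip, no amount of scale touches the first.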

The Oracle’s Blindfold

Consider a thought experiment. You give a multimodal model like Gemini Pro a high-resolution image of a poker table and ask it to predict who will win the hand.

The model can identify the cards on the table. It can analyze the players’ facial expressions for micro-expressions (bluffing). It can calculate the pot odds with superhuman precision. It might give you a probability:

“Player A has a 60% chance of winning.”

But if the outcome depends on the hidden cards in the deck, the model hits a hard ceiling. No amount of extra compute, no amount of “System 2 thinking,” and no amount of historical training data will improve that prediction. The missing information never reaches the model’s inputs in the first place.

We call this the Bayes Error Rate—the lowest error rate any classifier can achieve on a given problem, no matter how powerful, because the inputs themselves leave the outcome undetermined.
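
A quick Monte Carlo sketch makes the ceiling tangible. The payoff rule below is invented for illustration (it is not real poker), but the point survives: a predictor that only sees X cannot beat the Bayes error, while one that also sees the hidden card Z is perfect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Visible information X: Player A's current, observable hand strength (0-9).
x = rng.integers(0, 10, size=n)
# Hidden information Z: the unseen card still in the deck (0-9).
z = rng.integers(0, 10, size=n)

# Outcome: Player A wins if combined strength clears a threshold.
y = (x + z >= 10).astype(int)

# Bayes-optimal rule given only X: here P(win | x) = x / 10,
# so predict a win whenever P(win | x) >= 0.5, i.e. x >= 5.
pred = (x >= 5).astype(int)
bayes_error = (pred != y).mean()

print(f"error with X only (Bayes error): {bayes_error:.3f}")              # ~0.25
print(f"error with X and Z:              {((x + z >= 10) != y).mean():.3f}")  # 0.0
```

The roughly 25% floor in this toy is not a modeling failure; it is a property of the data-generating process, and it only moves if the hidden card becomes observable.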

In 2025, we are guilty of conflating confidence with calibration. We teach our models to sound sure of themselves. If a model predicts a stock movement or a medical diagnosis, it often mimics the assertive tone of the human experts in its training data. But unless the model has access to the causal variables driving the outcome, that confidence is a hallucination of competence.
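
One way to keep ourselves honest is to measure calibration directly instead of trusting the assertive tone. A minimal expected-calibration-error sketch (the helper and its binning scheme are illustrative, not a call into any specific library):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Gap between how sure the model sounds and how often it is right.

    confidences: predicted probability of the chosen answer, shape (n,)
    correct:     1 if that answer was right, 0 otherwise, shape (n,)
    """
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap   # weight each bin by its share of samples
    return ece

# An over-confident model: it says "90% sure" but is right only ~60% of the time.
conf = np.full(1000, 0.9)
correct = (np.random.default_rng(1).random(1000) < 0.6).astype(float)
print(expected_calibration_error(conf, correct))   # roughly 0.3
```

A model can be brilliantly accurate and still badly calibrated, or the reverse; the confident tone tells you nothing about which case you are in.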

The Multimodal Trap

The push toward native multimodality was, perhaps unintentionally, the first step toward addressing the “Aleatoric” problem. By allowing a model to ingest video and audio simultaneously with text, we aren’t just giving it more data; we are giving it better data. We are expanding the dimensions of X.

However, we are still treating the input as a fixed variable. In the current paradigm, we feed the model a dataset and ask, “How well can you predict?”

The next leap in AI won’t come from asking the model to predict better. It will come from the model asking for better inputs.

From Prediction to Measurement

If we want to break the current ceiling of predictability, we have to stop treating AI as a brain in a jar and start treating it as part of a sensory system.

In healthcare, for example, we are obsessed with feeding electronic health records into LLMs to predict readmission rates. We might get an AUC of 0.75 and wonder why it won’t go higher. We blame the model architecture.

The reality? The outcome might depend on whether the patient has a supportive spouse at home—a variable that does not exist in the electronic health record. The ceiling is 0.75 because the signal isn’t there.
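
A small synthetic experiment shows what that ceiling looks like. The data and the home_support variable below are made up, and the sketch assumes scikit-learn is available; the only point is that the same model class jumps in AUC the moment the missing variable becomes observable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 50_000

ehr_signal = rng.normal(size=n)    # something the health record does capture
home_support = rng.normal(size=n)  # the variable that never gets recorded

# Readmission depends on both, but only the first is ever in the dataset.
logits = 1.0 * ehr_signal + 1.5 * home_support
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

def auc_with(features):
    model = LogisticRegression().fit(features, y)
    return roc_auc_score(y, model.predict_proba(features)[:, 1])

print("AUC with EHR only:       ", round(auc_with(ehr_signal.reshape(-1, 1)), 3))
print("AUC with the missing var:", round(auc_with(np.column_stack([ehr_signal, home_support])), 3))
```

Swapping in a bigger model on the first feature set does not close the gap; only the second feature set can.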

True intelligence involves recognizing this deficit. A truly intelligent agent shouldn’t just output a probability; it should output a request for measurement. It should say:

“I cannot predict Y with confidence based on X. To reduce uncertainty, I need to measure Z.”
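
As an interface, that is a change in the return type, not in the prompt. A minimal sketch of the contract (the dataclasses and the 0.8 threshold are illustrative assumptions, not an existing API):

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    probability: float

@dataclass
class MeasurementRequest:
    variable: str
    reason: str

def predict_or_request(probability: float, missing_variable: str,
                       threshold: float = 0.8):
    """Return a prediction only when it clears a confidence threshold;
    otherwise return a request for the measurement that would help."""
    if max(probability, 1 - probability) >= threshold:
        label = "readmitted" if probability >= 0.5 else "not readmitted"
        return Prediction(label, probability)
    return MeasurementRequest(
        variable=missing_variable,
        reason=f"P(Y|X) = {probability:.2f} is too close to chance; "
               f"measuring '{missing_variable}' should reduce the uncertainty.",
    )

print(predict_or_request(0.92, "home support status"))
print(predict_or_request(0.55, "home support status"))
```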

The Future is Active Sensing

As we look toward 2026, the most exciting developments won’t be in the transformer architecture itself. They will be in the integration of these models with active sensing: the model deciding which measurement to take next (a sketch of that selection step follows the examples below).

  • Coding: Instead of just predicting the bug, the IDE inserts a logging statement to capture the missing runtime variable.
  • Science: Instead of predicting the protein fold, the system suggests the specific wet-lab assay needed to resolve the ambiguity.
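
Underneath both examples is the same selection problem: of all the measurements the system could take next, which one buys the largest expected reduction in uncertainty? A toy sketch of that scoring step for a binary outcome (the candidate measurements and their outcome models are invented for illustration):

```python
import numpy as np

def entropy(p):
    """Entropy of a binary belief P(Y=1) = p, in nats."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def expected_information_gain(p_y, candidate):
    """Expected drop in predictive entropy if we took this measurement.

    candidate: dict mapping each possible measurement outcome to
    (probability of that outcome, P(Y=1) after observing it).
    """
    posterior_entropy = sum(p_z * entropy(p_y_given_z)
                            for p_z, p_y_given_z in candidate.values())
    return entropy(p_y) - posterior_entropy

# Current belief: a coin flip. Two instruments we could deploy next.
p_y = 0.5
candidates = {
    "insert logging statement": {"hit": (0.5, 0.95), "miss": (0.5, 0.05)},
    "re-run the flaky test":    {"pass": (0.5, 0.60), "fail": (0.5, 0.40)},
}
best = max(candidates, key=lambda name: expected_information_gain(p_y, candidates[name]))
print(best)   # the logging statement resolves far more uncertainty
```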

We have spent the last decade building better prediction engines—machines that have developed a deeply intuitive “gestalt” about the world. They are incredible at intuition. But intuition without observation is just guessing.

To lift the ceiling of what is predictable, we don’t need bigger models. We need to expand the observable universe of the data itself.

We need to stop trying to force our models to be oracles, and start designing them to be scientists.
