
Llama 4's 10M context window: bigger isn't always better for legal AI
April 7, 2025
Meta just released Llama 4, featuring a 10 million token context window in the Scout variant. That's 50x larger than Claude's 200k. You could fit entire case files, maybe entire firm document collections, into a single prompt.
Impressive engineering. But for legal work, it misses the point.
The "dump everything in" fallacy
The obvious use case: upload all your case documents, ask a question, let the model figure it out. Everything in context. No retrieval needed.
Here's why that doesn't work well for legal research:
The "lost in the middle" problem gets worse, not better. Research consistently shows that LLMs struggle to use information in the middle of long contexts. A 10M token window doesn't solve this—it amplifies it. Your critical precedent buried on page 847 of the context? The model might never attend to it properly.
Not all documents matter equally. A key Court of Appeal judgment should influence the answer more than a routine procedural order. Dumping everything into context treats all documents as equally important. That's not how legal reasoning works.
Speed and cost. Processing 10 million tokens is slow and expensive. If you're billing clients or trying to work efficiently, burning through massive context windows for simple questions doesn't make sense.
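To make the cost point concrete, here's a back-of-the-envelope comparison. The per-token rate and the token counts are hypothetical, purely for illustration, not any provider's actual pricing:

```python
# Rough cost comparison: full context dump vs. targeted retrieval.
# PRICE_PER_M_INPUT is an illustrative rate, not a real quote.
PRICE_PER_M_INPUT = 0.50  # dollars per million input tokens (hypothetical)

def prompt_cost(tokens: int) -> float:
    """Input-token cost of a single prompt at the illustrative rate."""
    return tokens / 1_000_000 * PRICE_PER_M_INPUT

full_dump = prompt_cost(10_000_000)  # everything in the 10M window
targeted = prompt_cost(20_000)       # a handful of retrieved documents
print(f"full dump: ${full_dump:.2f}/query, targeted: ${targeted:.2f}/query")
# full dump: $5.00/query, targeted: $0.01/query
```

Whatever the real rates are, the ratio is the point: hundreds of times more spend per question, before you even count the latency of prefilling ten million tokens.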
No verification. You still don't know if the model is using the right sources or making things up. Large context doesn't solve hallucination—it might even make it harder to catch.
What actually works for legal research
The better approach isn't fitting everything into context—it's intelligently selecting what goes in:
Targeted retrieval. Find the relevant documents first, then reason over them. A precedent search that returns the five most applicable cases gives better results than a context window containing everything in your document management system.
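As a minimal sketch of what "find the relevant documents first" can look like, here's top-k retrieval over precomputed embeddings. The random vectors and the 384-dimensional size are stand-ins, not a specific embedding model:

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k documents most similar to the query (cosine similarity)."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(sims)[::-1][:k]

# Demo with random stand-in embeddings.
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1_000, 384))  # placeholder for real document embeddings
query_vec = rng.normal(size=384)
print(top_k(query_vec, doc_vecs))  # the five documents worth putting in context
```

Only those five go into the prompt; the other 995 never touch the context window.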
Dynamic weighting. Some documents are foundational to a matter; others are peripheral. Weight them accordingly. Give the seminal judgment more attention than the email confirming a meeting time.
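One simple way to implement that is to blend retrieval similarity with an authority prior keyed on document type. The weights and blend factor below are illustrative, not a claim about how any production system tunes them:

```python
# Illustrative authority priors: how much each document type should
# influence the answer, independent of query similarity.
AUTHORITY = {
    "court_of_appeal_judgment": 1.0,
    "first_instance_judgment": 0.7,
    "procedural_order": 0.3,
    "correspondence": 0.1,
}

def weighted_score(similarity: float, doc_type: str, alpha: float = 0.6) -> float:
    """Blend query similarity with a document-type prior; alpha is tunable."""
    return alpha * similarity + (1 - alpha) * AUTHORITY.get(doc_type, 0.5)
```

Rank by `weighted_score` instead of raw similarity and the seminal judgment outranks the meeting email even when both match the query.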
Multi-step reasoning. Complex legal questions need to be broken down. Find the relevant statutory provisions. Then find how courts have interpreted them. Then apply them to the facts. Each step can be focused and verified.
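In pipeline form, that decomposition might look like the sketch below. `search` and `llm` are placeholder interfaces, and documents are assumed to carry `title` and `text` fields; this shows the shape of the approach, not a specific implementation:

```python
def answer_legal_question(question: str, llm, search) -> str:
    """Three focused steps over small, inspectable contexts."""
    # Step 1: find the governing statutory provisions.
    statutes = search(f"statutory provisions relevant to: {question}", k=5)
    # Step 2: find how courts have interpreted those provisions.
    titles = ", ".join(doc.title for doc in statutes)
    cases = search(f"case law interpreting: {titles}", k=5)
    # Step 3: apply the law to the facts, citing only retrieved sources.
    context = "\n\n".join(doc.text for doc in statutes + cases)
    return llm(
        f"Using only the sources below, answer the question.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```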
Source verification at each step. When the system cites something, you should be able to trace it back to an authoritative source. That's harder to do when the model is synthesising from millions of tokens of mixed input.
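And because each step's context is small and known, checking citations becomes mechanical. A minimal sketch, assuming the model has been instructed to cite as `[source: <id>]` (a convention chosen here for illustration):

```python
import re

def unverified_citations(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Citations in the answer that don't map back to a retrieved source."""
    cited = re.findall(r"\[source:\s*([^\]]+)\]", answer)
    return [c.strip() for c in cited if c.strip() not in retrieved_ids]
```

Anything this returns gets flagged for human review rather than passed through. Run the same check against a 10M-token dump and `retrieved_ids` is effectively "everything", which is no check at all.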
Where large context windows do help
Big context isn't useless. It helps with:
- Long document analysis: Reviewing a 200-page contract in full context
- Cross-referencing within a document: Finding how different sections relate
- Summarisation: Getting an overview of extensive materials
But these are components of legal work, not the whole thing. For research—finding the right law, verifying it's current, applying it to facts—precision matters more than capacity.
The practical upshot
Llama 4's 10M context window is a technical achievement. For some use cases, it'll be genuinely useful. But it doesn't change the fundamental requirements for legal AI: finding the right information, verifying it, and reasoning carefully.
We've built Andri around precision retrieval and multi-step verification specifically because that's what legal work demands. Larger context windows are a tool in the toolkit, not a replacement for the toolkit.
Try Andri and see how precision-focused legal AI works in practice.
Read also: why ChatGPT wrappers don't work for legal research, OpenAI's Deep Research vs specialised legal AI, and why agentic reasoning is the only path to production legal AI.