
The In-House Triangle: How Andri Actually Works
October 2, 2025
Your AI assistant just asked you to explain the same case facts for the third time this week. It forgot the defendant's address. It recalculated court fees using last year's schedule. It drafted a particulars of claim that reads like it was translated straight out of California.
This is the problem with legal AI that visits your firm instead of living in it.
When we started Andri, I knew distributed systems and nothing about griffierechten, CPR deadlines, or calculating partner tabellen. The lesson came fast: personalisation is the system boundary. If the agent doesn't operate inside the firm's constraints (its style, jurisdiction, and cases), it's just an expensive tourist.
Stated plainly: Personalisation sets the boundary. Tools make it real. Memory makes it repeat. Miss one and the agent is generic. Connect all three and it works.
How brains and models actually work (and why it matters)
Every morning you wake up with an empty working memory. You don't remember what you were thinking about yesterday afternoon. Yet within minutes you're checking case files, knowing which matter needs attention, writing in your firm's style, and applying decades of legal training.
You have layers:
Your DNA (ACGT sequences) defines what kind of organism you are. Human, not cat. Lawyer-capable brain, not goldfish. Fixed. You can't wake up and decide to have different DNA.
Your long-term memory stores everything you've learned: how to draft a witness statement, what CPR 3.1 means, the fact that Partner Jenkins hates Oxford commas. Built up over years. Relatively stable.
Your working memory is what you're thinking about right now. Limited capacity, maybe 4-7 items. Gets wiped when you sleep.
Your environment and tools fill the gap. You don't memorize every court form. You have templates. You don't remember every case detail. You have a case management system. You don't recalculate interest manually. You have Excel.
LLMs work the same way. The layers map differently:
Model weights (the trained parameters) are like DNA. Fixed when you get the model. ChatGPT knows English and legal concepts because that's "in its DNA", baked into billions of parameters during training. You can't change this at runtime. It is what it is.
Training data is like education and long-term memory. The model "remembers" patterns from millions of legal documents it saw during training. But this is frozen. The model doesn't learn from your conversations. Ever.
Context window is working memory. When you start a new chat, it's empty. Blank slate. The model has no idea who you are, what case you're working on, or how your firm writes. It's a 200,000 token scratch pad that gets cleared every session.
Personalisation is where we bridge the gap. Your firm's templates, clause banks, citation preferences, case details: this is the environment we inject into that empty context window to make it feel like the model has been working at your firm for months.
Humans compensate for limited working memory by building an environment that matches how they work. You don't keep 40 court forms memorized. You have a forms folder. You don't remember every deadline. You have a calendar. You don't recalculate the same figures daily. You save the spreadsheet.
Andri does the same thing for the model. The context window is working memory, limited and reset every session. We fill it with (sketched in code after the list):
- Firm memory: your templates, style guides, clause banks
- Matter memory: parties, deadlines, what happened last week
- User memory: how verbose you want answers, your citation preferences
- Tools: deterministic calculators, form fillers, API connectors
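To make that concrete, here is a minimal sketch of what session start could look like, with stand-in stores and illustrative field names rather than Andri's actual API:

```python
from dataclasses import dataclass, field

# Stand-in stores and field names; none of this is Andri's real storage layer.
FIRM_STORE = {"firm-1": {"style": "house", "clause_bank": ["..."]}}
MATTER_STORE = {"roberts": {"claimant": "R. Roberts", "deadlines": []}}
USER_STORE = {"u-7": {"verbosity": "brief", "citations": "OSCOLA"}}

@dataclass
class Context:
    firm: dict    # templates, style guides, clause banks
    matter: dict  # parties, deadlines, recent timeline
    user: dict    # verbosity and citation preferences
    tools: list = field(default_factory=list)  # tools exposed this session

def load_context(firm_id: str, matter_id: str, user_id: str) -> Context:
    """Fill the model's empty working memory at session start."""
    return Context(
        firm=FIRM_STORE[firm_id],
        matter=MATTER_STORE[matter_id],
        user=USER_STORE[user_id],
        tools=["court_fee", "deadline_calc", "form_fill"],
    )
```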
When a junior starts Monday morning, they don't remember Friday afternoon's conversation. But they walk into an office with files organised, templates ready, and colleagues who remember what happened. That's their environment compensating for limited working memory.
When Andri starts a conversation, the context window is empty. But we immediately load: which case you're working on, your firm's standard clauses, the last five things you discussed, and the tools to fill N1 forms or calculate court fees. That's personalisation compensating for the model's limited context.
The triangle is the environment we build around the model's working memory. Personalisation defines what gets loaded. Tools provide actions the model can take. Memory persists across sessions so Tuesday's conversation can reference Monday's work.
Without this environment, you're asking the model to do legal work with just its training and an empty notepad. With it, you're giving the model an office, a case management system, and institutional knowledge.
Vertex 1: Personalisation sets the boundary
Personalisation means the agent writes in house style, cites the way partners expect, respects your risk posture and deadlines, and surfaces only what matters to the current matter.
In practice:
- Dutch firms get Dutch procedure, griffierechten, partner tabellen
- UK boutiques get CPR-compliant deadlines, court fee schedules that reflect last month's update, and N1 forms that auto-fill
- Every firm gets its own clause bank, citation style, and template library
Why jurisdiction-specific tools matter:
Most legal AI is built for Silicon Valley's idea of "law": a generic mush that works nowhere precisely. Andri is built for Dutch civil procedure and UK CPR. The specifics. We encode griffierechten tables. We track CPR amendment schedules. We know that "soon" means 14 days in one jurisdiction and "within a reasonable time" in another.
Generic tools can't compete on the long tail of procedure. And that long tail is where 80% of the work actually lives.
Vertex 2: Modular micro-tools where it counts
Language models are excellent at language. They are mediocre at arithmetic, rule tables, and stateful workflows. Hand a model alimony calculations as pure text and you get slow guesses. Route the same task through a deterministic tool and you get a correct number, a source, and an audit trail.
From a tokenomics perspective, arithmetic is probably the worst possible use case for an LLM. Every digit costs tokens, the model is slow at it, and the answer is often wrong. Deterministic code? Instant, correct, and free after the function call.
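A two-line illustration in plain Python, nothing Andri-specific:

```python
from decimal import Decimal

# Binary floats drift; decimal math doesn't. Currency should never be a float.
print(0.1 + 0.2)                        # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```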
This follows Simon Willison's "tools in a loop" framing and matches our experience building agents that do real work.
Design rules we follow (sketched in code after the list):
- Single responsibility tools with explicit input/output JSON Schema
- Deterministic by default, pure functions where possible, no hidden state
- Decimal math, never binary floats, currency as integers of minor units
- Versioned tools with contract tests and golden files
- Time and jurisdiction are explicit inputs, not inferred
- Every tool emits provenance: source, rule version, timestamp, notes
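Here is what a tool built to those rules might look like. A minimal sketch: the fee bands are placeholders rather than the live HMCTS figures, and the names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

# Placeholder bands (upper claim bound, fee in pence). Real schedules are
# versioned data files, and these figures are not the live HMCTS numbers.
SCHEDULE_VERSION = "EW-2025-04"
FEE_BANDS = [(Decimal("300"), 35_00), (Decimal("10000"), 455_00)]

@dataclass(frozen=True)
class FeeResult:
    fee_pence: int     # currency as integer minor units, never floats
    rule_version: str  # provenance: which schedule produced the number
    as_of: str         # provenance: the date the calculation applies to

def court_fee(claim_amount: Decimal, jurisdiction: str, on: date) -> FeeResult:
    """Single responsibility, deterministic, no hidden state:
    time and jurisdiction are explicit inputs, never inferred."""
    if jurisdiction != "EW":
        raise ValueError("unsupported jurisdiction: pass it explicitly")
    for upper_bound, fee in FEE_BANDS:
        if claim_amount <= upper_bound:
            return FeeResult(fee, SCHEDULE_VERSION, on.isoformat())
    raise ValueError("claim amount above the sketched bands")
```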
Representative tools in production:
Court fees that don't lie:
A fee schedule changes. Your associate spends 30 minutes finding the update, recalculating three cases, and hoping they caught everything. Andri's court fee tool updates overnight, flags affected matters, and recalculates instantly. One tool, zero meetings about "did we use the new schedule?"
Maintenance and alimony calculators:
Factor income, dependants, caps, local guidance (Dutch partner tabellen, UK spousal maintenance guidelines). The model supplies context, the tool supplies the number, and you get both the figure and the working.
Interest accrual on damages:
Date math that doesn't drift. Compound or simple, statutory rate or judgment rate, from when to when. No Excel, no finger-crossing.
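A sketch of the simple-interest case, with the rate and dates as explicit inputs (the example figures are placeholders):

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

def simple_interest(principal_pence: int, annual_rate: Decimal,
                    start: date, end: date) -> int:
    """Simple interest on damages, day-counted, returned in pence.
    Rate and dates are explicit inputs: nothing inferred, nothing drifts."""
    days = (end - start).days
    interest = Decimal(principal_pence) * annual_rate * days / Decimal(365)
    return int(interest.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# e.g. £45,000 at 8% per annum between two explicit dates:
print(simple_interest(45_000_00, Decimal("0.08"),
                      date(2023, 1, 15), date(2023, 7, 15)))
```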
Deadline calculators:
CPR or Dutch procedure encoded as rules. "File defence within 14 days" becomes a date, adjusted for weekends and court holidays, with the calculation shown.
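The core of the date roll, sketched below. Real CPR time rules carry more cases than this, and the holiday list is a stand-in:

```python
from datetime import date, timedelta

COURT_HOLIDAYS = {date(2025, 12, 25), date(2025, 12, 26)}  # stand-in list

def file_by(trigger: date, days: int) -> date:
    """'File within N days' as a concrete date, rolled forward past
    weekends and court holidays."""
    deadline = trigger + timedelta(days=days)
    while deadline.weekday() >= 5 or deadline in COURT_HOLIDAYS:
        deadline += timedelta(days=1)
    return deadline

print(file_by(date(2025, 12, 12), 14))  # hits Boxing Day, rolls to Mon 29 Dec
```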
Bundle utilities:
Paginate exhibits, generate indices, keep tables balanced across 200-page bundles without copy-paste errors.
The tool that saves 40 minutes per case: Form filling
Consider the N1 Claim Form. Eight pages, 40+ fields, half of them referencing information scattered across emails, your case management system, and that PDF the client sent last week.
The old way:
- Open the PDF
- Hunt for claimant details in your matter file
- Copy defendant address from Companies House
- Calculate the claim amount (or find where you wrote it down)
- Copy-paste particulars from your draft (reformatting for space constraints)
- Realise you're on page 6 and the defendant address is wrong
- Start over or patch it in post
- 40 minutes later, maybe you're done
The Andri way:
- "Fill the N1 for the Roberts matter"
- Andri pulls claimant details from matter memory
- Resolves defendant via Companies House (name, number, registered office)
- Lifts claim amount and interest calculation from the last memo
- Summarises particulars to fit the form's space constraints
- Generates a filled PDF with every field sourced and cited
- Shows you the diff: "Claimant address from matter memory (updated 2 weeks ago), defendant address from Companies House (confirmed today), claim amount from damages memo (approved by you on 15 March)"
- You review, approve, done in 4 minutes
This isn't magic. It's memory (knows your matter), tools (queries Companies House, formats currency, paginates text), and personalisation (knows which particulars summary style your firm uses).
We support common UK court forms (N1, N244, and growing) and Dutch equivalents. Each form is a versioned tool. When HMCTS updates the N1, we ship the new schema, and existing matters get a flag: "Form version changed, review required."
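The flag itself reduces to a small check. A sketch with an illustrative registry and field names:

```python
# Illustrative: each form is a versioned schema; matters remember which
# version they were filled against, so a schema bump raises a review flag.
FORM_REGISTRY = {"N1": "2025-03"}  # current shipped schema version

def review_flags(matters: list) -> list:
    flags = []
    for m in matters:
        current = FORM_REGISTRY.get(m["form"])
        if current and m["filled_with_version"] != current:
            flags.append(f"{m['id']}: form {m['form']} version changed, "
                         "review required")
    return flags

print(review_flags([{"id": "roberts", "form": "N1",
                     "filled_with_version": "2024-10"}]))
```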
The time savings are measurable. Firms report 40 minutes saved per form. More importantly: zero "I forgot to update the defendant address" errors.
Edge: Personalisation ↔ Tools
If personalisation defines what is acceptable, tools define how we achieve it. The edge between them is where reliability comes from.
Example: UK company resolution pipeline
- Resolve the entity via Companies House to the exact company record (not "ABC Ltd" but "ABC Limited, Company No. 12345678")
- Pull persons with significant control
- Query the FCA register for FRN, permissions, public notices
- Compose a structured facts block: legal name, company number, address for service, FRN, any insolvency or director changes
- Insert into the draft with citations to filings and register entries
Resolution is two-stage. We filter candidates, rank by name similarity and officer overlap, then confirm by company number if present. Connectors run under budgets; on slowness we degrade to cached results with an explicit staleness warning. Outputs are stable data structures with version tags. We never pass raw JSON through the model loop.
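A minimal sketch of that two-stage resolution, using stand-in records instead of a live Companies House query; the scoring weights are illustrative:

```python
from difflib import SequenceMatcher

# Stand-in candidate records; a real pipeline queries Companies House.
CANDIDATES = [
    {"name": "ABC LIMITED", "number": "12345678", "officers": {"J. Smith"}},
    {"name": "ABC TRADING LTD", "number": "87654321", "officers": {"A. Jones"}},
]

def resolve(query_name: str, known_officers: set, number=None) -> dict:
    """Stage 1: rank by name similarity plus officer overlap.
    Stage 2: confirm by company number when we have one."""
    if number:  # exact confirmation beats any fuzzy match
        for c in CANDIDATES:
            if c["number"] == number:
                return c
    def score(c):
        name_sim = SequenceMatcher(None, query_name.upper(), c["name"]).ratio()
        return name_sim + 0.5 * len(known_officers & c["officers"])
    return max(CANDIDATES, key=score)

print(resolve("ABC Ltd", {"J. Smith"})["number"])  # -> 12345678
```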
Pattern: the agent supplies judgment and flow control, the tools supply accuracy and speed, and personalisation keeps both aligned with the firm.
Vertex 3: Memory, the moat most legal AI doesn't have
Tools provide actions. Without continuity you're back to copy-paste. Memory is the difference between a helpful answer and a teammate.
Remember the layers: the model's weights are fixed (DNA), its training is frozen (education), and its context window resets every session (working memory). Memory is how we make that blank slate feel personal.
Most legal AI tools treat every conversation like a first date. You explain the case. It gives an answer. Tomorrow, you explain the case again. This is why usage drops off: the cognitive load never decreases. It's like hiring a junior who has amnesia every morning but insists they're qualified because they went to law school.
You don't have this problem because your environment persists. Your case files stay organised. Your templates remain in the folder. Your calendar remembers deadlines even when you don't. The model needs the same thing.
Andri runs three memory layers with different durability (sketched as data after the list):
Firm memory: Templates, clause banks, citation style, policies
Matter memory: Parties, issues, deadlines, exhibits, a rolling timeline of what happened
User memory: Verbosity preference, review style, how much explanation you want
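Sketched as data, each memory carries a scope so every agent reads and writes the right layer. The shape below is illustrative:

```python
from dataclasses import dataclass

# Illustrative shape. Durability differs by scope: firm memory outlives
# matters, matter memory outlives sessions, user memory follows the person.
@dataclass
class Memory:
    scope: str      # "firm" | "matter" | "user"
    scope_id: str   # e.g. firm id, matter id, user id
    key: str
    value: object
    source: str     # provenance: where this fact came from
    updated_at: str

mem = Memory("matter", "smith-v-jones", "defendant",
             {"name": "Jones Limited", "number": "98765432"},
             source="Monday chat", updated_at="2025-09-29")
```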
How memory actually works (and why it's hard)
Think of it like a filesystem that slots into your agent. But here's the critical insight: memories are not just stored context, they are managed by the agents themselves.
The chat agent creates memories during research. The drafting agent reads those same memories when composing a pleading. The form-filling tool pulls matter facts to auto-populate fields. Every agent writes to a shared memory space scoped to firm, matter, or user.
The power: agent-driven pruning.
As context grows, the agent updates stale memories, merges redundant entries, and flags what's no longer relevant. This is not a naive append-only log. The agent curates.
When a case shifts from discovery to trial, matter memory evolves with it. When a user corrects a citation preference three times, user memory consolidates the pattern. When a firm adopts a new template, firm memory replaces the old version and notes the change.
This scales with intelligence. Smarter models make better pruning decisions, better merge choices, better scope judgments. They know when a memory belongs at case level versus user level versus firm level. The system doesn't rely on heuristics or magic token counts; the agent decides what to keep, what to summarise, and what to discard based on the task at hand.
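For a rough picture only, here is a toy curation pass. In production the keep, merge, and discard decisions come from the model, not from rules like these:

```python
# Toy curation pass: later entries supersede earlier ones for the same key,
# and anything the agent has flagged as stale is dropped.
def curate(memories: list, superseded_keys: set) -> list:
    kept = {}
    for m in sorted(memories, key=lambda m: m["updated_at"]):
        if m["key"] in superseded_keys:
            continue                # agent flagged as no longer relevant
        kept[m["key"]] = m          # newer entry replaces the older one
    return list(kept.values())

history = [
    {"key": "citation_pref", "value": "footnotes", "updated_at": "2025-09-01"},
    {"key": "citation_pref", "value": "OSCOLA", "updated_at": "2025-09-22"},
    {"key": "old_deadline", "value": "discovery", "updated_at": "2025-08-01"},
]
print(curate(history, superseded_keys={"old_deadline"}))
# -> one consolidated citation_pref entry (OSCOLA), stale deadline dropped
```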
Infinite context through managed curation
Your working memory holds 4-7 items. Yet you can draft a 50-page witness statement that stays coherent from page 1 to page 50. How? You offload to your environment. You write an outline. You check your notes. You reference the case file. You don't keep everything in your head simultaneously; you pull in what you need when you need it.
The model's context window is the same. It can't fit a 50-page document and all your firm's templates and three years of case history into 200,000 tokens. So we built a memory system that works like your brain's offloading strategy.
We combine the memory tool with traditional context window trimming, but the agent chooses what gets trimmed. The result is a system that stays coherent across long threads, complex matters, and multi-agent handoffs.
The agent doesn't drag everything into context; it pulls only what the plan requires, with explicit citations and a record of what was ignored. Just like you don't keep the entire case file in your head: you know where to find the relevant exhibit when you need it.
Essential for complex tasks. Multi-step flows that generate 100+ page documents cannot afford to lose context between page 5 and page 95, or confuse facts from different sections. Agent-managed memory solves this.
The drafting agent working on page 95 reads memories written during pages 1–94, knows what has already been cited, what arguments have been made, and what structure has been set. The memory is the connective tissue that keeps the document coherent without requiring the full draft in every context window.
Example:
On Monday, you discuss the Smith v. Jones case with Andri. Defendant is Jones Limited, Company No. 98765432. Claim is for £45,000 in unpaid invoices. You mention the contract was signed on 15 January 2023.
On Friday (and after much more discussion), you ask Andri to fill out the N1 form. Without memory:
- "Who is the defendant?"
- "What's the claim amount?"
- You explain again.
With memory:
- Andri pulls defendant details from matter memory (including the company number you mentioned once)
- Lifts claim amount
- Notes the contract date in the particulars summary
- Shows you: "Defendant: Jones Limited (Co. 98765432) from matter memory, last updated Monday. Claim amount: £45,000 from Monday's discussion. Contract date: 15 January 2023."
You review, approve, done. No re-explaining. No hunting through old chats.
As memory accumulates, personalisation strengthens. Outputs start life inside your boundary rather than outside it. Treat tokens as a scarce resource, and curate rather than hoard.
Edge: Tools ↔ Memory
Tools give you correctness on a single step. Memory makes that correctness reusable and context-aware on the next step. The two reinforce each other: tools emit provenance the memory can store; memory supplies the exact parameters tools need.
When the court fee tool runs, it logs: fee schedule version, claim amount, track, date calculated. Matter memory stores this. Next time you draft particulars, Andri pulls the exact fee without recalculating. When the fee schedule updates, memory flags: "Court fee was calculated using 2023 schedule, new schedule published, recalculation recommended."
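A sketch of that record and the staleness check it enables, with illustrative versions and field names:

```python
# Illustrative provenance record emitted by the court fee tool and stored
# in matter memory; a later schedule bump turns it into a recalculation flag.
fee_record = {
    "tool": "court_fee",
    "rule_version": "2023-09",
    "claim_amount_pence": 45_000_00,
    "fee_pence": 455_00,
    "calculated_at": "2023-10-02T09:14:00Z",
}

CURRENT_SCHEDULE = "2025-04"  # stand-in for the latest published schedule

def staleness_flag(record: dict):
    if record["rule_version"] != CURRENT_SCHEDULE:
        return (f"Court fee was calculated using the {record['rule_version']} "
                f"schedule; {CURRENT_SCHEDULE} published, "
                "recalculation recommended.")
    return None

print(staleness_flag(fee_record))
```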
Edge: Memory ↔ Personalisation
Drafting is where this edge is most obvious. With memory in place, drafting becomes recomposition rather than heroics.
- Find the closest prior document in firm memory
- Lift the structure that works and the clauses that survive partner review
- Tailor to style, client, jurisdiction, and the history of how this firm writes
- Show what was reused, what changed, and why, then ask before sending
This is cheaper, safer, and measurably faster. Model quality usually improves along power-law curves, but the bigger wins come from better system shape.
Rich Sutton's "Bitter Lesson" favours general methods for open-ended problems; legal work has hard constraints. The model provides judgment and language. The tools provide correctness and provenance. Memory makes it repeatable. That division isn't a compromise; it's the architecture.
Centre: The agent loop that stitches the triangle
With the three vertices in place, the loop is intentionally simple:
- Plan the task into typed steps and expected artefacts
- Retrieve only what the plan needs from memory
- Act by calling tools with validated inputs and capturing provenance
- Draft text or data artefacts, not both at once
- Critic passes to catch jurisdiction drift, missing sources, numeric mismatches
- Stop and request approval before any external send or file operation
We cap iterations and log every step. The failure mode we engineer against is silent confidence.
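Reduced to a skeleton, with stand-in callables for each step:

```python
def run(task, plan, retrieve, act, draft, critic, approve, max_iters=8):
    """Plan -> retrieve -> act -> draft -> critic -> stop-for-approval.
    All callables are stand-ins; every step is logged in the real system."""
    steps = plan(task, critique=None)        # typed steps + expected artefacts
    for _ in range(max_iters):               # hard cap against silent loops
        context = retrieve(steps)            # only what the plan needs
        results = [act(s, context) for s in steps]   # tools, with provenance
        artefact = draft(results, context)   # text or data, not both at once
        critique = critic(artefact, context) # drift, sources, numeric checks
        if not critique:
            return approve(artefact)         # human gate before any send/file
        steps = plan(task, critique=critique)
    raise RuntimeError("iteration cap reached: escalate to a human")
```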
Example: Full loop for "Draft particulars of claim for the Roberts matter"
- Plan: Need claimant details, defendant details, claim facts, legal basis, relief sought
- Retrieve: Matter memory for parties, contract details, breach facts. Firm memory for particulars template style.
- Act: Call Companies House tool to resolve defendant's current registered office. Call interest calculator for statutory interest figure.
- Draft: Compose particulars using template structure, insert resolved facts, add interest calculation with working shown.
- Critic: Check all parties have addresses for service. Verify interest calculation cites correct rate and start date. Confirm claim amount matches prior memos.
- Stop: Show you the draft with diffs highlighted ("Defendant address updated from Companies House today"), wait for approval.
This is the triangle in motion: personalisation defines the plan, tools execute it, memory feeds and records it.
Where the puck is going: Complex tasks
Customers are the lifeblood of Andri, and they're also busy. They see the pain in front of them, not always the leverage one layer up. We build for both. We remove today's friction, and we ship primitives that unlock tomorrow's complex tasks.
Just like you break down a big case into smaller tasks - draft the particulars, calculate damages, file the claim, prepare for hearing - we're making these repeatable. Plans become first-class objects with typed steps, explicit guards, human approvals, and reproducible runs. Moving from one-off prompts to repeatable tasks.
Example task: "New PI case intake"
- Extract facts from client email and medical records
- Calculate provisional damages (tool: damages calculator)
- Draft letter before claim (memory: firm template, client details)
- Set deadline reminders (tool: CPR deadline calculator)
- Flag missing evidence (critic: check for medical report, witness statement)
- Stop for review before sending letter
One click, one hour, one approval gate. The same triangle you've seen, composed into a higher-level control surface.
The near-term goal is O(1) approval checks and O(log n) retrieval inside a flow so complex matters stay predictable as they grow.
We measure, not vibe
The triangle only matters if it moves the numbers we care about.
Time to first useful draft: How long from "I need X" to "here's a draft you can actually use"
Review-to-ship ratio: How much editing before the draft is ready to file
We also track cite grounding rate, tool error budgets, cache hit rates, and the percent of tokens that are tool output rather than model invention. If a tool produces wrong numbers, we fix the tool and ship a test, not a longer prompt.
Our single objective function:
Minimize: TTFD + λ·Variance(ReviewEdits) + μ·HallucinationRate
Subject to: GroundingRate ≥ g_min, Provenance = true, Approval = true
Read it as a contract: fast drafts, consistently light review, zero tolerance for ungrounded claims, and an auditable trail.
Early results from UK and NL firms:
- 40 minutes saved per court form (N1, N244, Dutch equivalents)
- 60% reduction in time to first draft on standard pleadings
- 3x fewer citation errors caught at partner review (because tools cite by design)
- 85%+ of tool outputs used without modification (because they're deterministic and correct)
Vibes don't survive partner review. Numbers do.
Closing the loop
The model's weights and training are fixed. You can't change them at 3 PM on a Tuesday when you need to file by 5 PM. They are what they are.
But you can change the environment around the context window. That's where the business layer lives. That's where your firm's specific needs, your jurisdiction's quirks, your matter's facts, and your personal preferences get injected.
So that's how augmentation actually works. You're not retraining the model. You're building scaffolding around its working memory so it behaves like it's been at your firm for years. The model brings language and reasoning. Your firm brings context, tools, and memory. Together they produce work that neither could produce alone.
The best writing on context engineering and tool-using loops describes the map. This is our route across it, and what we've learned building Andri for 100+ firms who need AI that stays, not visits.
We're hiring. If you care about getting tools right, making memory scale, or just want to build software that lawyers actually use daily, talk to us.