Microsoft Copilot vs. Andri: the flight attendant and the pilot

April 14, 2026

Most firms we walk into already have Copilot switched on. It came with the licence, someone in IT enabled it, and by the time we arrive there's usually a partner who's used it to tidy an email and now thinks of it, vaguely, as "our AI." That's a reasonable place to start from. It's also where a lot of the confusion begins.

Because then, inevitably, the same partner asks it about a limitation period. Or whether a clause in a shareholders' agreement is enforceable. Or to draft a section 21. And because Copilot is right there in the Word ribbon, friendly and confident and already logged in, it answers.

That's what this piece is about. Not whether Copilot is useful (it obviously is) but what it's actually for.

A good flight attendant is not a pilot

On a long flight the flight attendant is the reason the cabin works. They know where the sick bag is, how long until Amsterdam, which passenger needs the vegan meal. You'd miss them immediately if they weren't there.

Nobody reasonable asks them to fly the plane.

That's most of what we want to say. Copilot is very good at the cabin: your inbox, your calendar, your slides, your SharePoint. Legal practice is the flight deck, and it has different instruments, different training, and quite different consequences when something goes wrong. None of that is a dig at Copilot. It's just a different job.

What Copilot actually is

Underneath the enterprise branding, Copilot is a consumer AI: the same GPT-class models that sit behind ChatGPT, plugged into the Microsoft Graph. The Graph is Microsoft's name for everything in your tenant: Outlook, OneDrive, SharePoint, Teams, your calendar. The clever bit is the grounding layer: Copilot can see your files and your emails, and use them when it answers.

For most knowledge work that's a genuinely good idea. An assistant that knows who you emailed on Tuesday and what was in the attached deck is worth more than one that doesn't, and Microsoft has done real engineering to make this cheap and consistent.

The question is what sits inside that grounding and what doesn't.

Inside: your mailbox, your Teams chats, your OneDrive, your company SharePoint. Fine for writing a meeting summary. Useless for a limitation question.

Outside: legislation, case law, the Gazette, EUR-Lex, Companies House as structured data, the Land Registry, the FCA Handbook, neutral citations, paragraph numbers, overruling history, jurisdictional tags. None of that lives in the Graph. None of it is anywhere near it.

Copilot is built to reason over the contents of your mailbox. A legal AI platform has to reason over the body of law your advice is built on. These aren't the same database and they aren't the same product.

The pull of "good enough"

The honest reason Copilot gets used for legal work isn't that it's brilliant at law. It's that it's already there. You're already in Word, you're already paying, and for maybe seven out of ten questions you'll type into it, it will produce something that looks broadly right. A reasonable-sounding paragraph. A confident summary. Prose with the cadence of a second-year associate.

That's the whole trap. "Looks broadly right" is exactly what general-purpose models are optimised for, because they've been trained to sound helpful. Legal work doesn't need something that sounds right. It needs something that is right, and that you can cite. In a Word document you can't tell the difference. In a hearing bundle you very much can.

Two real examples from our own testing. Ask Copilot about a specific Dutch Civil Code article and the answer is usually coherent at the level of a first-year summary, but it has no way to ground the response in the current authoritative text, because the tenant Graph doesn't contain the BW. So it paraphrases from pretraining, which drifts. That's not a model-quality problem; it's a grounding problem. No amount of better reasoning fixes a system that isn't looking at the statute it's talking about.

Or ask whether a recent Supreme Court judgment is still good law and it'll happily summarise the case. It won't mention that it was distinguished by a later decision three months ago, because "is this still good law" isn't a first-class concept for a general model; it's a thing a legal platform has to be engineered around.
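The difference between a model-quality problem and a grounding problem can be sketched in a few lines. Everything below is a hypothetical toy, not any product's real internals: the point is simply that a grounded system answers only from an authoritative corpus and declines when the corpus doesn't contain the provision, while an ungrounded one can only paraphrase from training.

```python
# Toy illustration of grounded vs. ungrounded answering.
# All names and data here are hypothetical stand-ins.

AUTHORITATIVE_SOURCES = {
    # A legal platform indexes the current official text,
    # e.g. Burgerlijk Wetboek articles keyed by citation.
    "BW 6:162": "<official current text of the article>",
}

def grounded_answer(citation: str) -> str:
    """Answer only from the authoritative corpus; refuse otherwise."""
    text = AUTHORITATIVE_SOURCES.get(citation)
    if text is None:
        # The honest failure mode: no statute in the index, no answer.
        return f"No authoritative text indexed for {citation}; cannot answer."
    return f"{citation}: {text}"

def ungrounded_answer(citation: str) -> str:
    """What a tenant-grounded assistant effectively does for statutes:
    the Graph contains no BW, so the model paraphrases from pretraining."""
    return f"{citation} broadly concerns ... (paraphrase, may have drifted)"
```

Better reasoning improves the second function's prose, not its sources; only the first function can fail safely.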

A general model sees text. A legal platform has to see law.

Bundling is not a villain

There's a version of this argument that calls Copilot a Trojan horse, the implication being that Microsoft is sneakily pushing a consumer AI into professional workflows where it doesn't belong. We don't buy that. Bundling Copilot into M365 is just a sensible product decision, and for most people who use Office it's a gift. If your day is mostly Outlook and PowerPoint it's an unambiguous upgrade.

The narrower problem, the one that actually matters for law firms, is that a bundle creates a default. Copilot has no idea it's talking to a solicitor about a client matter. It has no idea the document open in front of it is a witness statement under legal professional privilege. It wasn't designed to know. It was designed to be helpful to everyone a little, which is a different brief from being helpful to a regulated profession a lot. That's not malice, it's scope. And scope, for legal work, is the thing.

Our quiet view is that Copilot is hanging out at the firm, wearing the lanyard, hoping nobody asks it something it wasn't built to answer. Most of the time nobody does. The rest of the time, someone should.

What a legal platform actually has to carry

We get asked, fairly often, what the actual difference is. It isn't the model. It's almost never the model. It's the dozen other things a legal tool has to do that a productivity assistant was never designed for.

Matters, not chats. In Copilot each conversation is a fresh session layered over your Graph. In Andri every interaction belongs to a matter: the file, the correspondence, the history, the previous drafts. You don't re-explain the case every morning. It's just there.

Authoritative sources. Andri is wired into real legal sources: official law reports, legislation portals, the Gazette, EUR-Lex, company and land registries. Every answer traces back to something you could put in a bundle. Copilot grounds on your tenant and, when it reaches beyond it, on Bing web results. That is not a substrate you want to build legal advice on.

Citations that actually hold up. Neutral citations, paragraph numbers, correct court, correct year. General models produce references that look right and are quite often wrong, such as a real case with the wrong paragraph, or a hallucinated case that reads like a real one. Catching that is the entire job, and a consumer assistant is not built to do it.

Time and jurisdiction as first-class things. Law changes, and English contract law is not Dutch contract law. In a general model, "when" and "where" are whatever the training data happened to absorb. In a legal platform they're structural.

Production workflows that fit the practice area. A tenancy dispute, an employment matter, a judicial review: each has its own documents and its own structure. There isn't one "draft me a document" button that covers them.

Different models for different jobs. We use several, because reasoning through a case and summarising a file are not the same task. When a better model ships, our customers are on it within a day or two, after it's passed an evaluation set we built with practising lawyers. Copilot is whatever Microsoft has wired up for that surface this week.

Things a solicitor actually asks for. Transcribing a witness recording into the matter file. Pulling EXIF metadata off a photograph for a forensic question. Parsing an iXBRL filing from the client's accountant. These aren't exotic asks, they're Tuesday-afternoon asks, and they are not on Copilot's roadmap because they aren't consumer productivity features.
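The citations point above is worth making concrete: checking that a citation is well-formed is trivial, which is exactly why a plausible-looking one can still be wrong. A minimal sketch, assuming a regex that covers only a few common UK neutral citation shapes; the `index` argument stands in for a real law-report database and is entirely hypothetical:

```python
import re

# Rough pattern for UK neutral citations, e.g. "[2023] UKSC 15"
# or "[2019] EWCA Civ 1841". Illustrative only; real coverage is wider.
NEUTRAL_CITATION = re.compile(
    r"\[(?P<year>\d{4})\]\s+"
    r"(?P<court>UKSC|UKPC|EWCA (?:Civ|Crim)|EWHC)\s+"
    r"(?P<number>\d+)"
)

def looks_like_a_citation(s: str) -> bool:
    """Format check: cheap, and roughly what 'sounds right' amounts to."""
    return NEUTRAL_CITATION.fullmatch(s.strip()) is not None

def is_a_real_citation(s: str, index: set[str]) -> bool:
    """Verification: needs a lookup against an authoritative index
    (a hypothetical stub here) before a citation goes in a bundle."""
    return looks_like_a_citation(s) and s.strip() in index
```

A hallucinated citation like "[2021] UKSC 99" passes the first check and fails the second; the gap between the two functions is the gap between sounding right and being right.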

The data question, honestly

We should be fair about this part. Copilot's data story is better than consumer ChatGPT's. Under Microsoft's enterprise licensing your prompts aren't used to train their foundation models, and the tenant boundary is real. That's meaningful, and it's why firms we work with can happily use Copilot for non-legal work without losing sleep.

But it's still a general-purpose tool sitting on a general-purpose cloud contract. There's no legal-specific DPA. There's no concept, anywhere in the stack, of "this document is under LPP and must never become training context, even accidentally." You're inheriting an agreement that was written to fit every Microsoft customer from supermarkets to shipping companies, and asking it to stand in for something a specialist legal DPA is built to do.

That's fine for the marketing department. It's thinner than it should be for a law firm. In our piece on ChatGPT vs. a legal platform we wrote about the broader shape of this, and in February the US federal ruling in United States v. Heppner made one version of it concrete: documents produced with consumer AI tools may not be protected by attorney-client privilege. Copilot isn't in the same place on that spectrum as free ChatGPT. It's closer to it than most firms realise.

What Copilot is genuinely good for

It would be dishonest to skip this part, because there's quite a lot of it. In a law firm, Copilot earns its keep doing exactly the things it was built for:

  • Summarising long email threads so you can triage an inbox on a Monday.
  • Drafting non-privileged, non-client-facing material: internal memos, meeting agendas, management board updates, partner-retreat decks.
  • Cleaning up a PowerPoint a junior threw together in a hurry.
  • Transcribing an internal Teams call that nobody would ever claim privilege over.
  • Making the finance, ops and marketing teams meaningfully faster.
  • Answering "where did I save that thing?" questions across SharePoint and OneDrive.

A firm that uses Copilot well for admin and a real legal platform for legal work is, to us, exactly where firms should land. The mistake isn't using Copilot. It's assuming that because it lives in the same ribbon as everything else, it can do everything else.

Fast, cheap inference is a feature

Credit to Microsoft on one real thing: they are remarkably good at delivering AI inference at scale, cheaply, inside software people already open every day. That's a genuine achievement, and it's most of why Copilot is everywhere. For "shorten this email by thirty per cent" or "turn this 45-minute meeting into five bullets," fast cheap inference wired straight into the document is exactly the right product.

Legal work doesn't scale the same way. A bad £5 answer about a limitation period can cost a firm a seven-figure claim, and no amount of cheap inference fixes that. The economics of legal AI aren't about pushing marginal cost to zero. They're about being right, with sources, on work where being wrong is an uninsurable professional risk. Different product. Different quality bar. Different thing entirely.

To Microsoft's credit, they've never really claimed Copilot is that thing. The category of specialist legal AI exists because legal work deserves a tool built for it, and not everyone is building that tool.

If you've never seen one of these up close

Most solicitors we onboard have tried ChatGPT. A lot have tried Copilot. A much smaller number have ever sat in front of a proper legal AI platform with real matters, real documents and real citations. If that's you, you can't really feel the difference from a blog post. You have to see it move.

So try this. Pick a matter you already know. Something where you know the right answer and the wrong answer off the top of your head. Ask Copilot about it in Word. Then ask Andri about it in Andri. Watch what each one does with everything around the question: the file, the documents, the earlier drafts, the sources, the export. The answer to whether these are the same kind of tool usually takes about ten minutes.

That's the comparison we'd like firms to make. Not Copilot vs. Andri on a marketing page. Copilot vs. Andri on your next real matter.

Try it or get in touch. We're happy to walk you through it on your work, not ours.


Read also: ChatGPT vs. a legal AI platform, what agentic AI actually means in law, why reasoning depth matters, and how we approach security at Andri.