In West African Gas Pipeline Company Ltd v Willbros Global Holdings Inc [2012] EWHC 396 (TCC), inadequate scoping of the disclosure exercise led to piecemeal disclosure, increased costs, and a judgment that has been referenced ever since when courts assess whether parties have taken reasonable steps to preserve and produce disclosable documents. The case remains a touchstone for a reason. The costs of late or incomplete disclosure compound in ways that are difficult to recover from once incurred.
In the years since, disclosure expectations in the English Business and Property Courts have tightened further. Practice Direction 57AD, made permanent in October 2022, now requires parties to engage early on the scope of disclosure, agree models for different issues, and complete a Disclosure Review Document before the case management conference. Judicial commentary describes a “dramatic decline” in post-CMC applications for specific disclosure under the new regime, which is to say that the courts now expect parties to know where their evidence is before they get to court[1].
That expectation depends on a capability most organisations have never deliberately built: the ability to interrogate their own data environment at scale, under pressure, and produce a complete picture of what they hold on any given issue.
Most organisations assume this isn’t a problem they have. The documents are there. The records exist. The emails, contracts, case files, and communications all sit within systems the organisation manages. On paper, the information is present.
What they discover, usually under pressure and usually at the worst possible time, is that present and usable are two entirely different things.
What happens when the search actually begins
The gap between having information and being able to work with it tends to be invisible until it isn’t.
A request arrives: a disclosure requirement, a regulatory enquiry, a need to build a position quickly before a deadline that isn’t moving. The assumption is that the relevant material can be pulled together without too much difficulty. The team knows the information exists. They’ve seen documents like the ones being requested. They have a reasonable sense of where things should be.
And then the search begins. And it becomes apparent, fairly quickly, that “where things should be” and “where things are” aren’t the same thing.
Documents are spread across shared drives that were organised by different people at different times under different naming conventions. The document management system has records, but not all of them, and the ones it has aren’t always the most current version. The email archive has material that isn’t in the document system, and the document system has material that was never emailed to anyone. Scanned records from before the current systems were implemented sit in a repository that isn’t easily searchable. Some of what’s been found is clearly relevant. Some of it might be. Some of it is impossible to assess without reading it in full.
So the work becomes manual. People start opening files. Reading through content that may or may not be relevant to establish whether it actually is. Trying to reconstruct the context around individual documents by looking at what sits near them in the folder structure or what was attached to the same email thread. Building a picture of what the evidence shows by assembling it piece by piece rather than querying it as a dataset.
And then comes the question that appears in almost every organisation that goes through this process, at some point in almost every significant legal matter.
“I think that’s everything. But we should double-check.”
That sentence is the sound of an organisation that has information it can’t fully trust. And in legal work, information you can’t fully trust isn’t evidence. It’s a liability.
Why the uncertainty is the real problem
The natural response to “I think that’s everything” is to check again. To go back through the systems, run the search terms again, ask the people who might know whether there’s anything else. To invest more time in establishing confidence that the picture is complete.
That process works, eventually, in most cases. But it has costs that extend well beyond the hours spent searching.
The most immediate cost is time – and in legal matters, time has a direct relationship with quality. The team that spends three days establishing that the evidence picture is complete has three fewer days to analyse what that evidence actually means, to identify the arguments it supports, to anticipate how it might be used against them, and to build the strategy that the case requires. Late-stage discovery of a significant document doesn’t just mean the document needs to be processed. It means everything built on the incomplete picture that preceded it needs to be revisited.
The less visible cost is confidence. A legal team that isn’t sure it has found everything it needs to find operates differently from one that is. Positions get qualified more heavily than they should be. Arguments that should be made firmly get hedged. Decisions about how to proceed get deferred pending one more check of the data. The uncertainty in the evidence picture translates directly into uncertainty in the legal strategy. And uncertainty in legal strategy, under the kind of pressure that significant litigation creates, has a way of compounding.
The deepest cost is the one that’s hardest to quantify: the things that weren’t found. The pattern that would have been visible if the correspondence could have been analysed as a dataset rather than reviewed file by file. The connection between two documents in different systems that would have changed the entire framing of the case. The timeline that emerges clearly when events are mapped across multiple sources simultaneously, but that nobody had the time or the tools to construct under the pressure of the actual matter.
These aren’t hypothetical losses. They’re the routine consequence of operating a legal data environment that was built for storage rather than for use.
Why the problem persists even in organisations that know it exists
The frustrating thing about this category of problem is that most organisations are aware of it. Legal teams know their data environment is imperfect. They’ve experienced the friction of difficult searches. They’ve had the “I think that’s everything” conversation more times than they’d like.
What they haven’t always done is addressed the root cause, and the reason tends to be a combination of timing and visibility.
The root cause is structural. Documents get captured into systems without consistent classification. Metadata – the structured information that describes what a document is, who it relates to, when it was created, and how it connects to other documents – is applied unevenly, or not at all. Relationships between records that are obvious to the people who created them aren’t represented in the data in any way that a system can use. Different parts of the organisation have captured similar information in different ways, so that what should be a coherent dataset is actually a collection of incompatible fragments. The scale of this gap is now well documented: IDC has consistently estimated that over 90% of enterprise data is unstructured, and a December 2025 study by Harvard Business Review Analytic Services and Hyland found that only 39% of executives say their unstructured data is prepared for AI use, against 65% for structured data[2].
None of this is visible in normal operations. The system functions. Documents can be retrieved individually. People who know the environment can navigate it reasonably well. The problems only surface fully when the data is put under the kind of pressure that requires it to be interrogated at scale – which is exactly the kind of pressure that legal matters create.
And because the problems surface under pressure, they tend to get addressed under pressure, which means reactively, expensively, and with the kind of shortcuts that store up further problems for the next time. The underlying structure doesn’t get fixed. The next significant matter starts from the same position as the last one.
What a well-structured legal data environment actually enables
The difference between a legal data environment built for storage and one built for use isn’t primarily about the volume of information it contains or the sophistication of the systems it runs on. It’s about whether the information in it can be interrogated rather than just retrieved.
When documents are properly classified, consistently indexed, and enriched with metadata that reflects how they’ll actually be used, the experience of working with them in a legal context changes fundamentally.
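As a minimal sketch of what “enriched with metadata” can mean in practice: each document carries a consistent, structured record at the point of capture. The field names below are illustrative assumptions, not a standard schema or Dajon’s actual data model.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical illustration: the kind of structured metadata that, applied
# consistently at capture, lets a repository be queried as a dataset.
# Field names are assumptions, not a real or standard schema.
@dataclass
class DocumentRecord:
    doc_id: str
    title: str
    doc_type: str                 # e.g. "contract", "email", "case_file"
    created: date
    parties: list = field(default_factory=list)      # counterparties the document relates to
    matter_ref: str = ""                             # transaction or matter it belongs to
    related_ids: list = field(default_factory=list)  # explicit links to other records
    source_system: str = "unknown"                   # where the document was captured from

rec = DocumentRecord(
    doc_id="DOC-0042",
    title="Supply agreement - signed",
    doc_type="contract",
    created=date(2021, 3, 9),
    parties=["Acme Ltd"],
    matter_ref="TXN-2021-017",
    source_system="dms",
)
```

The point of the sketch is the consistency, not the fields themselves: once every record answers the same questions in the same way, relationships that used to live only in people’s heads become something a system can search.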
A search for all correspondence with a specific counterparty between two dates doesn’t require someone to know which systems that correspondence might be in, or what it might have been filed under, or who might have a copy of it. It returns a complete and accurate set of results that the team can rely on. A request for all documents relating to a specific transaction doesn’t require a manual trawl through a folder structure that was organised by someone who has since left the organisation. It produces a structured view of the relevant material that can be assessed and used without first having to be assembled.
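The counterparty search above reduces, in a well-structured environment, to a simple filter over consistent metadata. A minimal sketch, assuming an in-memory list of records (the records and field names are illustrative):

```python
from datetime import date

# Hypothetical sketch: when every record carries consistent metadata,
# "all correspondence with counterparty X between two dates" is a filter,
# not a manual trawl across systems. Data is illustrative.
records = [
    {"id": "E-101", "type": "email", "party": "Acme Ltd", "date": date(2021, 2, 1)},
    {"id": "E-102", "type": "email", "party": "Acme Ltd", "date": date(2021, 6, 15)},
    {"id": "C-201", "type": "contract", "party": "Bravo plc", "date": date(2021, 3, 3)},
]

def correspondence_with(records, party, start, end):
    """Return every record involving a counterparty within a date range."""
    return [r for r in records
            if r["party"] == party and start <= r["date"] <= end]

hits = correspondence_with(records, "Acme Ltd", date(2021, 1, 1), date(2021, 3, 31))
# One complete result set, with no need to know which system held each item.
```

The query is trivial; what makes it trustworthy is that the metadata behind it was applied consistently, so the result set is complete rather than merely plausible.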
But the more significant capability – the one that changes not just the efficiency of legal work but its quality – is what becomes possible when evidence can be analysed as a dataset rather than reviewed as a collection of individual files.
Patterns across a body of correspondence become visible in a way they never are when documents are read one at a time. The frequency and tone of communications between specific parties across a specific period. The sequence of events that emerges when emails, contracts, and internal records are mapped chronologically across multiple systems. The absence of documentation in a period where documentation should exist, which can be as significant as the documentation itself. None of these insights require more documents. They require the existing documents to be in a form that allows them to be analysed together rather than read separately.
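One of the patterns above, the telling absence of documentation, can be sketched in a few lines: count communications per month across the dataset and flag months in the period with no records at all. The data and function names are illustrative assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical sketch of "the absence of documentation can be as significant
# as the documentation itself": tally communications per month, then flag
# months in the period with zero records. Dates are illustrative.
dates = [date(2021, 1, 5), date(2021, 1, 20), date(2021, 2, 11), date(2021, 4, 2)]

per_month = Counter((d.year, d.month) for d in dates)

def silent_months(per_month, start, end):
    """(year, month) pairs between start and end (inclusive) with no communications."""
    gaps = []
    y, m = start
    while (y, m) <= end:
        if per_month[(y, m)] == 0:
            gaps.append((y, m))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return gaps

gaps = silent_months(per_month, (2021, 1), (2021, 4))
# → [(2021, 3)]: March is silent where the surrounding pattern suggests it shouldn't be.
```

An analysis like this is only trustworthy if the underlying dataset is complete; run against a fragmented archive, a “gap” may just be a system nobody searched.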
Legal teams working in this kind of environment describe their experience of case preparation differently. The question isn’t “where might that document be?” or “have we found everything?” It’s “what does the evidence tell us?”, the question that should be driving the work from the start, rather than the question that only becomes possible once the search is finally complete.
The connection to AI-assisted legal work
The conversation about AI in legal services is moving quickly, and it’s worth being direct about how document structure connects to it. For organisations that are planning to use AI tools in legal work, the state of their data environment isn’t a background consideration. It’s the primary determinant of whether those tools deliver what they promise.
AI tools applied to legal document review can process large volumes of material faster than human teams. They can identify patterns, surface connections, and prioritise documents by relevance in ways that change what’s practical in the time available. These are genuine capabilities that create real value.
But they depend entirely on the data being in a form that the tools can work with. PD57AD allows technology-assisted review, and the Law Society’s November 2025 guidance on generative AI in legal disclosure has explicitly opened the door to the wider use of GenAI under court-ordered disclosure[3]. At the same time, the Divisional Court’s decisions in Ayinde v London Borough of Haringey and Al-Haroun in June 2025 made clear that solicitors and barristers remain fully accountable for AI-generated content and may face regulatory sanction if they fail to verify its accuracy. Disclosure workflows under PD57AD must be consistent, auditable, and defensible. None of that is possible if the underlying data environment is too fragmented for the tool to navigate coherently.
An AI tool applied to a well-structured, consistently indexed legal dataset surfaces insights that change how cases are prepared. The same tool applied to a fragmented, inconsistently structured collection of documents produces outputs that look like insight but are built on an incomplete picture – and that can be more dangerous than no insight at all, because they carry a false confidence that the picture is complete when it isn’t.
The organisations investing most effectively in AI-assisted legal work are not the ones with the most sophisticated tools. They’re the ones that recognised the data foundation as the prerequisite and built it before or alongside the technology investment rather than after.
How Dajon helps organisations build that foundation
At Dajon Data Management, we work at the point where documents are first captured – defining the classification framework, applying consistent metadata, and building the structural connections between documents that reflect how the material will actually be used.
For organisations with existing environments built for storage rather than use, we work on retrospective structuring: assessing what’s there, identifying the most significant gaps, and building the metadata layer that turns the archive from a collection of files into a dataset that can be interrogated.
The moment it matters most
Legal matters have a way of arriving with less notice than anyone would like and less time than the situation requires. The organisations that navigate them most effectively aren’t necessarily the ones with the best legal teams or the deepest resources (though both of those things matter). They’re the ones that don’t have to spend the first days of a significant matter working out where their evidence is and whether they’ve found all of it.
That confidence doesn’t come from the legal team. It comes from the data environment they’re working with. And the data environment is built – or not built – long before the matter arrives.
The evidence that could be most useful to you may already exist in your systems. Whether it’s actually usable when you need it depends on decisions that were made, or not made, about how that information was captured, structured, and organised.
If your strongest evidence already exists, how confident are you that your team could find it, trust it, and use it when it matters most?
Dajon Data Management helps organisations build legal data environments where evidence is structured, searchable, and ready to be used when it’s needed. Get in touch to understand where your current data environment might be creating risks you haven’t fully mapped.
References
- [1] Mills & Reeve, “The new face of disclosure: a three-year journey in the Business and Property Courts”
- [2] The Letter Two, “The AI Economy: The Data Gap Stalling Agentic AI”
- [3] The Law Society, “Generative AI in legal disclosure: a practical guide”
