Scanning does not feel like a compliance risk.
It feels like the opposite: a sensible, modern thing to do. Getting paper off the shelves, reducing physical storage, making information more accessible. Most organisations approach digitisation as a straightforward operational improvement, and on the surface, that is exactly what it looks like.
The problem is what happens to that information once it is digital. And for organisations handling personal data (which is most of them) that is where the exposure begins.
The Information Commissioner’s Office made the cost of getting this wrong unmistakable in late 2025. In October, the ICO fined the Capita group £14 million for security failings that compromised the personal data of more than 6.6 million individuals[1]. A month later it imposed a £1.2 million fine on LastPass UK Limited, calculated against the global revenue of its parent holding company[2]. The average ICO fine in 2025 jumped from £150,000 to over £2.8 million[3]. The regulator has not become more prolific in its enforcement; it has become more severe.
Why digitisation changes the risk profile rather than reducing it
There is a widely held assumption that moving from paper to digital improves control. In some respects it does. But it also fundamentally changes the nature of the risk, in ways that paper never created.
Physical documents are limited by their physicality. Access requires presence. Distribution requires effort. Duplication is visible. These are not virtues, exactly; they are simply constraints that happen to limit exposure by default.
Digital removes those constraints entirely. A document that once sat in a filing cabinet accessible to whoever had the key is now potentially searchable by anyone with system access, shareable in seconds, duplicable without trace, and accessible from any device on the network. The efficiency gains are real. But so is the expansion of the risk surface.
This is the dynamic that most scanning programmes do not fully account for. The focus goes on getting the documents in – on volume, on speed, on meeting the target of a paperless office – rather than on what happens to the data once it is there. And under UK GDPR, it is what happens to the data that matters.
Where scanning processes actually go wrong
The failures in scanning programmes are rarely dramatic. They do not tend to look like breaches in the moment they happen. They look like reasonable decisions made under operational pressure, and they accumulate quietly over time into something that becomes very difficult to defend when scrutiny arrives.
Documents get scanned and stored without being properly classified. The distinction between a routine operational record and a document containing sensitive personal data (medical information, financial details, identity documents, etc.) does not get made at the point of capture because the process was not designed to require it. Everything goes into the same system, under the same access controls, managed with the same level of governance regardless of what it actually contains.
Access controls get set at system level rather than data level. Which means that everyone who needs access to any of the digitised records ends up with access to all of them. That might be a manageable risk when the system contains routine correspondence. It is a different matter when it contains documents with personal data that only a small number of people should ever be able to see.
Metadata is applied inconsistently, or not at all. Which means that when someone needs to find a specific document, or when a regulator asks for evidence of how personal data is being managed, the search either fails or returns results that raise more questions than they answer. This matters more than it sounds: roughly 80% of enterprise data is unstructured, and unstructured data is growing three times faster than structured data[4]. Without structured metadata, scanned documents disappear into that mass.
Retention policies that were already loosely applied to paper records become even less coherent when applied to digital ones. Documents that should have been destroyed years ago persist in systems indefinitely because nobody has visibility of what is there or a process for reviewing it.
None of these failures is catastrophic in isolation. But collectively they create an environment where personal data is being held in ways that an organisation could not fully account for if asked – and under UK GDPR, the inability to account for how personal data is held is itself a problem, regardless of whether anything has gone actively wrong yet.
What UK GDPR actually requires – and where the gap tends to be
It is worth being specific about what the regulatory exposure actually looks like, because “GDPR risk” can mean different things in different contexts and vague risk does not motivate action the way specific risk does.
UK GDPR requires organisations to demonstrate that personal data is processed lawfully, stored securely, accessible only to those with a legitimate need, retained only for as long as necessary, and handled in a way that respects the rights of the individuals it relates to. The ICO has the power to fine up to £17.5 million or 4% of global annual turnover for serious breaches of these principles[5].
The question a scanning programme needs to answer is whether the way documents containing personal data are digitised, stored, and managed is consistent with that standard.
For many organisations, the honest answer is that they do not fully know. The scanning has happened. The documents are in a system somewhere. But whether the right classifications are in place, whether access is appropriately restricted, whether retention is being managed correctly, whether there is a reliable audit trail – these are questions that often do not have clean answers.
That uncertainty is the problem. The Capita case is instructive on this point. The ICO did not just fine the controller; it also issued a £6 million fine to Capita Pension Solutions Limited as data processor, and explicitly rejected the argument that prompt notification within 14 hours was a mitigating factor[6]. What the regulator was looking at was whether the controls in place beforehand were adequate, not whether the response afterwards was reasonable. That same logic applies to digitisation programmes. When something goes wrong, the question is not whether you reacted well; it is whether the structure you had built before the incident was defensible.
What well-governed digitisation actually looks like in practice
The organisations that manage this well have not necessarily done anything technically complicated. What they have done is treat scanning as a data governance activity rather than an operational one, and designed the process accordingly from the start.
That means classifying documents at the point of capture rather than retrospectively. When a document is scanned, a decision gets made about what type of document it is, what data it contains, how sensitive that data is, and what level of access and retention is appropriate for it. That classification drives everything that follows: where the document is stored, who can see it, how long it is kept, and when it needs to be reviewed or destroyed.
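To make the idea concrete, here is a minimal sketch of capture-time classification. Everything in it is illustrative: the sensitivity tiers, field names, and retention periods are invented for the example, not drawn from any real schema or schedule. The point it demonstrates is that one classification decision, made when the document is scanned, can drive storage location, access level, and retention date downstream.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical sensitivity tiers and the handling each one implies.
# A real mapping would come from the organisation's own governance policy.
HANDLING_RULES = {
    "routine":   {"store": "general-archive",  "access": "all-staff",       "retain_years": 2},
    "personal":  {"store": "restricted-store", "access": "named-roles",     "retain_years": 6},
    "sensitive": {"store": "restricted-store", "access": "documented-need", "retain_years": 6},
}

@dataclass
class ScannedDocument:
    doc_id: str
    doc_type: str      # e.g. "invoice", "medical-record"
    sensitivity: str   # one of HANDLING_RULES, decided at the point of capture
    scanned_on: date

    def handling(self) -> dict:
        """The capture-time classification drives storage, access and retention."""
        rule = HANDLING_RULES[self.sensitivity]
        return {
            "store": rule["store"],
            "access": rule["access"],
            "review_by": self.scanned_on + timedelta(days=365 * rule["retain_years"]),
        }

doc = ScannedDocument("DOC-0001", "medical-record", "sensitive", date(2025, 10, 1))
print(doc.handling()["store"])  # restricted-store
```

The design choice the sketch illustrates is that the classification is made once, explicitly, and everything else is derived from it, rather than each downstream system making its own ad hoc decision.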
It means applying access controls at the data level rather than just the system level. Not everyone who needs access to the digitised document environment needs access to every document within it. Sensitive personal data should be accessible only to those with a clear and documented need, and that distinction needs to be built into the way the system is governed rather than managed informally.
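The difference between system-level and data-level access can be sketched in a few lines. The role names and document IDs below are invented for illustration; the mechanism shown is simply that each document carries its own access requirement, checked per request, so holding an account on the system is not enough by itself.

```python
# Hypothetical per-document access lists: access is checked against the
# document, not granted wholesale by having an account on the system.
DOC_ACCESS = {
    "DOC-0001": {"hr-manager"},                     # sensitive: named roles only
    "DOC-0002": {"hr-manager", "finance", "ops"},   # routine record
}

def can_access(user_roles, doc_id):
    """Grant access only if the user holds a role listed for this document."""
    allowed = DOC_ACCESS.get(doc_id, set())
    return bool(set(user_roles) & allowed)

print(can_access({"ops"}, "DOC-0001"))  # False: system access is not enough
print(can_access({"ops"}, "DOC-0002"))  # True
```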
It means building audit trails that can demonstrate, at any given point, who has accessed what and when. The ICO’s March 2025 fine of £3.07 million against Advanced Computer Software Group following a ransomware attack that exposed sensitive health data of nearly 80,000 people[7] was, at root, an enforcement action about access controls. The breach happened because a customer account did not have multi-factor authentication. The lesson sits as much with how access is governed as with how systems are configured.
And it means treating retention as an active discipline rather than a passive default. Documents do not just accumulate indefinitely. There is a process for reviewing what is held, applying retention schedules consistently, and disposing of data that no longer needs to be kept, in a way that can be evidenced.
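As a sketch of what "retention as an active discipline" means in practice, the review can be expressed as a periodic pass over held records against a schedule, producing an evidencable reason for each disposal. The schedule values and record format here are illustrative assumptions, not real retention periods.

```python
from datetime import date

# Illustrative retention schedule: document type -> maximum years held.
# Real periods would come from legal and regulatory requirements.
RETENTION_YEARS = {"correspondence": 2, "contract": 6, "identity-document": 1}

def due_for_disposal(records, today):
    """Return records held past their schedule, each with a documented reason."""
    overdue = []
    for doc_id, doc_type, captured in records:
        limit = RETENTION_YEARS.get(doc_type)
        if limit is not None and (today - captured).days > limit * 365:
            overdue.append((doc_id, f"{doc_type}: exceeds {limit}-year schedule"))
    return overdue

records = [
    ("DOC-1", "correspondence", date(2020, 1, 1)),  # well past the 2-year schedule
    ("DOC-2", "contract",       date(2024, 1, 1)),  # within the 6-year schedule
]
print(due_for_disposal(records, date(2025, 12, 1)))
```

The output of such a pass is the audit trail itself: which records were flagged, against which schedule, and when, which is exactly the evidence a regulator would ask to see.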
None of this is particularly complex as a set of principles. What makes it difficult in practice is designing the scanning programme to embed these disciplines from the outset rather than trying to retrofit them once the documents are already in the system. Retrospective governance is significantly harder, slower, and more expensive than getting it right at the point of capture.
The cost of fixing it after the fact
This is worth dwelling on, because it is where the commercial case for doing this properly becomes most concrete.
An organisation that has digitised a large volume of documents without proper classification, access controls, or retention management faces a significant challenge if it then needs to bring that environment into compliance. The documents are there. The personal data is there. But the governance framework that should sit around them is not, and building it retrospectively means going back through material that was already processed once and doing the work that should have been done the first time.
That is expensive in time and resource. It is also disruptive, because the system people are depending on for day-to-day operations may need to be reorganised in ways that affect how they work while the remediation is happening. And it carries its own risk, because the period between recognising the problem and completing the fix is a period of heightened exposure.
Getting it right at the point of scanning costs a fraction of what remediation costs. That calculation tends to feel more compelling after a problem has surfaced than before one has.
How Dajon helps organisations get this right
At Dajon Data Management, we work with organisations on exactly this challenge: making sure that scanning and digitisation programmes are designed from the outset in a way that supports compliance rather than creating new exposure.
That means combining high-quality scanning with proper document classification, structured metadata, access controls aligned to data sensitivity, and retention management built into the process rather than bolted on afterwards. The goal is a digitised document environment that an organisation can account for – where the right people have access to the right documents, where sensitive personal data is governed appropriately, and where the evidence of that governance is available when it is needed.
For organisations that have already digitised significant volumes of documents without that governance framework in place, we also help with the harder task of bringing existing environments into a more defensible position: assessing what is there, identifying the gaps, and building the structure that should have been there from the start.
Scanning is a control point – treat it like one
Every document an organisation digitises is a decision about how personal data will be held and governed from that point forward. The process of scanning is the moment when that decision gets made – either deliberately, with the right controls in place, or by default, in a way that creates exposure that may not be visible until something goes wrong.
The organisations that approach it deliberately are in a significantly stronger position. Not just from a compliance perspective, but operationally. Their document environments are easier to manage, easier to audit, and easier to defend. The governance that looked like overhead at the point of implementation turns out to be the thing that keeps them out of trouble when the pressure is on.
The ones that treated scanning as a purely operational exercise are managing a risk they may not have fully mapped – and in some cases, one they are not yet fully aware of.
Are you making your data easier to access – or easier to expose?
Dajon Data Management helps organisations digitise and govern their documents in a way that supports compliance from the point of capture. Get in touch to understand where your current scanning programme might be creating exposure you have not fully accounted for.
References
- ICO fines Capita for UK GDPR infringements following March 2023 data breach – Clifford Chance
- Recent ICO Data Breach Enforcement Emphasizes the Importance of a Robust Breach Response – Skadden
- ICO Enforcement in 2025: Record Fines and What They Mean – Measured Collective
- Possibilities and limitations of unstructured data – Research World
- Enforcement of this code – ICO
- ICO fines Capita £14 million for data breaches – Herbert Smith Freehills Kramer
- ICO fines Processor £3.07m for UK GDPR security failings – CMS Law-Now
