Most organisations have spent the last few decades trying to get rid of information, not keep it. The logic was simple: once the legal retention period ended and you were no longer required to retain the paperwork or the database, compliance said you could safely delete it.
In a paper-based, pre-AI world, that mindset made sense. Storage was expensive, regulators were wary of hoarding, and the safest answer often seemed to be to keep the minimum and destroy the rest. The archive was a cost centre to be minimised.
In an AI driven world, that logic becomes a strategic liability. Historic information is no longer just a compliance burden. It is training data and context, the raw material that determines whether your AI understands your business or merely offers plausible sounding guesses.
This article looks at why the old “delete when you can” pattern is breaking down, what it means to make information AI ready, and how C level leaders can evolve retention and digitisation strategies without creating new compliance or cyber risks.
Why history is now a strategic asset
IDC forecasts that global data volumes will almost quadruple between 2023 and 2028, with enterprise data growing fastest. [1] Studies from McKinsey and others show that organisations which embed data in decision making are significantly more profitable and better at winning and keeping customers, while Splunk’s work on the “data divide” finds that firms that can integrate and reuse data at scale capture far more value from AI than those with fragmented datasets. [2] [3]
Modern AI systems, whether traditional machine learning, predictive models or large language models, are hungry. Their performance improves as you feed them more high quality, representative data. Studies in transport, smart city planning and long run economic and climate modelling show that models trained on long histories of real data vastly outperform those trained on short or sparse datasets. [4] [5]
The same logic applies inside your organisation. Maintenance logs, email trails, call recordings, scanned forms and “old” databases capture years of real behaviour. Clive Humby’s line that “data is the new oil” has been reinforced by OECD work treating data as a new form of capital for modern economies. [6] Each year of operation adds to that capital; deleting it early is like discarding hard won experience just when AI gives you new tools to exploit it.
Why “keep everything” is not the answer
If history is so valuable, why not keep it all? Because the risk landscape has also changed. Data protection laws, including GDPR, still expect organisations to minimise personal data, define clear purposes and avoid keeping information longer than necessary. Cyber security teams are rightly nervous about giant, uncontrolled data lakes full of sensitive information, and regulators are beginning to scrutinise how training data is collected and used.
The answer is not indiscriminate hoarding. The emerging pattern is deliberate preservation. You keep what matters, on purpose, under control.
That starts with identifying classes of information with genuine long term strategic value. These often include asset and process data, operational telemetry, non sensitive customer interactions, long run financial and pricing history, and knowledge assets such as procedures, designs and playbooks.
Next, you make it safer to hold. That can mean anonymisation or pseudonymisation for personal data, segregating highly sensitive records, tiered storage, and modern security controls and monitoring for archives that were previously treated as low risk repositories.
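To make that concrete, the sketch below shows one common way to pseudonymise a record before it enters a long term archive: replacing the direct identifier with a keyed hash and dropping attributes with no analytical value. It is a minimal illustration, not a prescription; the field names and key handling are assumptions, and in practice the key would sit in a managed key store, since whoever holds it can re-identify records.

```python
import hashlib
import hmac

# Illustrative secret key; in practice this lives in a key management service,
# because anyone holding it can reverse the pseudonymisation.
PSEUDONYMISATION_KEY = b"replace-with-a-managed-secret"

def pseudonymise_id(customer_id: str) -> str:
    """Replace a direct identifier with a stable keyed hash (HMAC-SHA256)."""
    return hmac.new(PSEUDONYMISATION_KEY, customer_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()

def prepare_for_archive(record: dict) -> dict:
    """Strip direct identifiers before a record enters the long term archive."""
    archived = dict(record)
    archived["customer_ref"] = pseudonymise_id(archived.pop("customer_id"))
    archived.pop("email", None)  # drop attributes with no analytical value
    return archived

# The archived record keeps the behavioural signal but not the raw identity.
print(prepare_for_archive({
    "customer_id": "C-10492",
    "email": "jane@example.com",
    "product": "home-insurance",
    "claims_last_5_years": 2,
}))
```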
Finally, you make it usable. AI systems care about structure, consistency and context. An AI ready archive is digitised, searchable, well labelled and connected to your modern data platform, not trapped in a legacy application or an off site box store. Frameworks such as NIST’s AI Risk Management Framework underline that trustworthy AI rests on disciplined data management and governance, including over training data. [7]
What AI ready information looks like
For many boards, “getting AI ready” still sounds abstract. In practice, AI ready information tends to share four qualities. It is digital, structured, integrated and governed.
Digital means that key records are out of paper and microfilm and into systems where they can be processed. Modern document AI and intelligent capture tools can now extract structured data from large volumes of documents, emails and forms far more efficiently than in the past. [8]
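As one illustration of that kind of intelligent capture, the sketch below uses Google Cloud's Document AI Python client, cited above, to turn a scanned form into machine readable text and entities. The project, location and processor identifiers are placeholders you would supply, and the structured fields returned depend entirely on the processor you configure.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import documentai

# Placeholder identifiers; a real deployment supplies its own project,
# region and pre-configured Document AI processor.
PROJECT_ID, LOCATION, PROCESSOR_ID = "my-project", "eu", "my-processor-id"

client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(api_endpoint=f"{LOCATION}-documentai.googleapis.com")
)
processor_name = client.processor_path(PROJECT_ID, LOCATION, PROCESSOR_ID)

with open("scanned_form.pdf", "rb") as f:
    raw_document = documentai.RawDocument(content=f.read(),
                                          mime_type="application/pdf")

result = client.process_document(
    request=documentai.ProcessRequest(name=processor_name,
                                      raw_document=raw_document)
)

document = result.document
print(document.text[:200])           # extracted text, ready to index and search
for entity in document.entities:     # structured fields, if the processor extracts them
    print(entity.type_, "->", entity.mention_text)
```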
Structured and findable means that key entities and relationships are machine readable and searchable. Metadata, labelling and reference data matter as much as the content itself, and recent guidance on AI data strategies repeatedly stresses high quality labelling and curation as foundations for effective training. [7] [9]
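A lightweight way to picture this is metadata that travels with every archived item. The record schema below is hypothetical; the specific fields are assumptions chosen to show the kind of labelling, sensitivity and lineage information that makes content findable and safe to reuse for training.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ArchivedItem:
    """Hypothetical metadata carried by each record in an AI ready archive."""
    item_id: str
    source_system: str           # where the content originated
    business_entity: str         # e.g. customer, asset, contract
    created: date
    retention_class: str         # links the item to a retention schedule
    sensitivity: str             # e.g. "public", "internal", "personal-data"
    lineage: list = field(default_factory=list)  # systems the data has passed through
    labels: list = field(default_factory=list)   # curation tags used to build training sets

item = ArchivedItem(
    item_id="DOC-2017-00042",
    source_system="claims-legacy",
    business_entity="customer",
    created=date(2017, 3, 14),
    retention_class="analytical-long-term",
    sensitivity="personal-data",
    lineage=["claims-legacy", "archive-platform"],
    labels=["claim-form", "motor"],
)
```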
Integrated means that data is not trapped in isolated systems. Research on operational data suggests that a large share of what companies collect, sometimes quoted at over two thirds, simply goes unused because it sits in separate applications or schemas. [10] AI models need joined up views of customers, assets and processes, not a patchwork of partially overlapping datasets.
Governed means you know where sensitive attributes sit, who is allowed to use what, how data lineage works and how consent or purpose limitations are honoured. When governance is weak, risk officers will quite rightly block ambitious AI use cases.
The C level agenda and first steps
Research now helps explain why so many AI initiatives stall. Recent market studies suggest that many enterprise AI projects fail to deliver expected value, largely because of data, security and governance constraints rather than the models themselves. [10] If the business has been deleting historic data as soon as compliance allowed, there is only so far an AI programme can go; you cannot train on records that no longer exist.
Shifting from “delete when you can” to “preserve what matters” is a leadership task. The CEO and board must treat information as strategic capital rather than back office clutter. The CFO can reframe archive and digitisation budgets as investments, weighing the cost of keeping more data against the upside from better pricing, fewer failures and improved retention.
CIOs and CDOs need to surface what information exists across physical and digital repositories, rationalise legacy applications without losing data, and build the integration and metadata layers that turn isolated datasets into a coherent pool. For COOs, this shift makes data a core part of the operating model, with processes designed so that high quality, well structured data is a natural by product of day to day work. Risk, Legal and Compliance leaders must set the rules and controls that make longer retention acceptable.
A practical starting point is discovery. Map what information you hold and how far back key datasets go. Revisit retention schedules so they consider long term analytical and AI value alongside legal minima. Prioritise high value digitisation, and invest in integration and metadata so teams can actually use what you have preserved.
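One simple way to begin that discovery is a single inventory that records, for each dataset, the legal minimum alongside a judgement of analytical value, as in the hypothetical sketch below. The fields and the scoring scheme are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass

@dataclass
class DatasetEntry:
    """One row in a hypothetical information inventory built during discovery."""
    name: str
    system: str
    earliest_record: int         # year of the oldest data still held
    legal_minimum_years: int     # retention required by law or regulation
    analytical_value: int        # 1 (low) to 5 (high), judged with the business

    def recommendation(self) -> str:
        # Keep beyond the legal minimum only where there is a clear analytical case.
        return ("preserve and digitise" if self.analytical_value >= 4
                else "retain to legal minimum")

inventory = [
    DatasetEntry("maintenance logs", "CMMS", 2009, 6, 5),
    DatasetEntry("marketing consent forms", "CRM", 2015, 3, 2),
]
for entry in inventory:
    print(f"{entry.name}: {entry.recommendation()}")
```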
A new pattern for keeping information
The traditional pattern was simple. Capture what you must. Store it as cheaply as possible. Destroy it as soon as you are allowed.
In an AI enabled organisation, that pattern is no longer enough. You still comply with retention law and respect privacy, but you recognise that historic information is a core asset. You make conscious choices about which data to preserve, you invest in digitising and integrating it, and you wrap it in strong governance so it is safe to use. The question for C level leaders is no longer “When can we delete this?” but “What future value might we be throwing away if we do?” In a world where AI performance depends on the depth and quality of the data behind it, the organisations that arrive with a rich, well prepared data pool will have the advantage.
References
[1] IDC, "Worldwide IDC Global DataSphere Forecast" (various editions, 2019–2025), projecting that global data creation will grow from tens of zettabytes in 2019 to several hundred by the late 2020s. https://www.linkedin.com/posts/data-panacea_the-latest-idc-global-datasphere-forecast-activity-7370878620728250368-5odA
[2] McKinsey Global Institute, data-driven organisations study, as summarised in "5 stats that show how data-driven organizations outperform their competition", Keboola blog, 2020. https://www.keboola.com/blog/5-stats-that-show-how-data-driven-organizations-outperform-their-competition
[3] Splunk, "From data divide to data dividend", research report on the economic impact of data access and the widening gap between data-rich and data-poor organisations, 2021. https://www.splunk.com/en_us/pdfs/research-report/from-data-divide-to-data-dividend.pdf
[4] M. Shaygan et al. and P. Qi et al., recent reviews of deep-learning traffic forecasting methods, showing that models trained on large historic datasets deliver significantly improved accuracy in real-world scenarios. https://www.sciencedirect.com/science/article/abs/pii/S0968090X22003345
[5] OECD, "Global long-run economic scenarios: 2025 update" and OECD Well-being Data Monitor, illustrating how long-run data series underpin robust climate and economic projections. https://www.oecd.org/en/publications/oecd-global-long-run-economic-scenarios_00353678-en/full-report/component-4.html
[6] OECD, "Enhancing access to and sharing of data" (2019), describing data as a new form of capital, and Corrado et al., "Measuring data as an asset" (2022), framing data within an intangible capital perspective. https://www.oecd.org/en/publications/enhancing-access-to-and-sharing-of-data_276aaca8-en/full-report/economic-and-social-benefits-of-data-access-and-sharing_836734cb.html
[7] NIST, "AI Risk Management Framework 1.0" (2023), together with guidance such as Couchbase's "A guide to AI data management" (2025) and Zendata's "Data strategy for AI systems 101" (2024), all emphasising labelling, curation, metadata and governance as foundations for trustworthy AI. https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf
[8] Google Cloud, Document AI product documentation and training guidance, showing how modern document-AI services can extract structured data from unstructured documents at scale. https://cloud.google.com/document-ai
[9] Zendata, "Data Strategy for AI Systems 101: Curating and Managing Data" (2024). https://www.zendata.dev/post/data-strategy-for-ai-systems-101-curating-and-managing-data
[10] Omdia, "Market Landscape: Why 90% of enterprise AI projects will fail" (2025), Actian, "The governance gap: Why 60% of AI initiatives fail" (2025), and recent reporting summarising Gartner forecasts of high failure rates for AI projects due to data quality and governance constraints. https://omdia.tech.informa.com/om138097/market-landscape-why-90-of-enterprise-ai-projects-will-fail
