Why Organizations Still Struggle to Understand Their Data

The Data Problem Everyone is Still Talking About
August through October is conference season for the FiT team (and many of you). The season recently wrapped up with ARMA InfoCon in Phoenix. Across convention halls and breakout rooms this year, we heard a recurring theme: organizations are still struggling to understand their data, and that confusion is becoming increasingly costly.
Every conversation, regardless of the person’s role, circled back to the same concern. Firms know they’re sitting on mountains of information. It lives in cloud storage that is expanding at a pace no one anticipated. Unstructured repositories have become the largest and most unwieldy part of their information ecosystem. And to top it off, their retention practices are either nonexistent or unenforced. They know all of this, but they don’t know what to do about it.
This isn’t a new problem, of course. The industry has been discussing information sprawl, ROT, and lifecycle management for years. But we noticed more urgency in the conversations the last few months. Organizations are no longer just overwhelmed; they’re feeling the direct operational, financial, and compliance pressure of not having clear, governed, actionable data.
That urgency is bringing with it a growing appetite for solutions. After years of simply acknowledging the problem, firms are starting to look for technology and governance models that can actually help them take control. The appetite for defensible retention, smarter classification, and intelligent automation has moved from theoretical to real.
This blog explores the root causes behind the data-understanding crisis, the consequences of inaction, and the practical steps organizations can take to finally move forward.
The Root of the Problem: Misunderstood Data
For all the talk about digital transformation, AI readiness, and information governance maturity, the most persistent challenge remains the most fundamental: organizations do not truly understand their data. At least not at the level required to make informed, defensible, and cost-effective decisions about what to retain, what to delete, what to archive, and how to govern content throughout its lifecycle.
This isn’t for lack of trying. Many firms have invested in technology, consultants, and policy frameworks. But without clarity on the nature, location, and value of their data, even the most sophisticated governance programs fail to deliver.
What “knowing your data” actually means
Most organizations think they know their data because they know their systems. They can list the repositories they use, who owns them, and what categories of content they’re supposed to hold. But that isn’t the same as understanding the data itself.
To “know your data,” an organization must have clarity on:
- The content of the files
- The context in which information was created
- The business purpose it serves
- The risk level it represents
- The sensitivity or regulatory obligations associated with it
- The required lifecycle (how long it must be kept and when it should be destroyed)
- The accuracy, versioning, and duplication across systems
Unfortunately, few organizations can answer these questions consistently (or at all). Instead, they rely on system-level assumptions, inherited practices, and tribal knowledge. When staff leave, that context disappears. When departments create their own tools or workarounds, any existing structure fractures even further. The result is a predictable one: organizations have more data than ever but understand it less than ever.
Fragmented repositories and shadow IT
One of the strongest through-lines we’ve heard across conversations is frustration with fragmentation. Modern firms operate in ecosystems that grow more complex every year:
- Shared drives and legacy file servers
- Email archives
- Collaboration platforms like Teams, Slack, and Google Workspace
- Cloud storage apps, including some nobody realized existed (Box, OneDrive, Dropbox, SharePoint)
- Line-of-business systems that create or store documents
- Department-specific tools adopted without IT oversight
- Personal drives and “just for now” folders that become permanent
The sprawl from this ecosystem creates operational inconvenience and a governance nightmare. Every new repository becomes another environment where content can go unclassified, unreviewed, and unretained.
Shadow IT is one of the most underestimated contributors. Teams adopt tools to “move faster,” unaware that each new app creates a silo of unmanaged information. IT departments often discover these systems only when a legal hold or audit requires content that no one knows how to extract. Left unchecked, this conglomeration of disconnected systems guarantees that no one can ever say with confidence what the organization actually has or where it lives.
Unstructured repositories are the biggest blind spot
While structured systems (CRMs, case management tools, HRIS platforms) are relatively predictable, unstructured repositories remain the single largest and most problematic category of enterprise information. They’re also growing faster than anyone can keep up with.
Industry research repeatedly estimates that 70–90% of enterprise data is unstructured, such as documents, spreadsheets, PDFs, presentations, chat logs, videos, images, and more. Time and again we heard leaders tell us they’re drowning in unstructured content. Meanwhile, few organizations have a systematic way to understand or govern it.
Unstructured data poses unique challenges:
- It lacks consistent metadata
- It has no inherent structure
- It’s often duplicative
- It’s stored in places designed for convenience, not governance
- It’s impossible to manage manually at scale
It’s the category where the highest-risk content hides: personally identifiable information (PII), sensitive financial or legal information, outdated contracts, orphaned records. Yet it’s also the category most organizations delay addressing. Unstructured repositories have become too big to ignore, but also too big for most organizations to tackle without support. Organizations need solutions now to address visibility and retention problems.
Practical Consequences of Misunderstood Data
When problems feel too large to tackle, many organizations default to talking about them as an abstract inconvenience or a future problem. When it comes to data, however, “we don’t know what we have” is not an abstract problem. The consequences are showing up in real time as increased costs, legal exposure, operational inefficiency, and mounting pressure to implement a system that works, preferably by yesterday.
Cloud storage costs continue to rise
For years, cloud migration was pitched as a cost-saving move. And in many cases, it was — at first. But now that cloud services charge based on usage, storage volume, and access frequency, organizations are seeing those costs climb at an alarming rate.
Across multiple conversations at ARMA, firms reported that:
- Their cloud storage bills have doubled in the past few years
- Growth in content volume continues outpacing reductions
- Storage models reward accumulation, not hygiene
- IT leaders struggle to justify ever-expanding budgets
There’s also an increasing danger that huge swaths of what firms are paying to store deliver little or no business value. ROT — redundant, outdated, or trivial information — builds up, sits untouched, and then continuously drives costs upward. Without lifecycle controls, retention enforcement, or a clear understanding of what content actually matters, firms end up funding the preservation of their own clutter.
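Much of that buildup can be surfaced with even a simple file scan. The sketch below is purely illustrative (the five-year staleness threshold is a placeholder, not a policy recommendation, and real file-analysis tools use far richer signals): it flags exact duplicates by content hash and files untouched for years.

```python
import hashlib
import os
import time
from collections import defaultdict

STALE_YEARS = 5  # illustrative threshold, not a retention policy


def scan_for_rot(root):
    """Group files under root by content hash and flag long-untouched ones.

    Returns (duplicates, stale): duplicates maps a hash to the list of
    paths sharing that content; stale lists files past the age cutoff.
    """
    by_hash = defaultdict(list)
    stale = []
    cutoff = time.time() - STALE_YEARS * 365 * 24 * 3600
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Hashing whole files in memory is fine for a sketch;
                # large repositories would hash in chunks.
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                continue  # unreadable files are skipped, never deleted
            by_hash[digest].append(path)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    duplicates = {h: paths for h, paths in by_hash.items() if len(paths) > 1}
    return duplicates, stale
```

A scan like this doesn’t decide anything; it only produces candidates for the retention review that the rest of this post argues must exist.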
No retention, no defensibility
Another recurring theme at ARMA was how many organizations either:
- Do not have a functioning retention schedule
- Have one but don’t enforce it
- Have one that employees don’t understand
- Have one that exists only as a PDF no one has looked at in years
This lack of retention maturity creates three immediate problems.
- Over-retention becomes the default. With no guidance or automation, employees keep everything “just in case.” Departments hold onto legacy content they no longer need, and systems accumulate years’ worth of outdated files simply because nothing is prompting disposition.
- Under-retention creates compliance gaps. On the opposite end, some records disappear too soon because no one is monitoring or applying rules consistently. This creates exposure during audits, regulatory inquiries, or industry-specific retention mandates.
- Defensibility breaks down. Without a clear, consistently applied retention schedule, organizations cannot demonstrate why certain records were destroyed while others were kept, how policies were applied, or whether disposition decisions were compliant.
Courts and regulators expect intentionality and documentation around retention and disposition. Not having it invites direct legal risk.
eDiscovery delays and legal exposure
When organizations don’t understand their data, litigation or investigation becomes exponentially more expensive. Without classification, firms are forced into over-collection because they can’t confidently determine what is relevant, leading to predictable outcomes:
- Massive over-collection to avoid missing anything
- Increased review costs
- Longer timelines
- Higher risk of disclosing privileged or sensitive content
- Difficulty identifying authoritative versions of documents
Many organizations routinely “collect everything” because the alternative feels too risky. Unfortunately, collecting everything actually increases their risk.
The absence of structure and visibility also makes legal holds more burdensome. If no one knows which systems contain responsive content, the legal team ends up issuing broad, overly cautious holds that disrupt workflows and slow down business operations.
Business inefficiency and employee frustration
The consequences aren’t limited to legal or compliance teams. Poor data understanding slows the entire organization. Employees waste time searching for documents, comparing versions, or recreating work they couldn’t find. Cross-functional teams struggle to collaborate because no one knows which repository is the “official” one. Quality of final work suffers when there’s no clarity around where information should live or how it should be maintained. At scale, this becomes a silent productivity drain as well as an unnecessary operational expense.
Why Retention Programs Fail Before They Even Begin
Here’s what we’ve realized: most retention programs are failing before they begin. Not because the policies are flawed or the technology is inadequate, though sometimes that’s the case, but because the foundational conditions for success simply aren’t in place.
Most firms have approached document retention and destruction at least once, though to varying degrees. Many have a version of a schedule within their written policies. Some have even piloted tools, but tracking and proper follow-up are lacking. Consequently, the majority have struggled to achieve consistent, organization-wide adoption. Understanding why retention initiatives stall is essential, because these hurdles are both predictable and entirely solvable.
Retention is still seen as “compliance homework”
One of the most persistent barriers to retention maturity is cultural. Employees continue to perceive records management as a compliance exercise disconnected from their day-to-day responsibilities, which means retention policies remain abstract, optional, or ignored entirely.
Organizations often fall into one of two extremes: Employees over-retain because they’re afraid of deleting something important, or employees under-retain because they assume IT or Legal is handling it. Neither attitude supports a healthy information lifecycle.
Without clear communication about why retention matters — cost, risk, efficiency, consistency — policies cannot take root. When retention is framed as “a compliance rule,” it becomes a burden. When it’s framed as “a way to protect the organization and make work easier,” behavior begins to shift.
Tools without strategy
Organizations often begin with the software rather than the strategy. They purchase systems with automated retention features, classification engines, or file analysis capabilities, assuming the technology will solve the cultural and structural challenges.
But tools only succeed when:
- Policies are clear.
- Governance roles are defined.
- Data is understood.
- Stakeholders are aligned.
- Automation supports, rather than replaces, human decision-making.
Several ARMA conversations revealed the same pattern: firms install sophisticated retention tools, only to realize the supporting structures aren’t in place. They may have the policy, but ownership and accountability aren’t mapped out, leading to no follow-through.
Lack of ownership
Lack of ownership doesn’t stem from laziness or lack of organizational will. Rather, retention creates a unique challenge in that it sits in the gap between functions. Legal claims ownership of policy, IT owns the systems, information governance owns the program, and business units own the content. In practice, that means no one really owns it unless your organization is intentional in assigning ownership.
When building out your governance plan, consider answering the following questions.
- Who authorizes destruction?
- Who maintains the schedule?
- Who’s in charge of updating requirements?
- How do you ensure all stakeholders are on the same page?
- How do you make the plan work for every department that has to follow it?
- How do you enforce rules across systems?
An effective governance program requires careful planning and cross-functional adherence.
Internal change-management barriers
Even with clear policies and functional alignment, many retention initiatives stall because organizations underestimate the human factors involved. Governance often requires small but meaningful shifts in behavior, and employees resist anything that adds complexity or threatens established routines.
Typical challenges include:
- Fear of deleting “just in case” files
- Lack of training or awareness
- Distrust in automated disposition
- Overwhelming volumes of inherited legacy content
- Confusion around multiple versions of documents
- Workflows that rely on saving files locally or outside sanctioned systems
Employees shy away from processes that add friction. Effective adoption means reducing that friction wherever possible and getting everyone on the same page.
Successful retention programs minimize the need for human intervention. They build automation into the system, enforce lifecycle rules quietly in the background, and communicate clearly so users understand the “why” without needing to manage the “how”.
The Turning Point: Why Appetite for Solutions Is Finally Growing
For years, organizations have recognized their growing data challenges, yet meaningful progress remained slow. We’ve started to notice a change. Rather than just acknowledging the problem, leaders are actively seeking tools, models, and strategies that can finally bring order to their information ecosystems. We see a few reasons why.
The post-2020 digital explosion
The Covid-19 pandemic changed a lot of things for a lot of people. In the governance world, the rapid transition to remote and hybrid work dramatically accelerated the creation of digital content that needed to be managed. What would once have been an in-person conversation or whiteboard session became a chat thread, recorded meeting, or email summary. Organizations multiplied their digital footprints at an intense pace, and now, years later, they are reckoning with the aftermath. Digital sprawl happened quickly, and firms are realizing they can’t wait for the problem to stabilize or disappear. This is the new normal.
Increasing regulatory pressure
With digital sprawl and increased data have come regulators and industry bodies whose expectations around information governance evolved with the changes. Privacy laws, retention mandates, and disclosure requirements have become more stringent, with higher penalties for non-compliance.
Examples of pressure points include:
- Conflicting requirements between privacy laws (delete sooner) and industry regulations (retain longer)
- Higher scrutiny during audits
- Stronger expectations around defensible disposition
- New standards around transparency and accountability
This has spurred many organizations to define consistent, repeatable retention practices.
Budget pressure
For many organizations, the tipping point wasn’t regulatory or cultural, but financial. As cloud storage costs continue to rise, leadership wants a clear understanding of what they’re paying for. Executives are increasingly asking questions that governance and IT teams struggled to answer just a few years ago:
- What percentage of our stored content is actually required?
- How much is duplicative?
- How much is ROT?
- What is the cost of retaining non-records?
- How do we know we’re storing the right things?
Storage has become a line item measured in millions of dollars, forcing conversations and accelerating interest in data hygiene, disposition, and automated controls.
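A back-of-envelope calculation makes the budget pressure concrete. All figures here are hypothetical; storage volume, unit price, and ROT share vary widely by organization.

```python
def annual_rot_cost(total_gb, price_per_gb_month, rot_fraction):
    """Estimated yearly spend attributable to ROT content.

    total_gb: stored volume in GB
    price_per_gb_month: blended storage price in $/GB-month
    rot_fraction: estimated share of content that is redundant,
                  outdated, or trivial (0.0-1.0)
    """
    return total_gb * price_per_gb_month * rot_fraction * 12


# Hypothetical example: 500 TB stored at $0.02/GB-month, 60% estimated ROT.
waste = annual_rot_cost(500_000, 0.02, 0.60)
print(f"${waste:,.0f} per year spent storing ROT")  # → $72,000 per year
```

Even modest ROT percentages turn into recurring line items, which is why disposition pays for itself in a way that one-time cleanup campaigns rarely do.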
AI is exposing every governance weakness
Perhaps the strongest catalyst for change — and the most frequently discussed topic at ARMA — is AI. Organizations want to use generative AI, semantic search, and predictive analytics. Departments are eager for efficiency gains while vendors are pushing new capabilities. But the moment firms begin exploring AI, they hit the same barrier: AI cannot deliver value on ungoverned, inconsistent, unclassified data.
AI models learn from whatever they’re given. If that includes outdated contracts, incorrect versions, sensitive information stored incorrectly, files with missing or inaccurate metadata, etc., then the AI will replicate and amplify those flaws.
The cultural shift towards governance as strategy
Beyond technology and cost, there is a growing recognition that well-governed information is a strategic asset. Leaders are increasingly framing governance as infrastructure — something that improves decision-making, reduces risk, accelerates workflows, and sets the stage for AI-enabled transformation.
A Path Forward
If the conversations at ARMA made anything clear, it’s that firms are already aware of this challenge; what they need is a practical, achievable solution. The good news is that the most effective governance programs don’t start with massive overhauls. They start by increasing clarity around what can realistically be controlled, then adding automation to simplify how it’s controlled.
The first step is gaining a reliable view of the existing landscape. Organizations need to know where their content lives, which repositories pose the greatest risk, and where high-value records are buried beneath years of accumulated files. Most firms don’t need a full enterprise-wide inventory on day one, but a prioritized map that shows where cleanup and governance are needed will have a big impact.
From there, automation becomes essential. Manual classification and human-driven retention decisions don’t scale—especially in the unstructured world where most of the volume lives. File analysis, metadata enrichment, and auto-classification tools can quickly reduce noise and surface content that actually needs attention. In other words, automation makes governance feasible.
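To make “automation makes governance feasible” concrete, here is a toy sketch of rule-based classification feeding a retention schedule. Everything in it is a placeholder (the categories, keywords, and retention periods are invented for illustration), and real file-analysis tools rely on metadata and machine-learning classifiers rather than keyword matching, but the shape is the same: classify content, then derive its lifecycle from the matching rule instead of asking an end user.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetentionRule:
    category: str      # record class, e.g. "contract" (illustrative)
    keywords: tuple    # naive content signal; real tools use ML/metadata
    retain_years: int  # years before disposition review (placeholder values)


# Illustrative schedule only; actual periods come from legal requirements.
RULES = [
    RetentionRule("contract", ("agreement", "contract", "signature"), 7),
    RetentionRule("invoice", ("invoice", "amount due"), 7),
    RetentionRule("general", (), 3),  # fallback class for unmatched content
]


def classify(text):
    """Return the first rule whose keywords appear in the text."""
    lowered = text.lower()
    for rule in RULES:
        if any(k in lowered for k in rule.keywords):
            return rule
    return RULES[-1]  # nothing matched: fall back to the general class


def disposition_due(text, created):
    """Category and date at which a file becomes eligible for review."""
    rule = classify(text)
    return rule.category, created + timedelta(days=365 * rule.retain_years)
```

The point of the sketch is the division of labor: users never pick a retention period; the rule engine does, and disposition dates fall out automatically, which is exactly what lets lifecycle enforcement run quietly in the background.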
With visibility and automation in place, retention can shift from an aspirational policy to an operational reality. The most successful programs remove decision-making from end users and push lifecycle rules directly into systems. When disposition happens reliably in the background, employees don’t feel burdened, and organizations regain control without relying on manual cleanup campaigns.
None of this works without shared ownership. Legal, IT, information governance, and the business units each hold part of the puzzle, and progress only happens when those groups align on goals, responsibilities, and communication. The firms seeing the strongest movement are those able to establish and maintain clear, cross-functional coordination.
Finally, organizations need to measure results. Storage reductions, fewer duplicates, quicker retrieval times, and improved eDiscovery performance all reinforce the value of governance. When leaders see concrete gains, it becomes easier to secure support for the next phase of cleanup and automation.
The path forward doesn’t require perfection. A clear view, the right technology, and a structure that allows governance to happen consistently will address all the current concerns. Firms that start with focused steps can make meaningful progress quickly, and set a foundation strong enough for whatever comes next, from AI initiatives to mergers and acquisitions.
Ready to move forward?
Organizations can no longer afford to treat their data landscape as an unsolvable problem. The growth of unstructured content, rising cloud costs, intensified regulatory expectations, and the push toward AI have all converged to create a defining moment. Firms that once managed to get by without clear retention or structured governance are now facing operational, financial, and compliance pressures that are impossible to ignore.
Firms that take steps now will see immediate benefits: lower storage costs, faster retrieval, fewer eDiscovery surprises, and a cleaner foundation for AI and long-term digital strategy. Those that delay will find the gap widening quickly.
But you don’t need to find momentum on your own. FiT’s team of experts can not only help you implement information governance software that works across a complex data landscape; they can also help you audit current policies and develop stronger ones. If you’re looking for support managing your data, start by talking to our team. Book a demo today.