Why Organizations Still Struggle to Understand Their Data

The Data Problem Everyone is Still Talking About
August through October is conference season for the FiT team (and many of you). The season recently wrapped up with ARMA InfoCon in Phoenix. Across convention halls and breakout rooms this year, we heard a recurring theme: organizations are still struggling to understand their data, and that confusion is becoming increasingly costly.
Every conversation, regardless of the person’s role, circled back to the same concern. Firms know they’re sitting on mountains of information. It lives in cloud storage that is expanding at a pace no one anticipated. Unstructured repositories have become the largest and most unwieldy part of their information ecosystem. And to top it off, their retention practices are either nonexistent or unenforced. They know all of this, but they don’t know what to do about it.
This isn’t a new problem, of course. The industry has been discussing information sprawl, ROT, and lifecycle management for years. But we noticed more urgency in the conversations the last few months. Organizations are no longer just overwhelmed; they’re feeling the direct operational, financial, and compliance pressure of not having clear, governed, actionable data.
That urgency is bringing with it a growing appetite for solutions. After years of simply acknowledging the problem, firms are starting to look for technology and governance models that can actually help them take control. The appetite for defensible retention, smarter classification, and intelligent automation has moved from theoretical to real.
This blog explores the root causes behind the data-understanding crisis, the consequences of inaction, and the practical steps organizations can take to finally move forward.
The Root of the Problem: Misunderstood Data
For all the talk about digital transformation, AI readiness, and information governance maturity, the most persistent challenge remains the most fundamental: organizations do not truly understand their data. At least not at the level required to make informed, defensible, and cost-effective decisions about what to retain, what to delete, what to archive, and how to govern content throughout its lifecycle.
This isn’t for lack of trying. Many firms have invested in technology, consultants, and policy frameworks. But without clarity on the nature, location, and value of their data, even the most sophisticated governance programs fail to deliver.
What “knowing your data” actually means
Most organizations think they know their data because they know their systems. They can list the repositories they use, who owns them, and what categories of content they’re supposed to hold. But that isn’t the same as understanding the data itself.
To “know your data,” an organization must have clarity on:
- The content of the files
- The context in which information was created
- The business purpose it serves
- The risk level it represents
- The sensitivity or regulatory obligations associated with it
- The required lifecycle (how long it must be kept and when it should be destroyed)
- The accuracy, versioning, and duplication across systems
Unfortunately, few organizations can answer these questions consistently (or at all). Instead, they rely on system-level assumptions, inherited practices, and tribal knowledge. When staff leave, that context disappears. When departments create their own tools or workarounds, any existing structure fractures even further. The result is a predictable one: organizations have more data than ever but understand it less than ever.
Fragmented repositories and shadow IT
One of the strongest through-lines we’ve heard across conversations is frustration with fragmentation. Modern firms operate in ecosystems that grow more complex every year:
- Shared drives and legacy file servers
- Email archives
- Collaboration platforms like Teams, Slack, and Google Workspace
- Cloud storage apps, including some nobody realized existed (Box, OneDrive, Dropbox, SharePoint)
- Line-of-business systems that create or store documents
- Department-specific tools adopted without IT oversight
- Personal drives and “just for now” folders that become permanent
The sprawl from this ecosystem creates operational inconvenience and a governance nightmare. Every new repository becomes another environment where content can go unclassified, unreviewed, and unretained.
Shadow IT is one of the most underestimated contributors. Teams adopt tools to “move faster,” unaware that each new app creates a silo of unmanaged information. IT departments often discover these systems only when a legal hold or audit requires content that no one knows how to extract. Left unchecked, this conglomeration of disconnected systems guarantees that no one can ever say with confidence what the organization actually has or where it lives.
Unstructured repositories are the biggest blind spot
While structured systems (CRMs, case management tools, HRIS platforms) are relatively predictable, unstructured repositories remain the single largest and most problematic category of enterprise information. They’re also growing faster than anyone can keep up with.
Industry research repeatedly estimates that 70–90% of enterprise data is unstructured, such as documents, spreadsheets, PDFs, presentations, chat logs, videos, images, and more. Time and again we heard leaders tell us they’re drowning in unstructured content. Meanwhile, few organizations have a systematic way to understand or govern it.
Unstructured data poses unique challenges:
- It lacks consistent metadata
- It has no inherent structure
- It’s often duplicative
- It’s stored in places designed for convenience, not governance
- It’s impossible to manage manually at scale
It’s the category where the highest-risk content hides: personally identifiable information (PII), sensitive financial or legal information, outdated contracts, orphaned records. Yet it’s also the category most organizations delay addressing. Unstructured repositories have become too big to ignore, but also too big for most organizations to tackle without support. Organizations need solutions now to address visibility and retention problems.
Practical Consequences of Misunderstood Data
When problems feel too large to tackle, many organizations default to talking about them as an abstract inconvenience or a future problem. When it comes to data, however, “we don’t know what we have” is not an abstract problem. The consequences are showing up in real time as increased costs, legal exposure, operational inefficiency, and mounting pressure to implement a system that works, preferably by yesterday.
Cloud storage costs continue to rise
For years, cloud migration was pitched as a cost-saving move. And in many cases, it was — at first. But now that cloud services charge based on usage, storage volume, and access frequency, organizations are seeing those costs climb at an alarming rate.
Across multiple conversations at ARMA, firms reported that:
- Their cloud storage bills have doubled in the past few years
- Growth in content volume continues outpacing reductions
- Storage models reward accumulation, not hygiene
- IT leaders struggle to justify ever-expanding budgets
There’s also an increasing danger that huge swaths of what firms are paying to store deliver little or no business value. ROT — redundant, outdated, or trivial information — builds up, sits untouched, and then continuously drives costs upward. Without lifecycle controls, retention enforcement, or a clear understanding of what content actually matters, firms end up funding the preservation of their own clutter.
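Much of that buildup can be surfaced with even a simple file scan. The sketch below is purely illustrative (the five-year staleness threshold is a placeholder, not a policy recommendation, and real file-analysis tools use far richer signals): it flags exact duplicates by content hash and files untouched for years.

```python
import hashlib
import os
import time
from collections import defaultdict

STALE_YEARS = 5  # illustrative threshold, not a retention policy


def scan_for_rot(root):
    """Group files under root by content hash and flag long-untouched ones.

    Returns (duplicates, stale): duplicates maps a hash to the list of
    paths sharing that content; stale lists files past the age cutoff.
    """
    by_hash = defaultdict(list)
    stale = []
    cutoff = time.time() - STALE_YEARS * 365 * 24 * 3600
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Hashing whole files in memory is fine for a sketch;
                # large repositories would hash in chunks.
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                continue  # unreadable files are skipped, never deleted
            by_hash[digest].append(path)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    duplicates = {h: paths for h, paths in by_hash.items() if len(paths) > 1}
    return duplicates, stale
```

A scan like this doesn’t decide anything; it only produces candidates for the retention review that the rest of this post argues must exist.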
No retention, no defensibility
Another recurring theme at ARMA was how many organizations either:
- Do not have a functioning retention schedule
- Have one but don’t enforce it
- Have one that employees don’t understand
- Have one that exists only as a PDF no one has looked at in years
This lack of retention maturity creates three immediate problems.
- Over-retention becomes the default. With no guidance or automation, employees keep everything “just in case.” Departments hold onto legacy content they no longer need, and systems accumulate years’ worth of outdated files simply because nothing is prompting disposition.
- Under-retention creates compliance gaps. On the opposite end, some records disappear too soon because no one is monitoring or applying rules consistently. This creates exposure during audits, regulatory inquiries, or industry-specific retention mandates.
- Defensibility breaks down. Without a clear, consistently applied retention schedule, organizations cannot demonstrate why certain records were destroyed while others were kept, how policies were applied, or whether disposition decisions were compliant.
Courts and regulators expect intentionality and documentation around retention and disposition. Not having it invites direct legal risk.
eDiscovery delays and legal exposure
When organizations don’t understand their data, litigation or investigation becomes exponentially more expensive. Without classification, firms are forced into over-collection because they can’t confidently determine what is relevant, leading to predictable outcomes:
- Massive over-collection to avoid missing anything
- Increased review costs
- Longer timelines
- Higher risk of disclosing privileged or sensitive content
- Difficulty identifying authoritative versions of documents
Many organizations routinely “collect everything” because the alternative feels too risky. Unfortunately, collecting everything actually increases their risk.
The absence of structure and visibility also makes legal holds more burdensome. If no one knows which systems contain responsive content, the legal team ends up issuing broad, overly cautious holds that disrupt workflows and slow down business operations.
Business inefficiency and employee frustration
The consequences aren’t limited to legal or compliance teams. Poor data understanding slows the entire organization. Employees waste time searching for documents, comparing versions, or recreating work they couldn’t find. Cross-functional teams struggle to collaborate because no one knows which repository is the “official” one. Quality of final work suffers when there’s no clarity around where information should live or how it should be maintained. At scale, this becomes a silent productivity drain as well as an unnecessary operational expense.
Why Retention Programs Fail Before They Even Begin
Here’s what we’ve realized: most retention programs are failing before they begin. Not because the policies are flawed or the technology is inadequate, though sometimes that’s the case, but because the foundational conditions for success simply aren’t in place.
Most firms have approached document retention and destruction at least once, though to varying degrees. Many have a version of a schedule within their written policies. Some have even piloted tools, but tracking and proper follow-up are lacking. Consequently, the majority have struggled to achieve consistent, organization-wide adoption. Understanding why retention initiatives stall is essential, because these hurdles are both predictable and entirely solvable.
Retention is still seen as “compliance homework”
One of the most persistent barriers to retention maturity is cultural. Employees continue to perceive records management as a compliance exercise disconnected from their day-to-day responsibilities, which means retention policies remain abstract, optional, or ignored entirely.
Organizations often fall into one of two extremes: Employees over-retain because they’re afraid of deleting something important, or employees under-retain because they assume IT or Legal is handling it. Neither attitude supports a healthy information lifecycle.
Without clear communication about why retention matters — cost, risk, efficiency, consistency — policies cannot take root. When retention is framed as “a compliance rule,” it becomes a burden. When it’s framed as “a way to protect the organization and make work easier,” behavior begins to shift.
Tools without strategy
Organizations often begin with the software rather than the strategy. They purchase systems with automated retention features, classification engines, or file analysis capabilities, assuming the technology will solve the cultural and structural challenges.
But tools only succeed when:
- Policies are clear.
- Governance roles are defined.
- Data is understood.
- Stakeholders are aligned.
- Automation supports, rather than replaces, human decision-making.
Several ARMA conversations revealed the same pattern: firms install sophisticated retention tools, only to realize the supporting structures aren’t in place. They may have the policy, but ownership and accountability aren’t mapped out, leading to no follow-through.
Lack of ownership
Lack of ownership doesn’t stem from laziness or lack of organizational will. Rather, retention creates a unique challenge in that it sits in the gap between functions. Legal claims ownership of policy, IT owns the systems, information governance owns the program, and business units own the content. In practice, that means no one really owns it unless your organization is intentional in assigning ownership.
When building out your governance plan, consider answering the following questions.
- Who authorizes destruction?
- Who maintains the schedule?
- Who’s in charge of updating requirements?
- How do you ensure all stakeholders are on the same page?
- How do you make the plan work for every department that has to follow it?
- How do you enforce rules across systems?
An effective governance program requires careful planning and cross-functional adherence.
Internal change-management barriers
Even with clear policies and functional alignment, many retention initiatives stall because organizations underestimate the human factors involved. Governance often requires small but meaningful shifts in behavior, and employees resist anything that adds complexity or threatens established routines.
Typical challenges include:
- Fear of deleting “just in case” files
- Lack of training or awareness
- Distrust in automated disposition
- Overwhelming volumes of inherited legacy content
- Confusion around multiple versions of documents
- Workflows that rely on saving files locally or outside sanctioned systems
Employees shy away from processes that add friction. Effective adoption means reducing that friction wherever possible and getting everyone on the same page.
Successful retention programs minimize the need for human intervention. They build automation into the system, enforce lifecycle rules quietly in the background, and communicate clearly so users understand the “why” without needing to manage the “how”.
The Turning Point: Why Appetite for Solutions Is Finally Growing
For years, organizations have recognized their growing data challenges, yet meaningful progress remained slow. We’ve started to notice a change. Rather than just acknowledging the problem, leaders are actively seeking tools, models, and strategies that can finally bring order to their information ecosystems. We see a few reasons why.
The post-2020 digital explosion
The Covid-19 pandemic changed a lot of things for a lot of people. In the governance world, the rapid transition to remote and hybrid work dramatically accelerated the creation of digital content that needed to be managed. What would once have been an in-person conversation or whiteboard session became a chat thread, recorded meeting, or email summary. Organizations multiplied their digital footprints at an intense pace, and now, years later, they are reckoning with the aftermath. Digital sprawl happened quickly, and firms are realizing they can’t wait for the problem to stabilize or disappear. This is the new normal.
Increasing regulatory pressure
With digital sprawl and increased data have come regulators and industry bodies whose expectations around information governance evolved with the changes. Privacy laws, retention mandates, and disclosure requirements have become more stringent, with higher penalties for non-compliance.
Examples of pressure points include:
- Conflicting requirements between privacy laws (delete sooner) and industry regulations (retain longer)
- Higher scrutiny during audits
- Stronger expectations around defensible disposition
- New standards around transparency and accountability
This has spurred many organizations to define consistent, repeatable retention practices.
Budget pressure
For many organizations, the tipping point wasn’t regulatory or cultural, but financial. As cloud storage costs continue to rise, leadership wants a clear understanding of what they’re paying for. Executives are increasingly asking questions that governance and IT teams struggled to answer just a few years ago:
- What percentage of our stored content is actually required?
- How much is duplicative?
- How much is ROT?
- What is the cost of retaining non-records?
- How do we know we’re storing the right things?
Storage has become a line item measured in millions of dollars, forcing conversations and accelerating interest in data hygiene, disposition, and automated controls.
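A back-of-envelope calculation makes the budget pressure concrete. All figures here are hypothetical; storage volume, unit price, and ROT share vary widely by organization.

```python
def annual_rot_cost(total_gb, price_per_gb_month, rot_fraction):
    """Estimated yearly spend attributable to ROT content.

    total_gb: stored volume in GB
    price_per_gb_month: blended storage price in $/GB-month
    rot_fraction: estimated share of content that is redundant,
                  outdated, or trivial (0.0-1.0)
    """
    return total_gb * price_per_gb_month * rot_fraction * 12


# Hypothetical example: 500 TB stored at $0.02/GB-month, 60% estimated ROT.
waste = annual_rot_cost(500_000, 0.02, 0.60)
print(f"${waste:,.0f} per year spent storing ROT")  # → $72,000 per year
```

Even modest ROT percentages turn into recurring line items, which is why disposition pays for itself in a way that one-time cleanup campaigns rarely do.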
AI is exposing every governance weakness
Perhaps the strongest catalyst for change — and the most frequently discussed topic at ARMA — is AI. Organizations want to use generative AI, semantic search, and predictive analytics. Departments are eager for efficiency gains while vendors are pushing new capabilities. But the moment firms begin exploring AI, they hit the same barrier: AI cannot deliver value on ungoverned, inconsistent, unclassified data.
AI models learn from whatever they’re given. If that includes outdated contracts, incorrect versions, sensitive information stored incorrectly, files with missing or inaccurate metadata, etc., then the AI will replicate and amplify those flaws.
The cultural shift towards governance as strategy
Beyond technology and cost, there is a growing recognition that well-governed information is a strategic asset. Leaders are increasingly framing governance as infrastructure — something that improves decision-making, reduces risk, accelerates workflows, and sets the stage for AI-enabled transformation.
A Path Forward
If the conversations at ARMA made anything clear, it’s that firms are already aware of this challenge; what they need is a practical, achievable solution. The good news is that the most effective governance programs don’t start with massive overhauls. They start by increasing clarity around what can realistically be controlled, then adding automation to simplify how it’s controlled.
The first step is gaining a reliable view of the existing landscape. Organizations need to know where their content lives, which repositories pose the greatest risk, and where high-value records are buried beneath years of accumulated files. Most firms don’t need a full enterprise-wide inventory on day one, but a prioritized map that shows where cleanup and governance are needed will have a big impact.
From there, automation becomes essential. Manual classification and human-driven retention decisions don’t scale—especially in the unstructured world where most of the volume lives. File analysis, metadata enrichment, and auto-classification tools can quickly reduce noise and surface content that actually needs attention. In other words, automation makes governance feasible.
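To make “automation makes governance feasible” concrete, here is a toy sketch of rule-based classification feeding a retention schedule. Everything in it is a placeholder (the categories, keywords, and retention periods are invented for illustration), and real file-analysis tools rely on metadata and machine-learning classifiers rather than keyword matching, but the shape is the same: classify content, then derive its lifecycle from the matching rule instead of asking an end user.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetentionRule:
    category: str      # record class, e.g. "contract" (illustrative)
    keywords: tuple    # naive content signal; real tools use ML/metadata
    retain_years: int  # years before disposition review (placeholder values)


# Illustrative schedule only; actual periods come from legal requirements.
RULES = [
    RetentionRule("contract", ("agreement", "contract", "signature"), 7),
    RetentionRule("invoice", ("invoice", "amount due"), 7),
    RetentionRule("general", (), 3),  # fallback class for unmatched content
]


def classify(text):
    """Return the first rule whose keywords appear in the text."""
    lowered = text.lower()
    for rule in RULES:
        if any(k in lowered for k in rule.keywords):
            return rule
    return RULES[-1]  # nothing matched: fall back to the general class


def disposition_due(text, created):
    """Category and date at which a file becomes eligible for review."""
    rule = classify(text)
    return rule.category, created + timedelta(days=365 * rule.retain_years)
```

The point of the sketch is the division of labor: users never pick a retention period; the rule engine does, and disposition dates fall out automatically, which is exactly what lets lifecycle enforcement run quietly in the background.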
With visibility and automation in place, retention can shift from an aspirational policy to an operational reality. The most successful programs remove decision-making from end users and push lifecycle rules directly into systems. When disposition happens reliably in the background, employees don’t feel burdened, and organizations regain control without relying on manual cleanup campaigns.
None of this works without shared ownership. Legal, IT, information governance, and the business units each hold part of the puzzle, and progress only happens when those groups align on goals, responsibilities, and communication. The firms seeing the strongest movement are those able to establish and maintain clear, cross-functional coordination.
Finally, organizations need to measure results. Storage reductions, fewer duplicates, quicker retrieval times, and improved eDiscovery performance all reinforce the value of governance. When leaders see concrete gains, it becomes easier to secure support for the next phase of cleanup and automation.
The path forward doesn’t require perfection. A clear view, the right technology, and a structure that allows governance to happen consistently will address all the current concerns. Firms that start with focused steps can make meaningful progress quickly, and set a foundation strong enough for whatever comes next, from AI initiatives to mergers and acquisitions.
Ready to move forward?
Organizations can no longer afford to treat their data landscape as an unsolvable problem. The growth of unstructured content, rising cloud costs, intensified regulatory expectations, and the push toward AI have all converged to create a defining moment. Firms that once managed to get by without clear retention or structured governance are now facing operational, financial, and compliance pressures that are impossible to ignore.
Firms that take steps now will see immediate benefits: lower storage costs, faster retrieval, fewer eDiscovery surprises, and a cleaner foundation for AI and long-term digital strategy. Those that delay will find the gap widening quickly.
But you don’t need to find momentum on your own. FiT’s team of experts can not only help you implement information governance software that works across a complex data landscape; they can also help you audit current policies and develop stronger ones. If you’re looking for support managing your data, start by talking to our team. Book a demo today.