Published on September 24, 2024
This blog examines Gartner’s report “Quick Answer: What Makes Data AI-Ready?” and argues that ensuring data readiness upfront is the single most important factor for AI success. By aligning data with use cases, qualifying it to meet AI requirements, and implementing governance that accounts for AI’s unique challenges, organizations can not only avoid costly failures but also accelerate their competitive advantage. Business leaders must act now or risk being left behind.
Did you know that over 70% of organizations have implemented AI in at least one business function?[1] Despite this, only 11% of organizations have implemented AI at scale.[2]
How has your organization fared? Whether you work in financial services, insurance, manufacturing, healthcare, pharmaceuticals, the government, or any other industry, the difference between success and failure in AI initiatives is not just about having the right tools or talent—it’s about whether your data is truly AI-ready. While many business leaders focus on adopting AI technologies, the reality is that without properly prepared data, even the most sophisticated AI systems will fall short of delivering meaningful results.
Some may argue that data readiness is a technical issue that can be handled later in the AI development process, but this is a dangerous misconception. AI-ready data is not just about cleanliness or structure; it’s about aligning data to the specific demands of each AI use case. Ignoring this principle leads directly to project delays, inaccurate insights, and biased models that ultimately undermine the business value of AI investments.
It’s not enough to assume that traditional data standards will work for AI initiatives. Many leaders believe that cleaning and organizing data, as they’ve done for other projects, will suffice. In reality, data readiness for AI is fundamentally different.
According to Gartner,
“AI-ready data means that your data must be representative of the use case, of every pattern, errors, outliers and unexpected emergence that is needed to train or run the AI model for the specific use. Data readiness for AI is not something you can build once and for all nor that you can build ahead of time for all your data. It is a process and a practice based on availability of metadata to align, qualify and govern the data.”[3]
Without these elements, AI systems will deliver suboptimal results, or worse, fail to operate as expected.
Some argue that AI models are capable of handling messy data and can “learn” from whatever input they’re given. However, this approach is risky and often leads to biased outcomes, as AI models will underperform without representative diversity in their data inputs. The difference between well-prepared data and inadequately aligned data is the difference between reliable, actionable insights and costly errors that can erode trust in AI solutions.
In short, AI-ready data is not about cleaning—it’s about ensuring data is comprehensive and representative of the use case. This level of preparation is essential to driving the real value of AI within an organization.
AI-ready data is often misinterpreted as simply having enough data. This is a dangerous simplification. To achieve success, data must be contextually aligned with the specific AI/ML models being deployed. For example, generative AI models need vast amounts of unstructured data, while predictive models thrive on structured and time-series data.
Some organizations may mistakenly ignore data quality and believe that data alignment is something that can be adjusted later as AI/ML models mature. But this “fix it later” approach can derail projects and cause models to fail miserably in the real world. Misaligned data introduces bias and inaccuracies that take significant time and effort to correct downstream. AI models are only as good as the data fed into them—starting with misaligned data means setting up the project for underperformance from the start.
The misconception that more data equals better AI performance is widespread but misguided. Just look at ChatGPT, which was trained using data indiscriminately vacuumed up from the internet (more than 570GB of data), and it still hallucinates around 30% of the time (89% for legal tasks).[4][5]
It’s not about quantity—it’s about quality and qualification. Gartner states, “AI training requires representative data including errors, outliers and unexpected but valid data. D&A leaders may be tempted to say AI-ready data is high-quality data, but high-quality data as judged by traditional DQ standards does not equate to AI-ready data. When thinking about data in the context of analytics, for example, it is expected to remove the outliers or cleanse the data to support the expectations of the humans. Yet, when training an algorithm, the algorithm will need representative data. This may include poor-quality data too.”[6]
However, what they do not point out is that in many cases (like fraud and manufacturing excursions) outliers and anomalies are the very thing you are trying to detect! Thus, quality depends on the use case. To meet AI requirements, data must be semantically meaningful, labeled correctly, and sourced from trusted origins. Companies that fail to adequately qualify their data risk introducing biases and inaccuracies that can derail their AI efforts.
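To make this concrete, here is a toy Python sketch with made-up transaction amounts (the `remove_outliers` helper is illustrative, not from Gartner's report) showing how a traditional cleansing step can discard the very fraud signal an anomaly model needs:

```python
import statistics

# Hypothetical transaction amounts: mostly routine purchases,
# plus one fraudulent spike (9800.0). Values are invented for illustration.
transactions = [12.5, 40.0, 18.9, 25.0, 33.1, 22.4, 19.8, 27.6, 30.2,
                21.7, 24.3, 28.8, 9800.0, 15.2, 26.1]

def remove_outliers(values, z_cutoff=3.0):
    """Traditional DQ-style cleansing: drop anything beyond z_cutoff
    standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev <= z_cutoff]

# "Cleaning" the data deletes the fraudulent transaction entirely,
# so a fraud-detection model trained on `cleaned` never sees fraud.
cleaned = remove_outliers(transactions)
```

A fraud model trained on `cleaned` would have no fraudulent example to learn from; for this use case, the "dirty" rows are the signal, which is exactly why AI readiness must be judged per use case rather than by a universal cleanliness standard.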
Opponents might claim that AI models will “self-correct” over time, learning from whatever data they are provided. However, AI systems cannot correct biases embedded in the data they are trained on. If the data is flawed or unqualified, the model’s outputs will reflect those flaws, often exacerbating biases or producing faulty insights.
Thus, qualifying data to meet AI’s needs is non-negotiable. Business leaders who cut corners in this area do so at the risk of building biased or ineffective AI models, resulting in wasted investments.
Data governance is often seen as a bureaucratic hurdle, but without strong governance, AI data management will collapse under its own complexity. AI-ready data requires constant monitoring, ethical oversight, and regulatory compliance to ensure models are performing reliably and transparently – and this demands a robust governance framework.
Some may argue that top-down governance slows innovation and agility; it may, which is why we advocate for a people-first approach to governance. Yet without governance, AI systems risk producing untrustworthy results or even violating privacy and compliance laws, which can lead to significant financial and reputational damage. Strong governance ensures that data is being used ethically and effectively and that AI models are transparent and explainable, or at least as much as they can be (can a model with billions of parameters ever be truly transparent or explainable?).
In today’s environment of increasing AI regulation and scrutiny, businesses that fail to implement proper governance may find themselves outpaced by competitors who prioritize transparency and compliance.
Many organizations continue to rely on traditional data management practices that are ill-suited for AI. This misalignment is a critical barrier to AI success. While traditional methods focus on structured, clean data, AI thrives on data that reflects real-world complexity, including anomalies and outliers. Businesses that don’t recognize this difference will face project delays and performance issues.
Opponents might argue that transitioning to AI-ready data practices is expensive and time-consuming, but the cost of inaction is far greater. Projects that rely on poorly managed data often face costly rework, and the financial impact of failed AI initiatives can be catastrophic.
Many business leaders underestimate the importance of metadata in AI readiness. Without robust metadata, AI models lack the context needed to produce meaningful insights, and governance becomes nearly impossible. Companies that neglect metadata management may find that their AI initiatives lack transparency, making it difficult to trust and scale AI solutions.
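As a rough illustration of how metadata makes qualification and governance checkable in practice, the sketch below validates a hypothetical dataset record before it is used for training. Every field name and threshold here is an assumption chosen for illustration, not a standard from the Gartner report:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical metadata record for a training dataset.
dataset_meta = {
    "use_case": "churn_prediction",
    "source": "crm.accounts",  # trusted origin of the data
    "labeled": True,           # target column present and verified
    "last_refreshed": datetime(2024, 9, 1, tzinfo=timezone.utc),
    "columns": {
        "tenure_months": "Customer tenure in whole months",
        "churned": "Binary target: closed account within 90 days",
    },
}

def readiness_issues(meta, max_age_days=90, now=None):
    """Return a list of human-readable gaps that block AI readiness."""
    now = now or datetime.now(timezone.utc)
    issues = []
    if not meta.get("source"):
        issues.append("no trusted source recorded")
    if not meta.get("labeled"):
        issues.append("labels missing or unverified")
    if now - meta["last_refreshed"] > timedelta(days=max_age_days):
        issues.append("data older than freshness threshold")
    undocumented = [c for c, d in meta.get("columns", {}).items() if not d]
    if undocumented:
        issues.append(f"columns without semantic description: {undocumented}")
    return issues
```

A check like this, run before every training cycle rather than once, reflects the point that readiness is a continuous practice: the same dataset that passes today can fail the freshness or labeling check next quarter.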
AI use cases are not static, and failing to adapt data practices to these continuous iterations will render AI models obsolete. Businesses that don’t build flexibility into their data processes risk falling behind in AI innovation. The argument that data can be “fixed later” neglects the reality that AI models degrade quickly if not constantly fed updated, relevant data.
How can data leaders rise to meet the new demands of building AI? Here are three key ways.
To keep pace with the demands of AI, data teams must move away from rigid, static processes. A dynamic, iterative approach is the only way to ensure data readiness in the fast-changing world of AI. Some may argue that this approach introduces uncertainty and complexity, but in reality, it allows organizations to be more agile and responsive, positioning them to capitalize on AI advancements.
Governance in AI is not optional. Ignoring governance risks significant compliance and ethical failures, which can damage both the business and its reputation. Leaders must implement governance structures that are specifically tailored to AI use cases, ensuring transparency and accountability in every phase of AI model development.
Some businesses argue that scaling AI-ready data practices is too resource-intensive. However, scalability is not just a technical requirement—it’s a business imperative. Without scalable practices, AI initiatives will remain siloed and fail to generate the broader impact needed to drive enterprise-wide transformation. The investment in scaling AI-ready data now will pay dividends in the form of more reliable, widespread AI adoption across the organization.
The success of AI initiatives hinges on one thing: AI-ready data. Leaders who fail to grasp the importance of data readiness will face underperforming models, biased outcomes, and costly project delays. This is not a technical side issue—it’s a strategic business priority. The traditional approach to data management is insufficient in the age of AI, and leaders must rethink how they align, qualify, and govern their data.
The question isn’t whether to prioritize AI-ready data; it’s how quickly businesses can adapt. Those who move swiftly to implement the necessary data practices will gain a competitive advantage, while those who delay will inevitably fall behind. Business leaders must take decisive action now to ensure that their data strategies are equipped to meet the demands of AI, or risk missing out on the full potential that AI can offer.
Grab a copy of the report today.
[1] Singla, Alex, Alexander Sukharevsky, Lareina Yee, Michael Chui, and Bryce Hall. 2024. “The state of AI in early 2024: Gen AI adoption spikes and starts to generate value.” McKinsey & Company. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai#/.
[2] Baig, Aamer, Douglas Merrill, Megha Sinha, Danesha Mead, and Stephen Xu. 2024. “How CIOs can scale gen AI.” McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/moving-past-gen-ais-honeymoon-phase-seven-hard-truths-for-cios-to-get-from-pilot-to-scale.
[3] Gartner®, Quick Answer: What Makes Data AI-Ready?, 15 May 2024, By Roxane Edjlali, Mark Beyer, Svetlana Sicular, Ehtisham Zaidi. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved. Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s Research & Advisory organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
[4] Metz, Cade. 2023. “Chatbots May ‘Hallucinate’ More Often Than Many Realize.” The New York Times, November 6, 2023. https://www.nytimes.com/2023/11/06/technology/chatbots-hallucination-rates.html.
[5] Dahl, Matthew, Varun Magesh, Mirac Suzgun, and Daniel E. Ho. 2024. “Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models.” arXiv. https://arxiv.org/abs/2401.01301.
[6] Edjlali, Roxane, Mark Beyer, Svetlana Sicular, and Ehtisham Zaidi. 2024. “Quick Answer: What Makes Data AI-Ready?” Gartner Inc. https://www.gartner.com/document-reader/document/5432763?ref=solrAll&refval=429217797.