By Vanda Martins
Published on January 14, 2025
As organizations increasingly embrace artificial intelligence (AI) to drive innovation and efficiency, the need for trusted data has never been more critical.
In the recent webinar "Building AI You Can Trust: Driving Business Value with Data Quality, Observability, and Governance," Fern Halper, VP and Senior Research Director at TDWI and co-author of Big Data For Dummies, was joined by representatives from Databricks, Monte Carlo, and Alation to share actionable insights on achieving trustworthy AI through robust data practices. What follows is a recap of the key takeaways from that webinar, which you can also watch below. Let’s dive in!
Fern Halper set the stage by emphasizing that trusted AI begins with trusted data. She outlined three critical pillars for building trusted AI models:
Data quality: Defined as the suitability of data for its intended purpose, data quality ensures accuracy, completeness, and timeliness—all essential for unbiased AI models. Yet poor data quality is a persistent enterprise challenge. Halper highlighted a revealing statistic: only 50% of organizations report satisfaction with their data quality, according to TDWI research.
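Dimensions like completeness and timeliness can be made concrete with simple checks. Here's a minimal sketch in Python; the field names, sample records, and thresholds are illustrative, not from the webinar:

```python
from datetime import datetime, timedelta, timezone

def completeness(records, field):
    """Fraction of records where `field` is present and non-null."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records) if records else 0.0

def timeliness(records, field, max_age):
    """Fraction of records whose timestamp is within `max_age` of now."""
    now = datetime.now(timezone.utc)
    fresh = sum(1 for r in records if now - r[field] <= max_age)
    return fresh / len(records) if records else 0.0

# Illustrative data: one record is missing an email, one is a month stale.
now = datetime.now(timezone.utc)
records = [
    {"email": "a@example.com", "updated_at": now},
    {"email": None,            "updated_at": now},
    {"email": "c@example.com", "updated_at": now - timedelta(days=30)},
]

print(completeness(records, "email"))                        # 2 of 3 filled
print(timeliness(records, "updated_at", timedelta(days=7)))  # 2 of 3 fresh
```

Scores like these, tracked per dataset over time, are what give a team an objective answer to "is this data fit for its intended purpose?"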
Data monitoring and observability: Continuous monitoring and real-time anomaly detection are essential to maintain data health. As Halper noted, observability extends beyond dashboards to provide holistic insights into the health of data pipelines.
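One common building block of pipeline observability is a volume check: flag any day whose row count deviates sharply from the recent norm. A minimal sketch (the data and threshold are made up for illustration; real observability tools apply many such checks across freshness, schema, and distribution):

```python
import statistics

def detect_anomalies(counts, threshold=3.0):
    """Return indices of values more than `threshold` standard
    deviations from the mean -- a basic volume anomaly check."""
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts)
    if stdev == 0:
        return []
    return [i for i, c in enumerate(counts) if abs(c - mean) / stdev > threshold]

# Daily row counts for a hypothetical pipeline; day 5 drops sharply.
daily_rows = [10_200, 10_150, 10_300, 10_250, 10_180, 1_900, 10_220]
print(detect_anomalies(daily_rows, threshold=2.0))  # flags day 5
```

Catching a drop like that at ingestion, before it silently skews a model, is exactly the "beyond dashboards" health insight Halper described.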
Data governance: Governance establishes the rules, policies, and accountability structures needed to maintain integrity and compliance. Halper emphasized that governance must evolve alongside the increasing complexity of data and regulations.
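In practice, governance rules can be checked automatically: given classification tags on columns and the policies each tag requires, a scan can surface columns that are out of compliance. A hypothetical sketch (the tags, policies, and schema are invented for illustration):

```python
# Hypothetical policy map: columns tagged "pii" must be masked;
# columns tagged "phi" must also have restricted access.
POLICIES = {"pii": ["mask"], "phi": ["mask", "restrict_access"]}

def violations(schema, policies):
    """Return (column, missing_policies) for tagged columns whose
    required policies have not been applied."""
    out = []
    for col in schema:
        required = {p for tag in col["tags"] for p in policies.get(tag, [])}
        missing = required - set(col["applied_policies"])
        if missing:
            out.append((col["name"], sorted(missing)))
    return out

schema = [
    {"name": "email",   "tags": ["pii"], "applied_policies": []},
    {"name": "country", "tags": [],      "applied_policies": []},
]
print(violations(schema, POLICIES))  # [('email', ['mask'])]
```

The same idea scales with complexity: as regulations change, you update the policy map rather than re-auditing every dataset by hand.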
"AI starts with trusted data," Halper underscored. If the data going into the system is flawed, the outputs will be too – the classic garbage in, garbage out story.
“What’s most important is getting everyone working towards the same metrics,” emphasized Bryce Heltzel, product leader at Monte Carlo. This entails selecting the quality metrics most pertinent to a given use case, whether that means completeness, consistency, or accuracy. To build alignment, the presenters also recommended:
Clear ownership: Assigning technical and business stewards to oversee datasets fosters accountability and accelerates issue resolution.
Cross-functional collaboration: Breaking down silos between data scientists, engineers, and business users ensures unified efforts toward data quality and governance.
Defined metrics: Metrics such as data completeness, accuracy, and policy compliance help track progress and build trust across teams.
Raja Perumal, Senior Alliances Manager at Databricks, emphasized that governing every layer of the data stack is critical for accurate, trusted AI. Clear ownership, cross-functional collaboration, and defined metrics are key pillars of AI governance.
Data governance throughout the AI model development process safeguards quality and ethics, increasing the likelihood of a model’s success. Steve Wooledge, VP of Product Marketing, Alation, underscored the importance of starting data governance at the data ingestion stage, arguing that it’s critical to classify sensitive data, apply policies, and ensure datasets are complete, unbiased, and accurate before model training begins.
Through each stage of the AI development process, “data quality and governance are really important,” Wooledge said. “So starting with data ingestion, the data scientist needs to be able to discover what data is available quickly, ensure that it has high quality, understand the usage rates of that data for that data preparation phase, where it's necessary to choose the right features or data columns that would be used within the model.” A robust technology stack is critical to this process.
“In Alation, you can easily understand both the quality of that data, pulling that health information from tools like Monte Carlo, as well as classifications, so decisions can be made quickly about how appropriate that data is to be used directly in the model,” Wooledge revealed. He highlighted a case study of a grocery delivery company that saved over $500,000 by proactively identifying data anomalies, showcasing the tangible ROI of trusted data.
Generative AI introduces new complexities in building trust. Heltzel pointed out that in 2025, we’ll be seeing more AI agents acting autonomously, making it imperative to monitor and trace data quality throughout their lifecycle. "Hallucinations can happen," he warned, making data quality and reliability paramount – as well as the ability to monitor outputs continuously and rectify errors when they do occur.
What’s the value of trusted data? Put differently (and in the words of the webinar’s moderator): “What’s the best way to sell leadership on purchasing a data quality monitoring tool?”
“It comes back to the outcome you’re trying to deliver,” Heltzel counseled. “Can you point to specific scenarios that have happened in the past where data quality has cost the business?... What are the decisions being made on a regular basis… that do drive a lot of revenue or potential cost? And what would be the issue if data quality issues were to throw off those decisions being made?”
In many ways, data observability is insurance: as a business, you hope you never need it, but when you do, it saves you from disaster.
The webinar showcased a best-of-breed data stack integrating Alation, Monte Carlo, and Databricks. Each platform contributes unique strengths to build trust across the data-for-AI lifecycle:
Alation is the front door to data, fostering collaboration across technical and business teams and ensuring data consumers access trusted assets through centralized metadata and governance.
Databricks provides a unified platform for centralizing data storage, processing, and AI/analytics workloads, enabling seamless collaboration across data engineering, data science, and machine learning teams.
Monte Carlo ensures data reliability through advanced observability, detecting and addressing anomalies across pipelines.
Wooledge emphasized that these tools, when used together, create a seamless ecosystem: Alation provides the governance and visibility layer, Monte Carlo tracks data health, and Databricks offers the storage and processing power to scale AI initiatives.
The group offered actionable advice for data leaders seeking to capitalize on the AI opportunity:
Adopt a holistic approach: Address data quality, observability, and governance together to ensure comprehensive oversight.
Embrace collaboration: Foster organizational alignment by uniting data producers and consumers around shared goals and metrics.
Stay proactive: Monitor data pipelines continuously to identify and resolve issues before they impact AI outputs.
Leverage automation: Use tools like Monte Carlo and Alation to scale data quality and governance efforts beyond manual processes.
As organizations integrate AI into critical decision-making processes, trust becomes the cornerstone of success. By investing in the right technologies and fostering a culture of accountability, businesses can unlock AI’s full potential while safeguarding their reputation and bottom line.
Ready to build AI you can trust? Start by booking a demo with us today.