By Gene Arnold
Published on June 4, 2024
Data quality just got easier! We’re pleased to welcome Snowflake Horizon to Alation’s Open Data Quality Initiative (ODQI). This joint solution integrates Snowflake’s Data Quality Metric Functions (DMFs) with our data quality framework, extending data quality capabilities into Alation. Now, joint users can take advantage of improved data governance, detect and address data quality issues faster, and harness the power of high-quality data for AI model development at scale.
In the age of AI, quality data is no longer a “nice to have” – it’s a necessity. In fact, Gartner estimates that poor-quality data costs organizations an average of $12.9 million per year. For organizations seeking to leverage AI, the potential damage of low-quality data grows exponentially. This deepened integration with Snowflake improves data quality to ensure AI-ready data. Now, joint users can leverage GenAI with the confidence of high-quality data while proactively detecting and addressing quality issues before they damage the business or data operations.
“Through our strengthened integration with Snowflake, we are revolutionizing data quality and the approach to AI readiness,” says Diby Malakar, Vice President of Product Management at Alation. “Poor data quality is a major obstacle in data and GenAI initiatives. With Alation and Snowflake, organizations can confidently leverage their data for crucial decisions and AI strategies while promoting a data-driven culture throughout their data estate.”
AI presents an alluring opportunity for innovative business leaders. In fact, according to Salesforce’s March 2024 study, 57% of IT leaders acknowledge that GenAI is a game changer and a means to better serve customers. Yet that opportunity is not without risk: an even larger share, 71%, agree with the statement, “GenAI will introduce new security risks to our data.” This makes data governance all the more critical for those seeking to introduce GenAI to the business.
IT leaders agree that data governance and data quality are prerequisites for successful GenAI initiatives. When asked, “What would be required for an organization to use GenAI successfully?” 55% of IT leaders said, “accurate, complete, unified data” and 54% pointed to “enhanced security measures to protect the business from new cyber security threats” (Salesforce).
Our new Data Quality Processor (DQP) for Snowflake Horizon meets these demands for accurate, complete, unified data to fuel GenAI projects. Leaders can streamline both access and governance, setting the stage for AI innovation while resting assured that compliance and security are safeguarded.
The solution is easy to set up. Leaders will be pleased to learn that there are no extra virtual machines (VMs) or services to set up for governance, cross-cloud data sharing, and cross-cloud business continuity capabilities. The integration also boasts low overhead, as there are no manual upgrades or runtime compatibility issues.
Snowflake Horizon is a built-in governance solution. It unifies compliance, security, privacy, interoperability, and access capabilities to enable customers to easily govern data, apps, and more across complex data ecosystems. With Alation, Horizon is also a powerful discovery tool that preserves privacy.
With this integration, data scientists seeking data for AI projects can discover and investigate high-quality data on a privacy-protected platform. They can gain direct access to live, ready-to-use data across clouds and regions with no ETL. And they can analyze PII and other sensitive data types without exposing underlying data, using Snowflake Native Data Clean Rooms and advanced privacy policies.
This new solution meets the dual demands of protecting data while enabling secure access for those tasked with using it. With Snowflake Horizon, millions of governance policies are assigned, tens of millions of columns are masked, and billions of queries are run on protected objects.
The integration offers great breadth and depth for data scientists seeking to evaluate the many parameters of data quality. Users can take action quickly, evaluating and accessing data and apps both inside and outside the organization. With more than 530 providers offering 2,300+ ready-to-use data and Snowflake Apps, joint users can explore a wide range of solutions. Finally, leaders can enjoy streamlined procurement, evaluating options via a self-service trial on Snowflake Marketplace and procuring quickly should they choose.
Together, Snowflake and Alation deliver a unified view of data quality across the entire landscape. This joint solution eases the work of data engineers, stewards, users, and security administrators with a platform that integrates data quality functions, policies, and health metrics. Data workers are empowered to find, understand, and trust data with a streamlined process that begins with compliance and ends with secure access. Here’s how it works.
Compliance. Before any data is touched, compliance protects and audits your data in the Data Cloud with business continuity, data quality (DQ) monitoring, and lineage on Alation’s Open Data Quality Framework (ODQF).
Security. Continuous risk monitoring and protections secure data, while role-based access control (RBAC) sets the appropriate guardrails for usage based on role.
Privacy. Sensitive data is valuable, but it must be protected. With advanced privacy policies, masking, and data clean rooms, users can analyze data without compromising governance.
Interoperability. Integrate with other Apache Iceberg-compatible catalogs and engines, and with data catalog and data governance partners.
Access. Classify, share, discover, and take immediate action on data, apps, and more across regions and clouds in the Data Cloud.
Taken together, these processes ensure data governance throughout the data lifecycle, furnishing high-quality data to support AI projects. They also support AI-ready data, which reflects the full breadth and complexity of “the data supply chain” needed to deliver value from AI models. AI-ready data ensures that the supply chain can feed data of sufficient quality, accuracy, completeness, and freshness into environments that can train, deploy, and manage highly complex AI models throughout their lifecycle.
It also means that there’s a strong organizational safety net in place, with governance frameworks that can manage this supply chain within legal, organizational, and ethical boundaries to minimize risks associated with AI deployment, including data privacy issues, regulatory compliance, and ethical concerns.
The Alation DQP for Snowflake builds upon Alation’s Open Data Quality Framework (ODQF). It leverages Snowflake Horizon’s data quality capabilities and Data Quality Metric Functions (DMFs), ingesting Snowflake’s out-of-the-box and customer-created metrics to measure and monitor data quality.
The key benefits of this integration include:
Delivers proactive data quality management
Strengthens data governance and compliance
Empowers collaborative data discovery and access
With the DQP for Snowflake, users can create rules that translate the values returned by data metric functions into signals that guide trusted usage. The above example shows a DMF that checks for incorrectly formatted email addresses. Notice that it returned a value of 18. Is this value good or bad? Applying a DQP rule to this value lets business users see the status and quality of their data directly in Alation.
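The idea of turning a raw metric value into a status can be sketched in a few lines of Python. This is only an illustration, not Alation's DQP API or Snowflake's DMF implementation: the `count_invalid_emails` helper, the `ThresholdRule` class, the status labels, and the threshold values are all hypothetical, and the email pattern is deliberately simplistic.

```python
import re
from dataclasses import dataclass

# Deliberately simple pattern for illustration only; real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def count_invalid_emails(values):
    """Count non-null values that do not look like an email address.

    Hypothetical stand-in for the kind of count a metric function might return.
    """
    return sum(1 for v in values if v is not None and not EMAIL_PATTERN.match(v))


@dataclass(frozen=True)
class ThresholdRule:
    """Hypothetical rule that maps a metric value to a status label.

    Values up to `good_max` are GOOD, values up to `warn_max` are WARNING,
    and anything larger is BAD. The thresholds are assumptions chosen by the user.
    """
    good_max: int
    warn_max: int

    def evaluate(self, metric_value):
        if metric_value <= self.good_max:
            return "GOOD"
        if metric_value <= self.warn_max:
            return "WARNING"
        return "BAD"


# Assumed thresholds: zero malformed addresses is good, up to 25 is a warning.
rule = ThresholdRule(good_max=0, warn_max=25)
status = rule.evaluate(18)  # the metric value from the example in the text
```

Under these assumed thresholds, a value of 18 is classified as a warning. A different team might reasonably pick stricter or looser limits, which is why the number alone does not say whether the data is good or bad.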
Alation’s ODQF was launched to drive data quality across the modern data stack. Today, one of the greatest sources of ‘data gravity’ in the stack (and one of the greatest drivers of innovation) is Snowflake, which sits at the heart of so many successful modern data stack deployments.
Having Snowflake join the initiative really emphasizes the interoperability and flexibility at the heart of the program. This will appeal to organizations that have embraced the idea of the modern data stack — multiple tools and platforms working together to build data solutions in the cloud.
It genuinely is a case of ‘better together,’ as Snowflake Horizon’s strong governance capabilities for data quality rules, masking, and privacy protection work seamlessly with Alation’s data lineage, impact analysis, and policy governance modules to deliver immediate value for data engineers, data stewards, and users themselves.
Indeed, this is an exciting innovation that will radically change how the bench of AI creators collaborates and innovates at scale:
Data engineers get the tools they need to safeguard quality, troubleshoot breaks, and load trusted data for use.
Data governors and stewards can see and remediate DQ issues and monitor sensitive data.
Data users can access trusted data based on their role, benefit from recommendations, and interoperate across platforms to find what they need.
Security administrators can surface and fix vulnerabilities.
This is just the latest integration in our partnership with Snowflake. As Snowflake’s three-time Data Governance Partner of the Year, Alation is constantly finding ways to support data governance with Snowflake, enabling secure access and innovation at scale.
Let’s connect in person! Stop by and see us at booth #1330 at Snowflake Summit at the Moscone Center this week, June 3-6. Not attending? Sign up for a personalized demo today.