By Talo Thomson
Published on May 24, 2022
In a sea of questionable data, how do you know what to trust? Data quality tells you the answer. It signals what data is trustworthy, reliable, and safe to use. It empowers engineers to oversee data pipelines that deliver trusted data to the wider organization. Downstream, it gives analysts confidence to build and share reports that drive the business. Yet data quality information is often siloed from those who need it most. Business users must exit their workflows to verify data integrity – wasting time and resources and diminishing trust.
Today, as part of its 2022.2 release, Alation is launching its Open Data Quality Initiative to address the challenges of data quality by integrating this valuable information directly into workflows. The Open Data Quality Framework, or ODQF, describes the complete set of product capabilities that are included as part of this initiative.
I sat down with Peter Wang, Alation’s senior product manager and data quality lead, to learn more about the initiative, product framework, and how data users at all levels will benefit.
Talo Thomson, head of content marketing, Alation: Hi Peter, thanks for taking the time to speak with me today. The Open Data Quality Initiative is a big focus of our latest release. Before diving into that initiative, can you tell us what you’re hearing from customers on the topic of data quality? What does data quality mean to them and why is it so important today?
Peter Wang, senior product manager, Alation: Data quality has consistently been a hot topic for our customers. It affects everyone who produces and consumes data. Data producers need data quality information to ensure that the data assets and pipelines they are maintaining are working as intended. Data consumers need that information to trust that the data is good to use.
Data quality is one of the primary signals behind whether or not a data asset or analytical report can be trusted. So understanding data quality is extremely important for an organization to drive the correct decisions from analytics.
Talo: What does the data quality vendor landscape look like, and how will this initiative fit into that landscape?
Peter: Overall, we see a large variety of data quality solutions. These include both in-house solutions built by our customers and vendor-built solutions.
You may be wondering why so many businesses choose to custom-build their own data quality solutions. One of the unique challenges of data quality is that different organizations and business verticals have their own unique requirements around what constitutes “good data quality.” Some of these challenges are extremely complex and industry-specific. This diversity of opportunities means that there will be a large number of data quality players in the space, each with their unique strengths and weaknesses.
This is why Alation is launching the Open Data Quality Initiative. Data users need the freedom to tackle the unique challenges of data quality with total flexibility, and this initiative empowers them to do so. We’re giving people what they need to bring the data quality solution of their choice into the catalog. This means they can integrate a data quality solution from a best-of-breed vendor, or even their homegrown solution, into the data catalog.
Talo: Who benefits from this initiative?
Peter: One common challenge that we see across our customer base is that currently much of this data quality information is siloed within IT, data engineering, or dataOps. While it’s critical for these personas to address data quality alerts in order to maintain SLAs, downstream users who are ultimately driving the analytics in the organizations are often blind to these issues. This can lead to reports being derived from bad data.
This is where we believe Alation’s approach to openness and transparency, through our Open Data Quality Initiative, is unique. With Alation’s ability to capture behavioral metadata, we can combine information on data usage, data lineage, and data quality alerts to accurately notify users across the entire data pipeline when there is a data quality issue that impacts their work.
Talo: How should organizations think about the relationship between the data catalog and data quality?
Peter: The ability of organizations to effectively find, understand, trust, and use their data is the core function of a modern data catalog. Data quality guides the user along each and every step of the way. It is critical that the data users see data quality in the data catalog as part of their workflow.
For example, when users search for data, they should immediately be able to discern whether that data they find is trustworthy and of high quality. When they look at a data asset in the catalog, they should be able to understand if there are any upstream issues that might impact that asset’s data quality. When a user looks at a report, they should be able to immediately know that the entire pipeline has gone through data quality checks at each step of the process. How else can they trust the result?
Alation has built a lot of functionality to surface that data quality information to data users. We have profiling capabilities on the underlying data, and we have trust flags that experts and data stewards can use to endorse, warn about, or deprecate data. We even built TrustCheck into Compose, our SQL Integrated Development Environment (IDE), to surface that critical information within analysts’ workflows, with spell-check-like functionality, as they build out their queries. So query writers get data-quality feedback on the query they’re crafting before they even run it.
Talo: What’s the role of rule automation in data quality?
Peter: DQ tools often go a step further in automating the work of checking for data quality. Typically, such tools will allow users to create specific rules to which data should conform. That rule is then automatically checked against that particular data to ensure that the data does not fall outside the boundaries of that specific rule. Each data asset might have a multitude of rules that the DQ tools are constantly checking in order to ensure data quality.
If the data quality tool detects any anomaly, then an automatic workflow should be triggered, which alerts both the team responsible for that underlying data, as well as all catalog users, by sending the corresponding alert into the Alation catalog. Alation can then take that information and send it to the downstream data users who have dependencies on that data asset and could have been impacted by the detected anomaly.
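To make the idea of rule-based checking concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular DQ tool or Alation implements it: the `Rule` class, the `evaluate_rules` function, and the sample rows are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A hypothetical data quality rule applied to one column."""
    name: str
    column: str
    check: Callable[[object], bool]  # returns True when a value conforms


def evaluate_rules(rows, rules):
    """Return (rule name, row index) for every value that breaks a rule."""
    violations = []
    for index, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row.get(rule.column)):
                violations.append((rule.name, index))
    return violations


# Hypothetical sample data and rules, for illustration only.
rows = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": -3.0},
    {"order_id": None, "amount": 10.0},
]
rules = [
    Rule("amount_non_negative", "amount", lambda v: v is not None and v >= 0),
    Rule("order_id_present", "order_id", lambda v: v is not None),
]

violations = evaluate_rules(rows, rules)
for rule_name, row_index in violations:
    # In a real setup, this is where an alert workflow would be triggered.
    print(f"rule {rule_name} failed on row {row_index}")
```

In practice a DQ tool would run such checks on a schedule against the live data and forward any failures as alerts rather than printing them.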
Talo: How does Alation know who to alert?
Peter: Alation has unique insights into the behavioral metadata of an organization. This metadata shows how data is being used, and by whom.
These human insights allow Alation to break through the data quality knowledge barrier and alert just those affected by DQ changes in real time. We don’t want to overwhelm people with alerts on data issues that don’t affect them. But we also want to ensure that those affected by DQ changes are made aware of them as they arise.
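One way to picture targeted alerting is as a walk over a lineage graph combined with a lookup of who uses each asset. The Python sketch below assumes lineage and usage are available as simple dictionaries; the function names, asset names, and user names are hypothetical, and this is not a description of Alation’s internal logic.

```python
from collections import deque


def downstream_assets(lineage, start):
    """Collect the failing asset plus everything reachable downstream of it."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def users_to_alert(lineage, usage, failing_asset):
    """Return the sorted set of users who touch any affected asset."""
    affected = downstream_assets(lineage, failing_asset)
    return sorted({user for asset in affected for user in usage.get(asset, [])})


# Hypothetical lineage (asset -> direct downstream assets) and usage data.
lineage = {
    "raw.orders": ["mart.sales"],
    "mart.sales": ["report.revenue"],
    "raw.clicks": ["report.traffic"],
}
usage = {
    "mart.sales": ["ana"],
    "report.revenue": ["ben", "ana"],
    "report.traffic": ["cy"],
}

recipients = users_to_alert(lineage, usage, "raw.orders")
print(recipients)
```

Here a problem in `raw.orders` reaches the users of the sales mart and the revenue report, while the user of the unrelated traffic report is left alone.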
Talo: So what is the Open Data Quality Initiative?
Peter: The Open DQ Initiative is composed of several parallel motions.
First, let’s discuss the product. We are rolling out an Open Data Quality Framework (ODQF) with our partners, which will enable different DQ tools to push data quality information into Alation. This framework includes APIs, as well as a set of best practices, to help normalize the large variety of data quality metrics and rules into a standardized format that can interoperate between different systems. We want to make sure that our customers can choose the DQ vendor that best fits their needs.
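As a rough illustration of what normalizing results from different tools could look like, the Python sketch below maps two made-up vendor payloads onto one shared record. The `QualityResult` fields and both payload shapes are assumptions invented for this example; they do not represent the actual ODQF format or any vendor’s output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityResult:
    """A hypothetical common record for one data quality check."""
    asset: str
    rule: str
    status: str  # "pass" or "fail"


def from_vendor_a(payload):
    """Map a made-up payload that reports a boolean 'passed' field."""
    status = "pass" if payload["passed"] else "fail"
    return QualityResult(payload["table"], payload["check_name"], status)


def from_vendor_b(payload):
    """Map a made-up payload that reports a severity level instead."""
    status = "pass" if payload["severity"] == "OK" else "fail"
    return QualityResult(payload["dataset"], payload["monitor"], status)


results = [
    from_vendor_a({"table": "mart.sales", "check_name": "row_count", "passed": True}),
    from_vendor_b({"dataset": "mart.sales", "monitor": "freshness", "severity": "CRITICAL"}),
]
for result in results:
    print(result)
```

Once every tool’s output is expressed in the same shape, the catalog can display and route the results without knowing which tool produced them.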
From a catalog feature point of view, the Open DQ Initiative will also include several new feature rollouts. These features include prominently displaying data quality flags in various areas of our catalog and automated alerting of data quality when there are critical schema changes (like upstream deletions). We also plan to integrate data quality into other Alation initiatives, such as Alation Anywhere, to surface data quality alerts in other places, like email or chat apps.
Talo: Which DQ partners support this initiative today?
Peter: We have 7 DQ partners already in our ecosystem, ready to deploy as joint solutions. These include Acceldata, Anomalo, Bigeye, Experian, FirstEigen, Lightup, and Soda.
Talo: How can new partners participate or integrate if they’re not already? How will DQ partners benefit?
Peter: We welcome any additional partners who want to join our Open Data Quality Initiative. If you offer a DQ tool but aren’t a partner yet, and would like to join, we would love to have you on board! Data quality vendors interested in Alation should apply to become Alation partners. We will provide an integration kit and walk you through how we might best integrate.
Talo: This isn’t the first framework Alation has developed. Openness seems to be a key piece. Can you say more about that?
Peter: Correct. We also have an Open Connector Framework that allows customers and partners to create their own connectors and applications, which leverage all types of metadata Alation captures, including technical, behavioral, and lineage. To support developers in creating their own connectors, we provide a starter kit and well-defined APIs.
These open frameworks are important to Alation, our customers, and the market for a few reasons. First, we want to empower our customers with choice. As data & analytics for specific industries grows more specialized, the ability to custom-build your own tools (to address more unique use cases) grows more essential. Second, we continue to evolve the data catalog into a data intelligence platform. We’re confident that making Alation a place to collaborate, innovate, and custom-build unique tools will support that goal. We want people to innovate on Alation!
Talo: You mentioned this is being done in phases. What’s next?
Peter: Down the road, customers can anticipate more alerting features and lineage automation, as we more deeply integrate data quality functionality into the data catalog. Data should be easy to trust, understand, and use at a much faster pace than what’s typical today. We look forward to innovating in this area!
Curious to learn more about the Open Data Quality Initiative?
Watch the data dialog: Why an Effective Data Quality Program Includes a Data Catalog
Read the blog, Alation 2022.2: Open Data Quality Initiative and Enhanced Data Governance
Read the press release
Get the solution brief
Listen to the podcast interview with Kyle Kirwan, CEO and co-founder of Bigeye