By Sergey Astretsov
Published on February 13, 2020
Data and analytics leaders, such as Chief Data Officers, are increasingly tasked with harnessing data to move the business forward. They need to create a data culture in which more people are empowered to make data-driven decisions and create new opportunities. At the same time, data leaders need to ensure that the use of data is well governed and accurate, adheres to policies, and protects consumer privacy. Missteps can be costly, bringing huge fines and damage to the brand, and they erode trust in data, ultimately derailing efforts to create a data culture.
Yet, traditional approaches to data governance are at odds with the need to deliver business value from data and analytics.
With growing volumes of data, a wider range of data consumers, and increasingly complex data environments, traditional top-down data governance programs fail to scale.
According to Gartner, “By 2022, over 60% of traditional IT-led data catalog projects that do not use ML to assist in finding and inventorying data distributed across a hybrid/multicloud ecosystem will fail to be delivered on time, leading to derailed data management, analytics and data science projects.”*
Policies can’t be rolled out and enforced effectively, and end-users don’t have the context to understand how and when policies apply to their analysis. Historically, governance programs have relied on restricting access to data, which stifles innovation and impedes self-service efforts. Governance tools, like metadata repositories, are IT-centric and difficult for business users to get value from, so business users abandon them and turn to unreliable sources for data definitions.
On top of that, there often isn’t clarity around the desired outcomes of governance programs. Is governance a reaction to regulations, like CCPA and GDPR, a way to protect consumer privacy, or a way to gain a competitive edge? This lack of clarity makes it difficult to create organizational alignment and nearly impossible to create a data culture.
Traditional data governance lacks effective ways to influence how data is used. Data can be well maintained and still be used in ways that violate policies or lead to inaccurate analysis. Having all the right policies is meaningless if they don’t connect to the output: the analytics themselves. As Michele Goetz, principal analyst at Forrester, wrote, “What you really need to do is push data governance and policy execution into all the processes and automation that exist in your ecosystem.”**
Stewards are instrumental in doing just that. Stewards are individuals, whether subject matter experts (SMEs), analysts, or data management experts, who have a deep understanding of the data. They are critical to managing curation and maintaining data descriptions so that data is used correctly and policies are adhered to. More and more enterprises are recognizing the importance of stewards and are ramping up their stewardship resources.
But while enterprises recognize that stewardship is integral to connecting guidelines and policies to how data is used, challenges remain. After working with our customers to examine those challenges, we created an application that extends the Alation Data Catalog’s ability to make data easy to find, understand, trust, use, and reuse. Now the Alation Data Catalog is also a central point for policies, creating a single source of reference for data and how that data should be used.
We developed the Analytics Stewardship application in the Alation Data Catalog to bridge the gap between top-down policy-setting and policy interpretation and enforcement. Together with a new professional services offering, the Analytics Stewardship application gives enterprises an easy way to drive more value from self-service analytics while ensuring accurate, compliant data use. The result is a more collaborative and holistic approach to governance.
The Analytics Stewardship application addresses three of the biggest challenges enterprises face when it comes to unlocking the value of their governance programs.
Even with increased hiring and more focus placed on stewardship, the volume of data is growing so fast that stewards cannot possibly address all of it. The Forrester Wave™: Machine Learning Data Catalogs 2018 found that “…firms are still struggling under the weight of their data: 36% to 38% of global data and analytics decision makers reported that their structured, semistructured, and unstructured data each totaled 1,000 TB or more in 2017, up from only 10% to 14% in 2016.”***
How do stewards know which data to curate first? How do they know where their efforts will be most impactful?
The Analytics Stewardship application in the Alation Data Catalog introduces Curation Progress, a dashboard that clearly indicates which data objects are used most and how well each data object has been curated. Alation’s unique Query Log Ingestion technology makes it possible to learn from active metadata, revealing which data assets are used most. With Curation Progress, stewards have all the information they need to understand where their efforts will be most impactful.
We have heard time and time again from customers that policies are disconnected from analytics. Policies are communicated through means that are separate from the analytics workflow, whether that is email or internal wikis. Data users don’t know where to find relevant policies and can’t connect the dots between the policies and their analysis.
In this area, the Alation Data Catalog has already made inroads. Through TrustCheck and Compose, recommendations, deprecations, and warnings are surfaced directly within the analyst workflow, whether the end-user is investigating a data asset or writing a query. The Analytics Stewardship application deepens these capabilities with the Alation Policy Center, a single point of reference for all of an enterprise’s data and analytics policies. Now a steward can apply policies from within the Alation Policy Center, and those policies are surfaced directly to the end-user, connecting policy to data usage.
Given the sheer volume of data to curate and policies to deploy, and with no way to measure the impact of their work, stewards can end up feeling like Sisyphus, unsure whether their effort makes any difference. That lack of feedback can lead stewards to become discouraged and disengaged.
The Analytics Stewardship application provides crucial feedback for stewards: it shows the impact of their work, lays the groundwork for gamifying curation, and adds the motivation needed to keep them doing the critical work of connecting policy to data and analytics. It clearly shows the progress stewards are making and ensures that their recommendations are seen by end-users.
With the new Analytics Stewardship application, the Alation Data Catalog is now a single source of reference for data and how it should be used. Alation helps to scale the work of stewards and makes policies and guidelines transparent for end-users.
One of the failings of top-down governance is that, at its best, it prevents a user from completing an analysis that might violate policy, with no explanation. The user is simply blocked from accessing the data. They run into an opaque wall that doesn’t answer the “why?” If they run into that wall enough times, they become discouraged: discouraged from trying to leverage data for decision-making and discouraged from being curious and rational.
With an agile stewardship program built on the Alation Data Catalog, end-users know why their analysis doesn’t fit with policy and can be pointed to alternatives that will lead to success. Rather than hitting a wall, they are encouraged to learn and to use data better. As a result, the work of the steward helps the entire organization become more data literate and helps create a data culture where everyone is encouraged to use data to move the business forward.
* Gartner, “7 Must-Have Foundations for Modern Data and Analytics Governance,” Saul Judah, 2019.
** Forrester, “Data Governance Take a Turn — And It’s a Doozy,” Michele Goetz, 2019.
*** Forrester, “The Forrester Wave™: Machine Learning Data Catalogs 2018,” Michele Goetz, Gene Leganza, Elizabeth Hoberman, Kara Hartig.