By Robert Seiner
Published on December 16, 2020
This blog series has focused on staying non-invasive in your approach to data governance, the relationship between the work culture and governing your data, the selection of the best approach to implementing data governance for your organization, and activating your data governance program by making everybody a data steward. In this, the final blog of the series, I lay out how to get started quickly with data governance. I also cover how using a data catalog to demonstrate success and value with data governance does not require a long, drawn-out journey.
I was recently asked, “Which comes first, the data governance program or the data catalog?” This is very similar to the chicken and the egg question. Not to get too philosophical, but the answer depends on your view of evolution: the evolution of governing data within your organization. Certainly, by inventorying its data assets, an organization can decide what needs to be governed and to what extent. The data catalog brings efficiency to the governance process.
Which comes first, the data governance program or the data catalog?
A data catalog is the tool of choice for organizations that need to build confidence in the data that is most critical. The data catalog supplies ample information about the data to assist in data search & discovery, data stewardship, data analytics, and to deliver the backbone of a data governance program. As such, some people may say that the data catalog needs to be in place first in order to implement a data governance program.
If the purpose of your data governance program is to embrace data-driven decision-making, a data catalog will enable you to be successful. A data catalog can help with consistent data quality standards or strategically managing data as an asset to achieve accurate, trusted, and secure data that delivers business intelligence (two examples from recent clients). The words that resonate throughout these statements are consistent, standard, strategic, and trusted.
Consistent and trusted data comes from an improved understanding of the data. Improvements in understanding come from information that is made available about the data. A data catalog is the place to collect, maintain, and make metadata available to the people in your organization who must trust the data to maximize its value and activate your data governance program.
Beyond the reasons just shared, Gartner describes a data catalog as a tool that is used to “maintain an inventory of data assets through the discovery, description, and organization of datasets. The catalog provides context to enable data analysts, data scientists, data stewards, and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.”[1]
Since the metadata in the data catalog needs to itself be governed, there are other organizations that believe that the practice of data governance must already be in place in order to fully implement a data catalog. I state often that “data and metadata will not govern themselves.”
The implementation of a data catalog requires that specific people within your organization are held formally accountable for the metadata. They must be responsible for defining the metadata that will be collected in the tool, producing the metadata that will be made available to people within your organization, and using the metadata that is available to assist them to complete their job functions.
Starting quickly with a data catalog requires that the metadata stewards be recognized and activated. Starting quickly with a data catalog to support a data governance program requires that the metadata entered into the tool is validated, kept up to date, and made available. Without meeting these two requirements, the likelihood of sustainable success with your data governance program, and potentially the data catalog, is immediately reduced and the risk of failure is increased.
[Note: While I firmly state that “Everybody is a Data Steward,” everybody is NOT a metadata steward. Metadata stewards are specific people with metadata responsibilities in the organization.]
While many organizations see the benefits that a data catalog brings to their analytical capabilities, some organizations will tell you that it is challenging to get people to learn and use a data catalog for the first time. But it is significantly more difficult to get people to return to the data catalog if they are disenchanted with their first experience with the tool. For example, if the metadata is not up to date or the information that is being provided is incomplete, users are unlikely to return a second time. Therefore, it makes sense to ensure that the data catalog is fit for use when it is initially made available. Fit for use requires that the metadata in the tool is well defined, change control is in place to keep the metadata accurate, and people are educated as to how to get the most value out of the tool.
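As a minimal sketch of what a "fit for use" check might look like before a catalog is opened to users, consider the Python below. Everything in it is an assumption made for illustration: the `CatalogEntry` fields, the `fitness_issues` helper, the sample entries, and the 180-day review window are hypothetical and do not come from any particular data catalog product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one data asset in a catalog."""
    name: str
    definition: str
    steward: Optional[str]          # the accountable metadata steward, if any
    last_reviewed: Optional[date]   # when the metadata was last validated


def fitness_issues(entry: CatalogEntry, today: date, max_age_days: int = 180) -> List[str]:
    """Return the reasons an entry is not yet fit for use (empty list means fit)."""
    issues = []
    if not entry.definition.strip():
        issues.append("missing definition")
    if not entry.steward:
        issues.append("no metadata steward assigned")
    if entry.last_reviewed is None:
        issues.append("never reviewed")
    elif today - entry.last_reviewed > timedelta(days=max_age_days):
        issues.append("review is out of date")
    return issues


# Illustrative sample data only.
entries = [
    CatalogEntry("customer_id", "Unique identifier for a customer.", "A. Steward", date(2020, 11, 1)),
    CatalogEntry("order_total", "", None, date(2019, 1, 15)),
]

today = date(2020, 12, 16)
report = {entry.name: fitness_issues(entry, today) for entry in entries}

for name, issues in report.items():
    print(name, "->", issues or "fit for use")
```

A report like this could give metadata stewards a short list of entries to fix before launch, so that a user's first visit does not surface stale or incomplete metadata.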
Specific facets of implementing a data catalog can be done quickly and effectively. The same can be said for data governance. However, to fully implement and sustain both requires commitment, patience, and the persistent application of resources.
Successful implementation of data governance and a data catalog requires that the leadership of the organization support, sponsor, and understand the value that comes from both, and the relationship between the two. Success also requires that the disciplines associated with both are implemented, following an approach that aligns with the work culture of your organization. In previous blogs in the series, the approach was referred to as Non-Invasive Data Governance.
Facets of a data governance program that can be implemented quickly and effectively include:
The recognition of roles and responsibilities that align with the culture of your organization.
The application of governance to data processes that improve the definition, production, and use of data.
The development and delivery of effective socialization and communications of governing best practices.
The activation of data stewards to improve the understanding, quality, and protection of critical data.
Facets of a data catalog that can be implemented quickly and effectively include:
The automation of ingesting metadata into the tool (in other words automate, don’t hesitate).
The utilization of machine learning to improve data management, governance, and consumption.
The delivery of an effective metadata hub combining a traditional glossary, stewardship, and a centralized marketplace for data intelligence.
The activation of metadata stewards to improve the definition, production, and usage of metadata.
Automate … Don’t hesitate
The speed of the implementation of a data governance program can be accelerated by the effective implementation of a data catalog tool. Keep in mind that effective data governance programs require that the organization select the approach that is appropriate for the organization, will not threaten the work culture of the organization, and will activate data governance by recognizing that everybody in the organization is a data steward.
When selecting the appropriate data catalog, it is important to get management to understand that “data catalogs have become the standard for metadata management in the age of big data and self-service analytics.”[2] It is also important to select the tool that most closely matches your organization’s requirements for data governance success, from a software vendor that is an industry leader and has received high marks from the analysts whose job it is to help organizations select the right tool. Best wishes for success in your data governance and data catalog implementations.
[1] Gartner Research – Data Catalogs Are the New Black in Data Management and Analytics – December 2017
[2] Dave Wells – What is a Data Catalog? – An Alation Industry News Blog