By Dave Wells
Published on April 14, 2020
As data cataloging has matured and gone mainstream, the diversity of data catalogs has expanded, with catalogs now embedded in many data preparation and data analysis tools. Embedded data catalogs offer some advantages of technology integration. They also create a new and challenging problem: proliferation of data catalogs leads to pockets of metadata with inherent redundancy, inconsistency, and uncertainty. Embedded data catalogs are a certainty for the future of data management, and that certainty underscores the need for an Enterprise Data Catalog (EDC).
An enterprise data catalog (EDC) is a single source for all of the information that is needed to work effectively with data. Embedded data catalogs see only a limited scope of data and typically support a single cataloging use case (data governance, data preparation, self-service analytics, etc.). The EDC, by contrast, knows about all of your data, from original sources to analytics-ready data, and supports all use cases across the entire data lifecycle. Furthermore, the enterprise catalog uses artificial intelligence and machine learning to continuously expand data knowledge and to capture changes in the ever-evolving data resource.
Making the business case for an EDC is an important but challenging undertaking. It is important because you will face the questions: Why do we need another data catalog? Don’t we have several already? That is precisely the reason you need an enterprise data catalog: multiple bolt-on data catalogs are the path to metadata disparity. It is challenging because expressing the value of the EDC must account for both tangible and intangible benefits. I see those benefits in four dimensions (see figure 1) that are interdependent, such that each directly supports, enhances, and derives value from the others.
Business impact is at the center of the business case, but business impact is difficult to achieve without attention to data management, data analysis, and organization and culture.
Data management, which is fundamental to getting value from data, has become increasingly complex as we have experienced radical changes in data volumes, data types, data velocity, and data use cases. Effective data management begins with data knowledge: knowing what data you have, where it comes from, how it is organized, what it means, its level of quality, and much more. The EDC collects data knowledge in one place and maximizes that knowledge by using AI/ML to automate metadata discovery and crowdsourcing to collect valuable tribal knowledge. Comprehensive EDC metadata provides the means for enterprise-wide sharing of both data and knowledge about the data. Data and knowledge sharing leads directly to increased reuse (and correspondingly increased value) of data. Frequent use and multiple use cases become a strong catalyst for data quality improvement.
It is often said that data scientists and data analysts spend only 20% of their time doing data analysis work, with 80% consumed by data issues. The bulk of their time is spent finding, evaluating, understanding, and preparing data before analysis can begin. The EDC changes the ratio of data issues time vs. data analysis time, potentially reversing the numbers to 20% data time and 80% analysis time. Extensive metadata and shared knowledge make it easy to find data by searching the catalog, to evaluate and understand it through metadata and the shared experiences of data workers, and to prepare it using tips and techniques captured as part of the shared knowledge. Using the catalog improves and accelerates analysis projects: saving time, saving money, enhancing analysis quality, and expanding the organization’s capacity to perform data analysis.
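To make the arithmetic behind the 80/20 reversal concrete, here is a minimal sketch. The 40-hour week and the function name `analysis_hours` are assumptions chosen only for illustration, and the 20% and 80% shares are the commonly quoted figures from the paragraph above, not measured results.

```python
def analysis_hours(total_hours: float, analysis_share: float) -> float:
    """Return the hours available for analysis, given the share of
    working time that goes to analysis rather than data issues."""
    if not 0.0 <= analysis_share <= 1.0:
        raise ValueError("analysis_share must be between 0 and 1")
    return total_hours * analysis_share


WEEKLY_HOURS = 40.0  # assumed working week, for illustration only

before = analysis_hours(WEEKLY_HOURS, 0.20)  # 80% of time lost to data issues
after = analysis_hours(WEEKLY_HOURS, 0.80)   # ratio reversed, as the article suggests is possible
capacity_multiplier = after / before

print(f"Analysis hours per week before: {before:.0f}")
print(f"Analysis hours per week after:  {after:.0f}")
print(f"Capacity multiplier: {capacity_multiplier:.1f}x")
```

Under these assumptions, a full reversal of the ratio would roughly quadruple the time each analyst has for analysis without adding headcount. A partial shift (say, from 20% to 50%) can be explored by changing the second argument.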
In addition to many other benefits, the EDC delivers human and cultural value. It fosters communication and collaboration as it becomes the core technology that enables data and knowledge sharing. Typical tensions, such as business urgency conflicting with IT overload or self-service conflicting with governance constraints, are eased by the time savings and knowledge sharing that the catalog provides. Easing of tensions reduces friction and improves working relationships. That, in turn, increases collaboration and sharing, which further reduces friction: a virtuous cycle of organizational growth and expanding data competencies. The EDC is at the heart of cultural shifts that are central to becoming a data-driven organization.
Making positive differences in business outcomes is the ultimate basis for the EDC business case. Using the right data, in the right ways, to support business decision making and help to achieve business goals is the imperative. Business impact begins with trust in data and analysis. The catalog helps analysts to find the right data for each analytics use case, to understand that data, and to perform analysis with full knowledge of the data and how best to use it. Trusted data and trusted analysis are the foundation for quick and confident decisions that are the keys to business agility. Data-driven decision making that is timely and confident is a core competency that is needed for digital transformation and business innovation.
Enterprise is the key word in enterprise data cataloging. Fragmentation and silos seem to be inherent in today’s data and analytics world. Data sprawl across multiple platforms and databases inhibits the enterprise view. Technology sprawl across tools for data ingestion, integration, blending, preparation, and analysis adds further barriers to an enterprise perspective. Multiple tool-specific data catalogs aggravate the problem further with metadata disparity. Moving to a future of data operations and dynamic data-driven business innovation relies heavily on the enterprise view. You’ll need an enterprise data catalog to get there.