By Nolan Necoechea
Published on December 21, 2020
More and more organizations are turning to data catalogs to achieve their business goals, from empowering employees to make better decisions to fueling digital transformation.
According to 451 Research, “An overwhelming majority of organizations see data catalogs as a potential solution” with 85% of organizations leveraging some form of data catalog. Gartner calls data catalogs a “must-have” for data and analytics leaders and reports that, “Demand for data catalogs is soaring as organizations continue to struggle with finding, inventorying and analyzing vastly distributed and diverse data assets.”
Not all data catalogs, however, are created equal. With growing demand, numerous offerings have emerged. There are data catalogs that work for a specific tool or infrastructure, data catalogs built as part of command-and-control governance suites, and data catalogs built as a platform that create a centralized place to find and understand data across data sources and applications.
A data catalog can help enterprises build data culture, empowering everyone from data analysts and data scientists to business users and data engineers to leverage data more effectively and collaborate with one another.
One of the ways data catalogs promote a strong data culture is through data democratization, which makes it possible for non-technical users to access the data they need without relying on IT. As a result, more data consumers can make data-driven decisions.
There are a lot of myths and misconceptions about data catalogs and what features a data catalog should have. Nearly every data catalog will have a set of necessary features, like a business glossary, wiki-like articles, and metadata management.
But to be truly successful, data catalog software must also address five key aspects: intelligence, collaboration, guided navigation, active data governance, and broad, deep connectivity. The breakdown below explains why these five aspects are critical, to help you make an informed decision when evaluating data catalog software.
Intelligence is at the core of a data catalog’s ability to make data searches relevant and curation scalable. Intelligence automatically surfaces clues in the data to remove the manual effort otherwise required for discovery within the huge volume, variety, and veracity of data facing the modern enterprise.
Intelligent systems powered by machine learning are necessary for overcoming the challenges that come with modern data management, and that’s one area where data catalogs need to shine. According to a 2020 451 Research report, “data catalogs are rapidly building out automated functionality” including “automated suggestions, automated discovery and tagging, and automated data-quality scoring.”
Data and analytics is a team sport, and as more people move to remote work environments, collaboration becomes even more vital. Data catalogs should spur collaboration not only across geographies but also across areas of expertise.
With effective collaboration, each contributor works toward a common goal, building on the work of others and opening the door for greater innovation. Without collaboration, data consumers (and their knowledge) are siloed and work is needlessly recreated. To create a data culture, collaboration needs to be a seamless part of data and analytics.
Guided navigation is important for helping data consumers find the right data and use it properly. The huge number of locations where important data can reside (from data lakes, databases, and reports to APIs and queries) makes finding data difficult even for veteran data analysts. And finding data is only half the battle; understanding how to use that data is the other half.
A data catalog needs guided navigation to point data consumers to the right data and the right context, and to ensure that the data is being used properly. Rather than giving data consumers an atlas, guided navigation provides them with turn-by-turn directions.
According to Robert Seiner, author of Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success, data governance is, “the formalization of behavior around the definition, production, and usage of data to manage risk and improve quality and usability of selected data.”
Data catalog software should empower an active approach that encourages those who are working with data to be active in its governance — closing the gap between guidelines, policies, and the way that data is actually being used.
To be able to effectively guide analytics work, spur collaboration, and encourage active data governance, data catalog software must connect to many different data sources. 451 Research’s report argues that this may be “the most significant attribute of a catalog.” The greater the ability of the catalog to connect to multiple data sources in a complex data environment, the greater the ability to break down data silos.
With the data catalog software as the platform, users can expect one centralized location that connects to all of the data they need. And it isn’t enough for the catalog software to access data sources; it must also collect the metadata that fuels intelligence.