What in the World is an Augmented Data Catalog?

By Nolan Necoechea

Published on February 11, 2020

It seems like every time we think we have grasped a new technology and the need for it, something shifts in the landscape. Sometimes that shift is an incremental improvement that ostensibly enhances the original version. Other times it is a radical change to the nature of the technology itself. And on many occasions, as the impact of the technology becomes better understood, its name changes to better reflect its core value.

Does this sound familiar? We’re going through this exact scenario right now with data catalogs. Even before Big Data, enterprises recognized the need to make data easier and faster to define, categorize, and describe centrally. At the time, however, the technology required to automate data cataloging just didn’t exist, so data catalogs and their cousins, metadata repositories, required a lot of manual labor to maintain. And even then, the data in the catalogs and repositories was static, reflecting a single point in the past.

Utilizing Machine Learning

In contrast, today’s modern data catalogs, such as the Alation Data Catalog, use machine learning to learn from users and borrow the crowdsourcing approach of consumer-facing catalogs such as Amazon and Yelp to make data easier to find, understand, and trust.

Extracting real business value requires data scientists, business analysts, and other data consumers to have access to fresh, current, and accurate data. Metadata must be active, not passive. This is exactly where machine learning comes in: it automates the most tedious tasks involved in cataloging data so that the latest and most robust data is available to those who need it. Machine learning automates the discovery, ingestion, and enrichment of metadata, and it identifies relationships between metadata. The result? Data consumers can find, understand, and use relevant datasets more quickly, which reduces time to insight and improves business decisions.

Now, on to the word “augmented.” This is Gartner’s way of saying that a data catalog uses machine learning to automate arduous, manual tasks. And while the term may sound new, it describes what Alation has been doing since its inception.

To illustrate the value that automation brings and to separate modern data catalogs, such as Alation’s, from the static repositories of the past, Gartner has coined the term “augmented data catalog.” In their recent report, “Augmented Data Catalogs: Now an Enterprise Must-Have for Data and Analytics Leaders,” Gartner’s Ehtisham Zaidi and Guido de Simoni introduce the term and explain why augmented data catalogs are no longer an optional or “nice-to-have” technology for enterprises that want to gain or retain a competitive advantage.

Let’s take a step back for a moment to better understand Gartner’s thinking. As we move closer to a data-driven world, finding and inventorying data assets that reside in myriad places within and outside of any particular organization grows more critical by the day. It’s the necessary first step in effective analytics. It’s also one of the biggest challenges for data management teams today, and one of the fundamental reasons why demand for data catalogs continues to grow.

According to Zaidi and de Simoni’s report, “Demand for data catalogs is soaring as organizations continue to struggle with finding, inventorying and analyzing vastly distributed and diverse data assets. Data and analytics leaders must investigate and adopt ML-augmented data catalogs as part of their overall data management solutions strategy.” What’s more, Gartner posits that by the end of 2021, “organizations that offer a curated catalog of internal and external data…will realize twice the business value from their data and analytics investments than those that do not.” That’s certainly a compelling argument for ML augmentation.

In Conclusion

Interested in learning more about why Gartner is calling the augmented data catalog a “must-have”? Check out our on-demand webinar with Gartner Senior Director of Data Management, Ehtisham Zaidi.
