By Talo Szem
Published on April 2, 2021
Three new case studies from Alation have exciting implications for the world of data management. The Time Is on My Side case studies examine major pain points for data-centric enterprises.
The Alation platform saves many hours of time and effort, according to study findings. An accompanying infographic for CDOs boasts a basketball theme to celebrate March Madness and offers key takeaways. Like a basketball coach, a smart CDO will boost teamwork by creating a culture of collaboration.
Recently I sat down with the study authors and data scientists at Alation, Andrea Levy and Naveen Kalyanasamy. We discussed their methodology, findings, and key takeaways.
You two are data scientists. Why will other data people be interested in these case studies?
First of all: impact! The query reuse case study especially demonstrates the value of collaboration and centralization of analytics teams. For me, it reinforced the value of sharing queries in Alation, which we do internally quite a bit.
Secondly, measuring value is really hard. We put in a lot of effort trying to quantify value, and that was an enlightening exercise. I’d encourage other data nerds to come along with us on that journey as we narrowed in on how to quantify the value of Alation.
How does Alation broadly add value for the data scientist?
I think the biggest roadblock is finding the right data and understanding it enough to use it for our analytics needs, particularly if the data spans different sources. I actually did some background research to see if data supports this opinion and it does — an integral part of the analytics process is data preparation: discovering the right data, understanding the data, and wrangling the data into the desired format.
For these case studies, you two analyzed Alation usage patterns for real customers. What was your approach?
Again, measuring value is hard. You can measure increased efficiency or value when you can calculate how long something took today versus how long it would have taken under other conditions: the counterfactual. Calculating the counterfactual isn’t easy. We spent a fair bit of time refining the problem and evaluating the data to calculate it.
Once we had the concept for these case studies, we brainstormed various aspects of Alation and tried to think through how activities would be performed without Alation. Then we had to map those counterfactual scenarios to measurements that we could still take within the Alation product.
If I were to simplify it, across these case studies we measured how customers used three different feature areas of Alation and compared that to the effort they would have needed without Alation; we then estimated the time and effort that they saved through each feature.
For example, in the “Onboarding” case study, we look at how Alation’s SQL editor, Compose, fast-tracks the onboarding process because there is no longer a need to install different query tools for different data sources.
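The comparison described above (measured usage versus the counterfactual effort without the tool) can be sketched as a small calculation. This is only an illustrative sketch, not the study authors' actual method: the `FeatureUsage` fields, the 8-hour workday, and every number in the example are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class FeatureUsage:
    """Usage of one feature area. All values used below are hypothetical."""
    name: str
    events: int                    # how many times the feature was used
    minutes_with_tool: float       # observed minutes per event with the tool
    minutes_counterfactual: float  # estimated minutes per event without it


def days_saved(usage: FeatureUsage, hours_per_workday: float = 8.0) -> float:
    """Estimate workdays saved: events x (counterfactual - observed) time."""
    saved_minutes = usage.events * (
        usage.minutes_counterfactual - usage.minutes_with_tool
    )
    return saved_minutes / 60.0 / hours_per_workday


# Hypothetical inputs, chosen only to show the arithmetic.
example = FeatureUsage(
    name="query reuse",
    events=1200,
    minutes_with_tool=5,
    minutes_counterfactual=35,
)
print(f"{example.name}: {days_saved(example):.1f} workdays saved")
```

The hard part, as the authors note, is the counterfactual estimate itself; the arithmetic is simple once that number has been defended.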
Can you talk a little bit more about Compose? What is its value?
Compose is essentially one tool to query data from multiple data sources, brought together in the catalog. People querying data can just work in this one place.
Compose has intelligent built-in features that feed from the Alation catalog to provide context, suggestions, warnings, and previews around the tables and columns that users are working with. This helps them find the data, understand it, and write queries more quickly.
What study findings surprised you?
The scale of these savings was impressive. We had to double- and triple-check our calculations! I know that we at Alation leverage shared queries, and it saves us about 100 days in a year, but saving 325 days over the course of a year in an organization is crazy! And if you take a step back from the (impressive) numbers, you really get a feel for the impact of the work of the broader data teams at those companies.
As people inside Alation it’s easy for us to see the potential value of a feature or an area of Alation. But it was pleasantly surprising to see how much some of these organizations used Alation to maximize value.
Take data stewards as an example. They spend countless hours manually curating data. A data catalog uses machine learning and AI to do much of that work. We actually counted the hours that Alation spared data stewards, and found they saved 211 manual work days at one company. That’s a huge value.
What do you hope people take away from these case studies?
I hope folks can appreciate the creative approach we took to actually put rigor behind value calculations. Too often we see sweeping categorizations and back-of-the-envelope math (for example, bucketing users by type and simply multiplying by the number in each bucket, as opposed to actual measurement). We got to measure value rigorously. I hope these case studies inspire others attempting to measure value.
We have devoted a lot of thought and energy to making this study as rigorous and transparent as possible. I think the output reflects that. In each of these case studies we go into a lot of detail explaining the methodology behind how we measured things before we get to what the actual results say. I hope this serves as a sneak peek into the impact just these three areas of Alation have had. And there are broader benefits from the catalog that we haven’t talked about.
Do these results resonate with you? In other words, has Alation made you more productive… at Alation?
I don’t think of it as productivity; I think of it as spending more of my time being better guided.
Let’s take the query reuse example. Suppose I have to summarize some customer numbers for the Marketing team. I could write this query from scratch… or I could just run the query Naveen wrote last month. (How did I know that Naveen had written that query? I searched for it in our centralized data catalog!) Even if the query wasn’t quite what I was looking for, I could just spend a bit of time understanding his approach, and then make a copy of his query and alter it for my (new) purposes.
Even if I’m doing something totally new with our data, I’m still able to leverage metadata about our tables and columns, which I can use to make better decisions about what data to use. And that metadata could have been masterfully populated by an expert, leveraging our bulk curation tool. So it might not have taken them too much time to fill in.
And you, Naveen? Has Alation helped you be more productive?
Definitely! When I first started, the Alation instance inside Alation helped me onboard very quickly. It turns out my own experience matched our study findings: Alation customers reported up to six months’ savings in onboarding time for new hires. I had zero tools to install to work with data across multiple sources.
And to this day, the catalog teaches me new things about the data because there is a wealth of knowledge: context and metadata that have been curated around it. I’m also able to find and learn from my peers’ work and collaborate effectively across departments every day.