By Nolan Necoechea
Published on December 8, 2020
Before we dive into what’s new with Alation 2020.4, how does Alation help you as a data scientist?
I work with a lot of teams at Alation. As part of that, I spend a lot of time writing queries and then taking their output into Tableau and R for visualization, or into R and Python for statistical analysis and regression, for example to identify companies that could benefit from Alation. At a high level, Alation makes it much easier for me to find the data I need and make better decisions about how to use that data. First, Alation makes it easier to capture knowledge about data that would otherwise be siloed across the organization. Second, Alation uses machine learning to provide greater context on the data without the need for manual effort.
Can you give an example of how Alation helps capture knowledge on data?
Sure, happy to. Let’s say that you find a table that seems to be right for your analysis, but in Alation you see a warning from your colleague that points you to a more up-to-date table. With that warning, you’ve just saved yourself from having to discover the staleness of that table yourself.
Alation makes it easy for anyone to add helpful descriptions of tables and columns, which, in turn, centralizes the organization’s data knowledge. I find it especially amusing when I stumble across a particularly helpful table description, only to find that it’s a description I wrote two years ago. What’s great about this is that I don’t have to remember the particulars of every data set in order to use it, or spend a lot of time tracking down someone who does. The knowledge is right there in Alation.
How does machine learning come into play?
Machine learning is what lets Alation add that context automatically. Alation’s Behavior Analysis Engine is constantly running behind the scenes to learn how data is being used. Then, as you are pulling data with Alation’s intelligent SQL editor, Alation proactively surfaces those learnings. For instance, as you’re writing a query, Alation will auto-suggest tables and columns based on what others have queried, much like Google auto-completes your searches. All of this saves a lot of time. I can focus on the analysis rather than having to memorize the nuances of every table and column I touch in the course of my analyses.
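To make the idea of usage-based suggestions concrete, here is a minimal Python sketch that ranks table names by how many past queries referenced them. It is not Alation’s Behavior Analysis Engine or its SQL editor: the query log format and the `build_usage_counts` and `suggest` helpers are hypothetical, and a real engine would parse SQL and weigh many more signals.

```python
from collections import Counter


def build_usage_counts(query_log):
    """Count how many past queries referenced each table.

    Hypothetical input: each log entry is the list of table names one query used.
    """
    counts = Counter()
    for tables in query_log:
        counts.update(set(tables))  # count a table at most once per query
    return counts


def suggest(prefix, usage_counts, limit=3):
    """Return up to `limit` table names starting with `prefix`, most-used first."""
    matches = [name for name in usage_counts if name.startswith(prefix)]
    # Rank by descending usage; break ties alphabetically for stable output.
    matches.sort(key=lambda name: (-usage_counts[name], name))
    return matches[:limit]


if __name__ == "__main__":
    # Invented example log; table names are illustrative only.
    log = [
        ["sales.orders", "sales.customers"],
        ["sales.orders"],
        ["sales.order_items", "sales.orders"],
        ["marketing.campaigns"],
    ]
    counts = build_usage_counts(log)
    print(suggest("sales.", counts))  # ['sales.orders', 'sales.customers', 'sales.order_items']
```

Counting each table once per query keeps a single query that repeats a table name from inflating its popularity.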
Now, let’s jump into Alation 2020.4. What’s new?
In 2020.4, a lot of the things that already made the data catalog great for data scientists and data analysts have gotten even better. The data profiling, analytics, and lineage capabilities that already existed in Alation are all more robust.
Let’s start with data profiling. What’s new there?
Alation’s data profiling capabilities help reduce the time spent in the data exploration phase of an analysis, which, depending on how familiar you are with the data and how much data you need to search through, can be really significant. Alation can tell you what is in a data set before you query it, giving you a better idea of whether it’s what you are looking for. With 2020.4, profiling for custom databases generates an even deeper level of per-column statistics, such as the number of empty values or non-numeric values. Data profiling also includes new charts and customizations that make it even easier to answer the question “Is this what I’m looking for?” at a glance.
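As a rough illustration of per-column statistics like the number of empty and non-numeric values, the following Python sketch profiles each column of a small in-memory table. It is a stand-in built on assumptions, not Alation’s profiler: the list-of-dicts row format, the helper names, and the rule that whitespace-only strings count as empty are all choices made for this example.

```python
def is_empty(value):
    """Treat None and whitespace-only strings as empty (an assumption of this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def is_numeric(value):
    """Return True for numbers, or for strings that parse as a float."""
    if isinstance(value, bool):  # bools are ints in Python; don't count them as numeric
        return False
    if isinstance(value, (int, float)):
        return True
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def profile_column(values):
    """Compute a few simple statistics for one column's values."""
    non_empty = [v for v in values if not is_empty(v)]
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "non_numeric": sum(1 for v in non_empty if not is_numeric(v)),
        "distinct": len(set(non_empty)),
    }


def profile_table(rows):
    """Profile every column in a hypothetical list of dict rows."""
    columns = {key for row in rows for key in row}
    # A key missing from a row is read as None, so it counts as empty.
    return {col: profile_column([row.get(col) for row in rows]) for col in sorted(columns)}


if __name__ == "__main__":
    rows = [
        {"customer_id": "101", "revenue": "250.00"},
        {"customer_id": "102", "revenue": ""},
        {"customer_id": "103", "revenue": "n/a"},
        {"customer_id": None, "revenue": "75"},
    ]
    for column, stats in profile_table(rows).items():
        print(column, stats)
```

A column such as `revenue` that mixes numbers with placeholders like `"n/a"` shows up immediately in the `non_numeric` count, which is the kind of signal that answers “Is this what I’m looking for?” before any query is written.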
What does 2020.4 add when it comes to analytics?
Alation added new popularity charts for tables and queries, which help me find what I need and give me great feedback on whether my work is being used by others. I’ve written quite a few queries that others in our organization run on their own to self-serve their data questions. With this query popularity information available from Enhanced Analytics, I can see which of these queries have caught on, which might need some more socialization, and which might need to be updated because they are old but still commonly used.
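The “old but still commonly used” check can be sketched in a few lines of Python. This is a hypothetical example, not the Enhanced Analytics feature: it assumes a run log of `(query_id, user)` pairs and a dictionary of last-edited dates, measures popularity as the number of distinct users, and flags popular queries that have not been edited since a cutoff date.

```python
from collections import defaultdict
from datetime import date


def query_popularity(run_log):
    """Map each query ID to the number of distinct users who ran it.

    Hypothetical input: a list of (query_id, user) pairs, one per run.
    """
    users_by_query = defaultdict(set)
    for query_id, user in run_log:
        users_by_query[query_id].add(user)
    return {query_id: len(users) for query_id, users in users_by_query.items()}


def stale_but_popular(popularity, last_edited, cutoff, min_users=2):
    """Return query IDs run by at least `min_users` people but last edited before `cutoff`."""
    return sorted(
        query_id
        for query_id, n_users in popularity.items()
        if n_users >= min_users and last_edited[query_id] < cutoff
    )


if __name__ == "__main__":
    # Invented run log and edit dates, for illustration only.
    runs = [("q1", "ana"), ("q1", "ben"), ("q1", "ana"), ("q2", "ana"),
            ("q3", "cho"), ("q3", "dev"), ("q3", "eve")]
    edited = {"q1": date(2018, 5, 1), "q2": date(2017, 1, 1), "q3": date(2020, 10, 1)}
    popularity = query_popularity(runs)
    print(popularity)  # {'q1': 2, 'q2': 1, 'q3': 3}
    print(stale_but_popular(popularity, edited, cutoff=date(2019, 1, 1)))  # ['q1']
```

Counting distinct users rather than raw runs keeps one heavy user from making a query look widely adopted; `q2` is old but used by only one person, so it is not flagged.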
And data lineage has seen a big update as well?
Yes, Alation’s data lineage gives you important information about where your data came from and what is being created with it. The UI for data lineage has been improved, and new APIs have been added to make it easier to track the impact of changes across multiple tables.
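Tracking the impact of a change across tables is essentially a graph traversal: starting from one table, follow lineage edges to find everything built from it. The Python sketch below runs a breadth-first search over a hypothetical list of `(source, target)` lineage pairs. It does not use or reproduce Alation’s lineage APIs, and the table names are invented.

```python
from collections import deque


def downstream_impact(edges, start):
    """Return every table reachable downstream of `start`, in breadth-first order.

    Hypothetical input: a list of (source_table, target_table) lineage pairs.
    """
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for child in children.get(table, []):
            if child not in seen:  # skip tables already reached via another path (also guards cycles)
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    lineage = [
        ("raw.events", "stg.events"),
        ("stg.events", "mart.daily_active"),
        ("stg.events", "mart.funnel"),
        ("mart.daily_active", "dash.kpis"),
        ("mart.funnel", "dash.kpis"),
    ]
    print(downstream_impact(lineage, "raw.events"))
    # ['stg.events', 'mart.daily_active', 'mart.funnel', 'dash.kpis']
```

Breadth-first order lists the tables closest to the change first, and the `seen` set ensures a table fed by several upstream paths (here `dash.kpis`) is reported only once.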
For some added context, how did your experience working with data compare before you joined Alation?
I’ve been at Alation for about two and a half years now. Before Alation, I worked at both small and large companies. All of them struggled to understand their data.
I remember working at a large company that was putting together a hackathon. Our hackathon project required a specific set of business data. Even though we knew the data existed, we couldn’t find it, even after weeks of searching and emails going up and down the corporate ladder. In the end, we simply changed our hackathon project. At the other end of the spectrum, I worked at a very small company whose analyst team was small enough that we used a freeform wiki to capture our knowledge about data. The issue with a wiki is that you only find that knowledge when you actively seek it out. In Alation, that important information is automatically recommended to you by smart-suggest as you are writing a query.
Do you have any closing thoughts?
Working at Alation and leveraging the data catalog has shown me that there is a better way to work with data. Now, I can’t imagine being unable to find an important data set or to collaborate seamlessly with colleagues across the organization. At the end of the day, Alation makes working with data much easier and more fun. You get to spend less time navigating your data landscape and more time digging deep and getting to insights!