By Jason Lim
Published on March 1, 2022
There are many approaches to managing and using data. Despite a plethora of frameworks, ideologies and best practices in the data world, data is fluid and must adapt to unique scenarios. Every organization is different, and so are sub-teams within organizations. That’s why a certain level of control and customization is important to fill the gaps that high-level structure cannot.
Bill Hostmann, VP and Research Fellow at Dresner Advisory Services, agrees: “Data catalogs have emerged as a core set of capabilities for making content easier to find for analytic use cases, especially when there are multiple data sources being accessed for various analytic use cases. As data complexity increases and organizations become more ‘data driven’, catalogs will continue to rank as a high priority.”
The latest release of Alation, 2022.1, generally available on February 25th, includes a set of key features that empower organizations to build intelligence around data in their own customized way.
Data lineage is a critical way to graphically see the relationship between data, from source to target. It helps to answer simple but important questions like “Where did this data come from?” and “How did it transform over time?” Through Impact Analysis, users can determine if a problem occurred with data upstream, and locate the impacted data downstream. With robust data lineage, data engineers can find and fix issues fast and prevent them from recurring. Similarly, analysts gain a clear view of how data is created.
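To make the idea of impact analysis concrete, here is a minimal sketch that treats lineage as a directed graph and walks it in both directions. It is not how Alation implements lineage; the object names and the `LINEAGE`, `downstream` and `upstream` identifiers are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the objects in its list.
LINEAGE = {
    "s3.raw_bookings": ["snowflake.stg_bookings"],
    "snowflake.stg_bookings": ["snowflake.fct_sales", "snowflake.dim_agency"],
    "snowflake.fct_sales": ["tableau.revenue_dashboard"],
    "snowflake.dim_agency": ["tableau.revenue_dashboard"],
    "tableau.revenue_dashboard": [],
}


def downstream(graph, start):
    """Return every object reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream(graph, target):
    """Return every object that feeds `target`, by walking the edges backwards."""
    reverse = {}
    for source, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(source)
    return downstream(reverse, target)


# Impact analysis: what is affected if the staging table is wrong?
impacted = downstream(LINEAGE, "snowflake.stg_bookings")
# Root cause analysis: what could have fed bad data into the dashboard?
suspects = upstream(LINEAGE, "tableau.revenue_dashboard")
```

In this sketch, `impacted` answers the downstream question and `suspects` answers “Where did this data come from?” by reversing the same edges.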
Column-level lineage captures transformations in detail. It’s useful for ensuring compliance downstream and fixing broken reports when a column is to blame. Until now, column-level lineage in Alation has been created automatically. Yet edge cases persist where lineage cannot be captured automatically.
We close this gap with manual lineage updates in 2022.1. Users can now add to and enrich the lineage graph directly through the user interface. Manual lineage objects inherit rich automated lineage functionality, including deprecation, propagation, and impact analysis. The ability to add more nuanced data objects to the lineage graph gives the Alation user community greater control and insight.
Airline Reporting Corporation (ARC) sells data products to travel agencies and airlines. Lineage helps them identify the source of bad data to fix the problem fast.
Manual lineage will give ARC a fuller picture of how data was created as it moves between its AWS S3 data lake, Snowflake cloud data warehouse and Tableau (and how it can be fixed). It will also spare ARC the time-suck of parsing Python transformations in pursuit of that picture.
“Time is money,” said Leonard Kwok, Senior Data Analyst, ARC. “The quicker we can fix data, the sooner we can deliver to customers on time. Manual lineage gives us a quick and easy way to document data relationships and trace where it came from. We can use manual lineage to perform root cause analysis. This means we can identify problems from our data suppliers in minutes instead of hours or days. That translates to faster product development and remediation for the business.”
The ability to drill down to the column level for lineage is immensely helpful, especially when enterprises have to meet compliance regulations, such as BCBS-239 for banks. Column-level lineage provides a means to discover the different sources of data (and the data flow) used to create risk reports. For example, a bank might use this feature to aggregate risk exposure at that bank’s group level.
In 2022.1, Alation is making automated column-level lineage add-ons available for two popular cloud data warehouses: AWS Redshift and Google BigQuery. This expands the data source compatibility, which was limited to Snowflake in 21.4.
Technical users may wonder: How are such advancements possible? By parsing query logs and picking up the column-level attributes, the Alation Data Catalog can populate the lineage graph. And by automating column-level lineage, lineage maintenance is quicker and easier.
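As a rough illustration of the idea, the sketch below derives column-level edges from a toy query log. It is not Alation’s parser: it handles only one statement shape (`INSERT INTO target (cols) SELECT exprs FROM source`), assumes expressions contain nothing but column names, operators and function calls (no string literals, aliases or joins), and every name in it (`QUERY_LOG`, `column_lineage`, the table names) is hypothetical.

```python
import re

# Matches only: INSERT INTO <target> (<cols>) SELECT <exprs> FROM <source>
INSERT_SELECT = re.compile(
    r"INSERT\s+INTO\s+(?P<target>[\w.]+)\s*\((?P<cols>[^)]*)\)\s*"
    r"SELECT\s+(?P<exprs>.+?)\s+FROM\s+(?P<source>[\w.]+)",
    re.IGNORECASE | re.DOTALL,
)
# An identifier that is not immediately used as a function call.
IDENT = re.compile(r"\b[A-Za-z_]\w*\b(?!\s*\()")


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def column_lineage(sql):
    """Return (source_column, target_column) edges for one logged statement."""
    match = INSERT_SELECT.search(sql)
    if match is None:
        return []  # statement shape not covered by this sketch
    targets = [col.strip() for col in match["cols"].split(",")]
    exprs = split_top_level(match["exprs"])
    if len(targets) != len(exprs):
        return []
    edges = []
    for target_col, expr in zip(targets, exprs):
        for source_col in IDENT.findall(expr):
            edges.append(
                (f"{match['source']}.{source_col}", f"{match['target']}.{target_col}")
            )
    return edges


# Hypothetical query log entries.
QUERY_LOG = [
    "INSERT INTO mart.sales (agency, revenue) "
    "SELECT UPPER(agency_name), price * quantity FROM stg.bookings",
    "SELECT COUNT(*) FROM stg.bookings",
]

edges = [edge for sql in QUERY_LOG for edge in column_lineage(sql)]
```

Here `edges` records that `mart.sales.revenue` depends on both `stg.bookings.price` and `stg.bookings.quantity`, which is the kind of detail a column-level lineage graph is built from. A production parser would need a real SQL grammar rather than regular expressions.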
ARC also plans to leverage automated column-level lineage for Snowflake. This will reduce the need to seek out tribal knowledge to understand how data changed. When people switch teams or leave the organization, tribal knowledge about how data was constructed leaves with them.
“Now, all the transformations will be captured, and put in one place,” said Kwok. “I won’t need to contact a developer to explain how they transformed data. We’re confident automated column-level lineage for Snowflake will be super helpful.”
This new feature will help ARC improve and speed up its internal data operations. In the future, Kwok expects that automated column-level lineage in Snowflake will also minimize risks around data privacy. With this feature, users have visibility into which columns are related to PCI or PII. This helps ensure that sensitive data is not exposed to unauthorized parties.
Another feature we’re excited to debut with this release is homepage customization. Ultimately, Alation Data Catalog is a vehicle for driving data intelligence for our customers — no small feat! That’s why branding and customizing Alation to fit your brand identity and mission is a simple yet effective way to make the community feel a part of a tribe.
When building a movement, branding goes a long way. Some of the novel names we have heard from our creative customers are ‘Captain Self Service’, a hero who promotes contribution to the data catalog, and R.I.D.E (Regeneron Information and Data Explorer), the name of the Regeneron data catalog.
In this release, we are continuing to enable our customers to customize the look and feel of Alation. In particular, attractive background images can be set in the homepage header, similar to a LinkedIn profile header image.
Additionally, alert banners can be set at the top of the homepage with preset colors for general information, critical warnings, and success alerts. This is useful for communicating a major update. The capability to control this level of branding and communication contributes to the overall adoption of Alation.
Alation has the broadest and deepest connectivity of any data catalog. And connectivity is the crux of a powerful data catalog. While Alation has a large library of native data source connectors to databases, cloud data warehouses, data lakes, BI tools, event streaming, and more, inevitably there are other data sources that don’t have an existing connector.
To expand the breadth of our connectivity, Alation’s Open Connector Framework (OCF) SDK for relational databases and BI tools is becoming generally available. This SDK enables developers to build a connector to a data source by following step-by-step instructions. The freedom to index any data source in Alation dramatically opens up your enterprise to more data intelligence solutions.
No doubt, connecting to a wider variety of data sources is useful. But what if you could get more value out of a single source? Deep connectivity is the answer. It refers to the type of metadata we can extract, lineage that can be created, queries that can be run, profiling that can be calculated, and sampling that can be previewed.
APIs support connectivity to power data intelligence. They enhance the interoperability between Alation and data sources to create unique custom solutions and workflows. APIs are available for domains, search, authentication, lineage and relational database integrations, flags & tags, and more. In the spirit of making our APIs easier to use, we are excited to announce the new Alation Developer Portal. The portal provides beautiful and interactive documentation for implementing these APIs. Unlike the features described above, the portal is separate from the 2022.1 release.
Advantages of the new Developer Portal include:
API versioning by release
Interactive commands
Multiple language support
Browser-based API exploration
Together, Alation’s OCF SDK and the new Developer Portal improve the flexibility and extensibility of Alation Data Catalog.
Data classification via tags is a simple yet powerful capability. It enables users to search & filter, apply data policies, certify data, and so much more. Yet manual classification can gobble time. And the ability to automate classification from a single system (like a centralized data catalog) can save data teams many hours. In this way, automated classification represents a major efficiency gain.
We continue to innovate on active data governance. In 2022.1, we released the Snowflake Tags feature, which reflects our deep partnership with Snowflake. This adds to our existing Snowflake integrations, like Policy Center.
Snowflake Tags can be ingested and applied to Snowflake data indexed by Alation and inserted into custom fields. These tags can then be updated in Alation and synchronized back to Snowflake. This enhancement will streamline governance activities by automating the classification of Snowflake data.
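The synchronization step can be pictured with a small sketch. This is not Alation’s implementation: it simply compares two hypothetical tag snapshots (one edited in the catalog, one read from the warehouse) and prints the Snowflake `ALTER TABLE ... SET TAG` and `UNSET TAG` statements that would bring the warehouse in line. The function names, table name and tag values are invented for illustration, and treating a tag that is missing from the catalog as one to unset is an assumption of this sketch.

```python
def diff_tags(catalog_tags, warehouse_tags):
    """Compare tags per table and return the changes needed in the warehouse."""
    to_set, to_unset = [], []
    for table in sorted(set(catalog_tags) | set(warehouse_tags)):
        wanted = catalog_tags.get(table, {})
        current = warehouse_tags.get(table, {})
        for name, value in sorted(wanted.items()):
            if current.get(name) != value:
                to_set.append((table, name, value))
        for name in sorted(current):
            if name not in wanted:
                to_unset.append((table, name))
    return to_set, to_unset


def to_sql(to_set, to_unset):
    """Render the changes as Snowflake tag statements (table-level tags only)."""
    statements = []
    for table, name, value in to_set:
        escaped = value.replace("'", "''")  # escape single quotes in the value
        statements.append(f"ALTER TABLE {table} SET TAG {name} = '{escaped}';")
    for table, name in to_unset:
        statements.append(f"ALTER TABLE {table} UNSET TAG {name};")
    return statements


# Hypothetical snapshots: tags as edited in the catalog vs. tags in Snowflake.
catalog = {"sales.public.orders": {"sensitivity": "pii", "owner": "finance"}}
warehouse = {"sales.public.orders": {"sensitivity": "internal", "legacy": "yes"}}

to_set, to_unset = diff_tags(catalog, warehouse)
for statement in to_sql(to_set, to_unset):
    print(statement)
```

Computing a diff first, rather than rewriting every tag, keeps the number of statements sent to the warehouse proportional to what actually changed.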