DQP Overview

Applies to: Alation Cloud Service instances and customer-managed instances of Alation

At Alation, we understand that no single data quality tool fits every organization. That’s why we embrace an open approach, allowing organizations to integrate their preferred data quality (DQ) tools effortlessly into Alation.

Our Open Data Quality Initiative and Data Health API have allowed seamless integration of custom data quality tools or best-of-breed solutions from our technology partners. This means Alation integrates DQ results from your chosen tool, presenting them through intuitive health scorecards, upstream alerts, and DQ overlays in lineage diagrams, making insights trustworthy, actionable, and accessible for all.

With the Data Quality Processor (DQP), Alation takes DQ one step further. This capability integrates natively with leading Data Cloud Platforms’ emerging DQ capabilities to directly bring their data quality results into Alation’s Data Intelligence Platform.

To get DQP, contact the Forward Deployed Engineering team or ask your Alation account team.

Architecture

DQP is a service that runs on Alation Services Manager (ASM). For more information, see the ASM Overview.

The following diagram illustrates DQP’s high-level physical architecture:

../../../_images/FDEdqpArch.png

The Alation administrator creates a published SQL query in Alation, which is typically run on a schedule. When Compose executes this query, it retrieves the DQ monitoring results from the source system and stores the result set in Alation. DQP then accesses the Alation instance through APIs to read the DQ result set, either on a schedule or on demand. Next, DQP processes the data, applying the user’s required thresholds to produce Good/Warning/Alert indicators. Finally, the prepared DQP data is made available in Alation via API.

The following diagram outlines DQP’s logical architecture together with data flows and processes:

../../../_images/FDEdqpFlow.png

Step 1

DQP supports the ingestion of DQ results from multiple sources. Currently supported source platforms are:

  • Databricks

  • Snowflake

The user defines rules in the source system to represent the desired data quality requirements, and the source system executes them. Example rules are:

  1. Null Count Metrics - to manage missing values.

  2. Regular Expression Validation - to check email address and phone number patterns.

  3. Range Checks - to validate transaction amounts.

  4. Reference Validation - to verify product IDs.

  5. Duplicate Detection - to identify repeated values or duplicate transactions.

All rule/function creation, assignment, and execution is done within the source system using that system’s own tools, features and documentation. The source system manages its DQ monitoring execution and storage of the results.
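To make the five example rule types concrete, the following is a minimal Python sketch of what each rule measures. It is illustrative only: in a real deployment the rules are defined and executed in the source system (Databricks or Snowflake) with that system's own features, not in Python. The sample records, column names, email pattern, and function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample records; column names are illustrative only.
rows = [
    {"customer_id": 1, "email": "ana@example.com", "amount": 120.0, "product_id": "P1"},
    {"customer_id": 2, "email": None, "amount": -5.0, "product_id": "P9"},
    {"customer_id": 2, "email": "not-an-email", "amount": 80.0, "product_id": "P2"},
]
valid_products = {"P1", "P2", "P3"}  # hypothetical reference list
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple pattern


def null_count(rows, column):
    """Null count: number of rows where the column is missing."""
    return sum(1 for r in rows if r[column] is None)


def regex_failures(rows, column, pattern):
    """Regular expression validation: non-null values that do not match."""
    return sum(1 for r in rows if r[column] is not None and not pattern.match(r[column]))


def range_failures(rows, column, low, high):
    """Range check: values outside the inclusive range [low, high]."""
    return sum(1 for r in rows if not (low <= r[column] <= high))


def reference_failures(rows, column, allowed):
    """Reference validation: values absent from the reference set."""
    return sum(1 for r in rows if r[column] not in allowed)


def duplicate_count(rows, column):
    """Duplicate detection: occurrences beyond the first for each value."""
    counts = Counter(r[column] for r in rows)
    return sum(c - 1 for c in counts.values() if c > 1)


results = {
    "email_null_count": null_count(rows, "email"),
    "email_regex_failures": regex_failures(rows, "email", EMAIL_RE),
    "amount_range_failures": range_failures(rows, "amount", 0.0, 10_000.0),
    "product_reference_failures": reference_failures(rows, "product_id", valid_products),
    "customer_id_duplicates": duplicate_count(rows, "customer_id"),
}
print(results)
```

Each function returns a single count per rule. A count of this kind is the sort of raw DQ value that later steps compare against thresholds.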

Step 2

A published SQL query in Alation is executed by Compose to retrieve the raw DQ data from the source platform. This query is typically scheduled to run automatically on a frequency of your choosing.

The DQP engine then runs and ingests the DQ result data. Ingestion can be run on demand or on a schedule defined in DQP’s UI.

Step 3

The DQP engine processes the DQ data and generates the corresponding RAG (red/amber/green) statuses and quality scores according to the thresholds specified by the user in DQP’s UI.
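The threshold step can be sketched as follows. This is not DQP's implementation: the `Threshold` class, the `classify` and `quality_score` functions, and the score formula (percentage of rows without a failure) are assumptions made for illustration. The sketch also assumes that higher DQ values are worse, as with a null count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Hypothetical per-rule thresholds; higher DQ values are assumed worse."""
    warning: float  # values at or above this are at least Warning
    alert: float    # values at or above this are Alert


def classify(value: float, threshold: Threshold) -> str:
    """Map a raw DQ value to a Good/Warning/Alert indicator."""
    if value >= threshold.alert:
        return "Alert"
    if value >= threshold.warning:
        return "Warning"
    return "Good"


def quality_score(failures: int, total_rows: int) -> Optional[float]:
    """Assumed score: percentage of rows that did not fail the rule."""
    if total_rows == 0:
        return None  # no rows, so no meaningful score
    return 100.0 * (total_rows - failures) / total_rows


null_threshold = Threshold(warning=1, alert=10)
for value in (0, 3, 10):
    print(value, classify(value, null_threshold))
print(quality_score(failures=5, total_rows=200))
```

Checking the alert threshold first keeps the most severe matching status, which matters because any value that reaches the alert level also reaches the warning level.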

Step 4

DQP pushes the generated results into Alation on a schedule you set in the DQP UI.
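As a rough illustration of what a push involves, the sketch below packages processed results into a JSON body. The payload shape and every field name here are hypothetical: they do not reproduce the schema of Alation's Data Health API or DQP's actual request format, and the sketch builds the body without sending it anywhere.

```python
import json
from datetime import datetime, timezone


def build_payload(results, run_time):
    """Package (object, rule, value, status) tuples into a hypothetical payload."""
    return {
        "generated_at": run_time.isoformat(),
        "results": [
            {"object": obj, "rule": rule, "value": value, "status": status}
            for (obj, rule, value, status) in results
        ],
    }


# Hypothetical processed result for a single column.
run_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
payload = build_payload([("customers.fname", "null_count", 42, "Alert")], run_time)
body = json.dumps(payload, indent=2)
print(body)
```

Carrying the object, rule, raw value, and status together in each entry reflects what the Data Health tab displays for a rule.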

Where is DQP Data Shown in Alation?

Data Health Tab

DQP data appears in the Data Health tab, where users can see the rules, objects, RAG status, and the number of “out of band” occurrences detected.

../../../_images/FDEdqpDataHealth.png

Lineage

DQP data appears in lineage diagrams, allowing users to see the data quality indicators overlaid on the corresponding objects. In this example, the user sees an amber warning for the Credit Decisioning table’s CustomerID. Clicking CustomerID reveals that the data contains several duplicate IDs, which may be worth investigating further.

../../../_images/FDEdqpLineage.png

Table

DQP data appears in table-level pages. In this example, the user sees a red alert for the FNAME column in the Customers table, indicating a data quality issue with missing customer first names:

../../../_images/FDEdqpTable.png

Column

Additional DQP data appears in column-level pages. In this example, the user sees a red alert for the FNAME column in the Customers table. The column-level page shows the details of the defined threshold rule and the DQ value responsible for the alert:

../../../_images/FDEdqpColumn.png