Configure Metadata Extraction

Applies to Alation Cloud Service instances of Alation

Metadata Extraction (MDE) in the dbt Gen2 OCF connector fetches information such as projects, models, and model columns. The connector queries your data source to retrieve this metadata, which Alation represents as objects in the catalog.

You can initiate MDE on demand or schedule it for regular catalog updates.

Configuring MDE in Alation involves the following steps:

Fetch dbt Projects

Before fetching projects for extraction, ensure that the service account has the permissions required to access the projects you want to extract.

  1. On the Settings page of your dbt Gen2 data source, go to the Metadata Extraction tab.

  2. Click Run.

    The retrieved list of projects appears in the Projects table under the Select projects for extraction section of the Metadata Extraction page.

Note

Alation does not fetch the following projects because it considers them invalid:

  • Projects with no environments

  • Projects whose environment details specify an unsupported data source. For the list of supported data sources, see the Supported Data Sources section in Overview.

  • Projects whose environment details specify the ODBC connection type
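The three rules above can be sketched as a simple validity check. This is a minimal illustration, not Alation's implementation: the Environment and Project classes, their field names, and the source names are hypothetical placeholders (the real list is in the Supported Data Sources section in Overview). The sketch also assumes that a project is invalid if any one of its environments breaks a rule.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    # Hypothetical fields standing in for a project's environment details.
    data_source: str
    connection_type: str


@dataclass
class Project:
    name: str
    environments: list[Environment] = field(default_factory=list)


def is_valid_project(project: Project, supported_sources: set[str]) -> bool:
    """Apply the three documented rules for skipping a project.

    Assumption: ANY environment that breaks a rule makes the project invalid.
    """
    if not project.environments:
        return False  # Rule 1: no environments.
    for env in project.environments:
        if env.data_source.lower() not in supported_sources:
            return False  # Rule 2: unsupported data source.
        if env.connection_type.upper() == "ODBC":
            return False  # Rule 3: ODBC connection type.
    return True


def fetchable_projects(projects: list[Project], supported_sources: set[str]) -> list[str]:
    return [p.name for p in projects if is_valid_project(p, supported_sources)]


# Placeholder source names, not the real supported list.
supported = {"source_a", "source_b"}
projects = [
    Project("analytics", [Environment("source_a", "default")]),
    Project("empty"),
    Project("legacy", [Environment("source_c", "default")]),
    Project("odbc_proj", [Environment("source_b", "ODBC")]),
]
print(fetchable_projects(projects, supported))  # ['analytics']
```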

Select Projects for Extraction

Alation recommends selecting specific projects that you have access to instead of extracting all projects. When you select projects, Alation retrieves metadata only for the selected ones, which makes extraction quicker and consumes fewer resources than extracting all projects.

By default, all the projects that Alation fetches from the data source are selected for extraction. You can adjust the selection by:

  • Selecting Projects using Filters

  • Selecting Projects Manually

If you do not select any projects, either manually or by using filters, Alation extracts all projects when you run metadata extraction.

Select Projects using Filters

To apply extraction filters, perform these steps:

  1. On the Settings page of your dbt Gen2 data source, go to the Metadata Extraction tab.

  2. Under the Select projects for extraction section, turn on the Enable advanced settings toggle.

  3. Select the required extraction filter option from the Extract drop-down list:

    • Only selected projects — extracts metadata only from the selected projects. This is the default value.

    • All projects except selected — extracts metadata from all projects except the selected projects.

  4. Select the Keep the catalog synchronized with the current selection of projects checkbox to soft-delete projects that were extracted previously but are not part of the current project selection.

  5. Create a filter.

    1. From the first drop-down list, select Project.

    2. Select the filter criteria (Contains, Starts with, Ends with, Regex).

    3. Specify the keyword or pattern to search for in the project name.

    Filters are useful if your projects change frequently or if you have a large amount of metadata.

    You can add multiple filters by clicking the Add another filter link.

    Note

    You must use filters if you plan to schedule MDE.

  6. Click Apply filters.

    The Projects table displays the projects that match the filters you set.

Note

After you apply filters, you cannot manually adjust the project selection.
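The filter options above can be modeled as follows, to show how the filter criteria, the Extract mode, and the synchronization checkbox combine. This is a hypothetical sketch, not Alation's code. It assumes that a project is selected when it matches any one filter, that matching is case-sensitive against the project name, and that a Regex filter may match anywhere in the name; all function names are illustrative.

```python
import re

# Hypothetical representation of one filter row: (criterion, keyword).
Filter = tuple[str, str]


def matches(name: str, flt: Filter) -> bool:
    criterion, keyword = flt
    if criterion == "Contains":
        return keyword in name
    if criterion == "Starts with":
        return name.startswith(keyword)
    if criterion == "Ends with":
        return name.endswith(keyword)
    if criterion == "Regex":
        # Assumption: the pattern may match anywhere in the name.
        return re.search(keyword, name) is not None
    raise ValueError(f"unknown criterion: {criterion}")


def select_projects(all_projects: list[str], filters: list[Filter],
                    mode: str = "Only selected projects") -> list[str]:
    """Return the projects to extract, assuming ANY matching filter selects."""
    if not filters:
        return list(all_projects)  # No selection: extract every project.
    picked = {p for p in all_projects if any(matches(p, f) for f in filters)}
    if mode == "Only selected projects":
        return [p for p in all_projects if p in picked]
    if mode == "All projects except selected":
        return [p for p in all_projects if p not in picked]
    raise ValueError(f"unknown mode: {mode}")


def projects_to_soft_delete(previously_extracted: list[str],
                            current_selection: list[str],
                            keep_in_sync: bool) -> list[str]:
    # With the sync checkbox selected, previously extracted projects that
    # fall outside the current selection are soft-deleted.
    if not keep_in_sync:
        return []
    current = set(current_selection)
    return [p for p in previously_extracted if p not in current]


projects = ["finance_core", "finance_sandbox", "marketing", "ops_prod"]
filters = [("Starts with", "finance"), ("Regex", r"_prod$")]
print(select_projects(projects, filters))
# ['finance_core', 'finance_sandbox', 'ops_prod']
print(select_projects(projects, filters, "All projects except selected"))
# ['marketing']
```

Keeping the Extract mode separate from the matching step mirrors the UI, where the same filters can either include or exclude projects.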

Select Projects Manually

If you prefer to select projects for extraction manually, perform these steps:

  1. On the Settings page of your dbt Gen2 data source, go to the Metadata Extraction tab.

  2. Under the Select projects for extraction section, turn off the Enable advanced settings toggle if it is turned on.

  3. Select the required projects from the list of projects in the Projects table.

    You can also search the table for a project by its full name or by any keyword or string in its name.

    After you select projects, the number of selected projects appears above the Projects table.

Run Extraction

Under the Run extraction section (General Settings > Metadata Extraction), click Run Extraction to extract metadata on demand.

The status of the extraction action is logged in the Extraction Job Status table under the MDE Job History tab.

Schedule Extraction

You can also schedule extraction to run automatically. To schedule extraction, perform these steps:

  1. On the Settings page of your dbt Gen2 data source, go to the Metadata Extraction tab.

  2. Under the Run extraction section, turn on the Enable extraction schedule toggle.

  3. Use the date and time widgets to select the recurrence period, day, and time for the MDE schedule. The next metadata extraction job for your data source runs on the schedule you specify.

Note

Here are some recommended schedules:

  • Run extraction every 12 hours, at minute 30 of the hour.

  • Run extraction every 2 days at 11:30 PM.

  • Run extraction every week on Sunday and Wednesday.

  • Run extraction every 3 months on the 15th day of the month.

Note

Discovery API usage data is retained for three months. Therefore, extraction can retrieve data only from the previous three months.
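As a worked example of the three-month window, the sketch below computes the earliest date that falls inside it. It assumes the limit means three calendar months counted back from the extraction date, with the day clamped to the end of shorter months. The exact boundary the Discovery API applies (for example, a fixed number of days) is not stated here, so treat the functions, which are hypothetical, as illustrative only.

```python
import calendar
from datetime import date


def months_before(day: date, months: int = 3) -> date:
    """Return the date `months` calendar months before `day`.

    The day of month is clamped to the last day of the target month,
    so May 31 minus three months gives the last day of February.
    """
    # Count months from year 0 so that borrowing across years is simple.
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def within_retention(event_day: date, extraction_day: date, months: int = 3) -> bool:
    """True if event_day falls inside the assumed retention window."""
    return months_before(extraction_day, months) <= event_day <= extraction_day


print(months_before(date(2024, 5, 31)))  # 2024-02-29
```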

View the MDE Job History

You can view the status of an extraction job after you run extraction on demand or after Alation runs a scheduled extraction.

To view the status of extraction, go to Metadata Extraction > MDE Job History on the Settings page of your dbt Gen2 data source. The Extraction job status table is displayed.


The Extraction job status table logs the following statuses:

  • Did Not Start - Indicates that the metadata extraction did not start due to configuration or other issues.

  • Succeeded - Indicates that the extraction was successful.

  • Partial Success - Indicates that the extraction was successful with warnings. If Alation fails to extract some of the objects during the metadata extraction process, it skips them and proceeds with the extraction process, resulting in partial success.

  • Failed - Indicates that the extraction failed with errors.

  • Skipped - Indicates that the extraction job was skipped due to a similar job being in progress.
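The statuses above can be read as a decision sequence. The sketch below is a hypothetical model for reasoning about job outcomes, not Alation's logic: the inputs (whether a similar job is running, whether configuration is valid, whether a fatal error occurred, and how many objects were skipped) and the order of the checks are assumptions chosen to match the status descriptions.

```python
from enum import Enum


class JobStatus(Enum):
    DID_NOT_START = "Did Not Start"
    SUCCEEDED = "Succeeded"
    PARTIAL_SUCCESS = "Partial Success"
    FAILED = "Failed"
    SKIPPED = "Skipped"


def classify_job(similar_job_running: bool, config_ok: bool,
                 fatal_error: bool, skipped_objects: int) -> JobStatus:
    """Map hypothetical job facts to one of the documented statuses."""
    if similar_job_running:
        # A similar job is already in progress, so this run is skipped.
        return JobStatus.SKIPPED
    if not config_ok:
        # Configuration or other issues prevent the job from starting.
        return JobStatus.DID_NOT_START
    if fatal_error:
        return JobStatus.FAILED
    if skipped_objects > 0:
        # Some objects failed and were skipped; the rest were extracted.
        return JobStatus.PARTIAL_SUCCESS
    return JobStatus.SUCCEEDED
```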

Click the View Details link to view a detailed report of the metadata extraction. If there are errors, the Job errors table displays the error category, the error message, and a hint that suggests how to resolve the issue. Follow the instructions in the Hints column to resolve each error.

In some cases, a Generate Error Report link appears above the Job errors table. Click this link to generate an archive (.zip) containing CSV files for different error categories, such as Data and Connection errors. Then click Download Error Report to download the files.
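After you download the error report, a short script can tally how many errors each category contains. This sketch uses only the Python standard library, and the function name is hypothetical. The names of the CSV files inside the archive are not documented here, so it reads every .csv member; it also assumes each file starts with one header row and is UTF-8 encoded.

```python
import csv
import io
import zipfile
from pathlib import Path


def summarize_error_report(zip_source) -> dict[str, int]:
    """Count data rows in each CSV inside a downloaded MDE error report.

    zip_source may be a file path or a binary file-like object.
    Assumes one header row per CSV file.
    """
    counts: dict[str, int] = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".csv"):
                continue  # Skip anything that is not a CSV file.
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                rows = list(csv.reader(text))
            # Subtract the header row; an empty file counts as zero errors.
            counts[Path(member).name] = max(len(rows) - 1, 0)
    return counts
```

For example, summarize_error_report("error_report.zip"), where the file name is hypothetical, returns a dictionary that maps each CSV file name to its number of data rows.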

After metadata extraction completes, the catalog pages for dbt assets display links to the corresponding RDBMS objects: projects link to data sources, models link to RDBMS tables, and model columns link to RDBMS columns. dbt test data is also populated under the Data Health tab of the associated RDBMS tables.