Version 2.2.9 or Newer¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Customer Managed Applies to customer-managed instances of Alation
Important
This section is applicable for Alation Version 2023.3.2.1 or higher and Snowflake OCF connector Version 2.2.9 or higher.
Query log ingestion (QLI) extracts and ingests the query history of a database and powers the lineage, popularity, top user, and join and filter information in the catalog. Explore examples of ingested queries on schema and table catalog pages.
The steps involved in configuring and running QLI are:
You can configure query log ingestion on the Query Log Ingestion tab of the Settings page. You can choose to configure QLI in one of the following ways:
View-Based
Custom Query-Based
Note
If you do not provide a view name or a custom query, Alation uses the default view SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
and performs the query ingestion.
Configure View-Based QLI¶
Prerequisite¶
In Snowflake, on top of the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view, create a view using a specific query template as shown below. Create it in any suitable schema. You can give the view any name of your choice.
Note
Snowflake is a case-sensitive database. You must use quotes around the column aliases in the QLI view query as is shown in the query example. Additionally, ensure that the QUOTED_IDENTIFIERS_IGNORE_CASE
is set to false for the Alation service account user using the following command:
ALTER USER <alation_user> SET QUOTED_IDENTIFIERS_IGNORE_CASE = FALSE
CREATE VIEW alation_QLI_view AS
SELECT
user_name as "userName",
CASE
WHEN SCHEMA_NAME IS NULL
THEN DATABASE_NAME ||'.'|| ''
ELSE DATABASE_NAME ||'.'|| SCHEMA_NAME
END AS "defaultDatabases",
TRIM(QUERY_TEXT) AS "queryString",
TRIM(SESSION_ID) AS "sessionID",
ROUND(TOTAL_ELAPSED_TIME/1000,0) AS "seconds",
false AS "cancelled",
TO_CHAR(start_time,'YYYY-MM-DD HH:MI:SS.US') AS "startTime",
TO_CHAR(start_time,'YYYY-MM-DD HH:MI:SS.US') AS "sessionStartTime"
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE execution_status = 'SUCCESS';
After creating the view, grant the Alation service account the SELECT access to this view. For more information, see the Grant Permissions for Query Log Ingestion section in Prerequisites.
Provide the QLI View Name¶
Important
The Alation user interface displays standard configuration settings for credentials and connection information stored in the Alation database. If your organization has configured Azure KeyVault or AWS Secrets Manager to hold such information, the user interface will change to include the following buttons adjacent to the respective fields:
By default, you see the user interface for Standard. In the case of Vault, instead of the actual credential information, you must select the source and provide the corresponding key. For details, see Configure Secrets for OCF Connector Settings.
On the Snowflake connector page, go to the Query Log Ingestion tab of the Settings page.
Under the Provide the QLI view name section, enter the QLI view name in the View name field.
By default, Alation queries the
SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
view.Important
Use the format
schema_name.view_name or schema.view
.Click Save.
Configure Custom Query-Based QLI¶
If you cannot create the view for QLI, for example, due to access restrictions, you can use a custom QLI query to extract query history into Alation. If you opt for custom query-based QLI, Alation will directly query the system view that stores the query history every time you manually run QLI or when the QLI job runs on schedule.
For custom query-based QLI to succeed, ensure that the service account has enough permissions to select from the system view. For details, see Grant Permissions for Query Log Ingestion section in Prerequisites.
The following is an example of a custom query for QLI.
SELECT user_name AS "userName", CASE WHEN SCHEMA_NAME IS NULL THEN DATABASE_NAME || '.' || '' ELSE DATABASE_NAME || '.' || SCHEMA_NAME END AS "defaultDatabases", TRIM(QUERY_TEXT) AS "queryString", TRIM(SESSION_ID || '/' || USER_NAME) AS "sessionID", ROUND(TOTAL_ELAPSED_TIME / 1000, 0) AS "seconds", FALSE AS "cancelled", to_char( start_time, 'YYYY-MM-DD HH:MI:SS.US' ) AS "startTime", to_char( start_time, 'YYYY-MM-DD HH:MI:SS.US' ) AS "sessionStartTime" FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY WHERE START_TIME BETWEEN STARTTIME AND ENDTIME AND EXECUTION_STATUS = 'SUCCESS' AND QUERY_TEXT IS NOT NULL AND TRIM(QUERY_TEXT) != '' AND QUERY_TYPE NOT IN ( 'ALTER_SESSION', 'ALTER_SET_TAG', 'ALTER_UNSET_TAG', 'BEGIN_TRANSACTION', 'CALL', 'COMMIT', 'CREATE_STREAM', 'DESCRIBE', 'DROP_STREAM', 'GET_FILES', 'GRANT', 'LIST_FILES', 'PUT_FILES', 'REFRESH_DYNAMIC_TABLE_AT_REFRESH_VERSION', 'REMOVE_FILES', 'SHOW', 'USE' ) AND NOT ( QUERY_TEXT ILIKE 'REVOKE%' OR QUERY_TEXT ILIKE 'EXPLAIN%' OR QUERY_TEXT ILIKE 'SELECT SYSTEM$%' OR QUERY_TEXT ILIKE 'SELECT 1%' OR QUERY_TEXT ILIKE 'INSERT INTO % VALUES%' ) AND (DATABASE_NAME NOT IN ('SNOWFLAKE') OR DATABASE_NAME IS NULL) AND DATABASE_NAME = '<database_name>';
You can customize it by adding, removing, or changing the filter, but the columns and their aliases must remain as is since the connector expects this query structure.
The query should have <userName, startTime, queryString, defaultDatabases, sessionID, sessionStartTime, cancelled, seconds>
columns in a SELECT statement.
Note
Snowflake is a case-sensitive database. You must use quotes around the column aliases in the QLI view query or the custom QLI query, as is shown in the QLI query examples.
When using the QLI query example, do not substitute the
STARTTIME
andENDTIME
parameters in the WHERE filter. These parameters are not actual column names and should stay as is. They are expected by the connector and will be substituted with the start and end date of the QLI range selected in the user interface when QLI is run manually or on schedule.Substitute the placeholder
<database_name>
with a database name.
Provide the Custom Query¶
On the Snowflake data source, go to the Query Log Ingestion tab of the Settings page.
Under the Provide the QLI view name section, go to Alternatively, use a custom SQL query.
In the Custom QLI Query field, provide a custom query to retrieve the query history.
Click Save.
Test the Access and Find the Query History Size¶
Before you perform the QLI, you must validate that the service account has access to the QLI view and gauge the approximate size of the query history metadata. The size is estimated based on the average query volume of the last 7 days.
To test the access and find out the approximate size of the query history metadata, perform these steps:
On the Settings page of your Snowflake data source, go to the Query Log Ingestion tab.
Under the Test access and query history size section, click Test.
A dialog box appears displaying the access validation result and upon successful validation, the size of the query history is displayed. The size is estimated based on the query volume of the last 7 days.
Note
If the average query size for the last seven days exceeds 500k, a warning message indicates the same.
Preview Results¶
Before performing the QLI, you can preview the queries that are ingested.
On the Settings page of your Snowflake data source, click go to the Query Log Ingestion tab.
Under the Preview Results section, enter the date range for which you want to generate the preview of the query history.
Click Preview.
Click View Results to view the generated preview.
The Preview dialog appears displaying the total number of query statements per user under the User Queries tab and a detailed query statement under the Statements tab. Click Download to download the detailed query statement as a JSON file.
Note
You can use this option to run default QLI using just the date-range without using a view-based or custom query-based QLI.
Run QLI¶
You can either run QLI manually on demand or configure it to run automatically on a schedule.
Run QLI Manually¶
To perform QLI manually on demand:
On the Settings page of your Snowflake data source, go to the Query Log Ingestion tab.
Under the Run QLI section, disable the Enable QLI Schedule toggle.
Specify the desired date range using the Date Range calendar widgets. You will need to specify the start date and the end date separately.
Click Import.
A query log ingestion job is initiated.
Schedule QLI¶
On the Settings page of your Snowflake data source, go to the Query Log Ingestion tab.
Under the Run QLI section, enable the Enable QLI Schedule toggle.
Specify values for the job recurrence and time. The values are set in your local time.
Note
Here are some of the recommended schedules for better performance:
Schedule QLI to run for every 12 hours at the 30th minute of the hour
Schedule QLI to run for every 2 days at 11:30 PM
Schedule QLI to run every week on the Sunday and Wednesday of the week
Schedule QLI to run for every 3 months on the 15th day of the month
Click Import.
The next QLI runs on the set schedule.
View the Job Status¶
To view the QLI job status after you run the QLI manually or after Alation triggers the QLI as per the schedule, go to Query Log Ingestion > QLI Job Status.
The Query log ingestion job status table logs the following status:
Succeeded - Indicates that the query ingestion was successful.
Partial Success - Indicates that the query ingestion was successful with warnings. If Alation fails to ingest some of the objects during the QLI, it skips them and proceeds with the query ingestion, resulting in partial success.
Failed - Indicates that the query ingestion failed with errors.
Click the View Details link to view a detailed report of query ingestion. Click the View Details link to view a detailed report of metadata extraction. If there are errors, the Job errors table displays the error category, error message, and a hint (ways to resolve the issue). Follow the instructions under the Hints column to resolve the error.