Virtual RDBMS Data Sources¶

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Important

You are viewing documentation for Classic Alation.

This article describes how to import metadata from a CSV source file into a virtual data source created for a relational database or a Hive database.

For information on Virtual Data Source API, refer to Upload a Virtual Data Source on Alation’s Developer Portal.

Virtual RDBMS data sources include basic technical metadata. The fields available for importing are:

Name
Table type (for tables)
Column type (for columns)

Import Schemas, Tables, Columns, and Indices¶

You can import metadata in the CSV format to populate a virtual data source. The CSV file must meet several requirements. Prepare your CSV source file observing the format described below.

Important

Property values are case-sensitive. Use them as recommended. For example, Boolean properties should be in lowercase: true and false. Ensure all values are correctly formatted before uploading your CSV file.

Requirements for Head and Keys¶

The file must include the head, declaring keys and properties, and then each line in the file should consist of a specific key along with the properties that apply, with empty values for properties that do not apply.

Head¶

Sample file head: key, table_type, column_type

Note that table_type can be VIEW or TABLE.

Key¶

Alation determines the data object type based on how many dots there are in key:

schema is a schema
schema.table is a table
schema.table.column is a column
schema.table.index is recognized as index if the key is accompanied by the property index_type.

If your database type supports multipart schema (for example, this is the case for SQL Server, Amazon Redshift, and Netezza), then the key structure must be as follows:

dbname.schema is a schema
dbname.schema.table is a table
dbname.schema.table.column is a column
dbname.schema.table.index is an index if the key is accompanied by the property index_type.

For more details on key format, see Keys with Special Characters on Alation’s Developer Portal.

Sample CSV file:

"key","table_type","column_type"
"test_schema","",""
"test_schema.table1","TABLE",""
"test_schema.table2","VIEW",""
"test_schema.table1.column1","","varchar(10)"
"test_schema.table2.column1","","varchar(10)"

Add Table and View Properties¶

Along with the key, you can also add the following properties, all of them optional.

Property	Description	Example
`table_type`	Specifies the type of the table. `table_type` is optional but recommended, and can be set to `VIEW` or `TABLE`. If not specified, it is assumed to be `TABLE`.	`TABLE`
`data_location`	A URI or file path to the location of the underlying data, such as, for example, an HDFS URL for a Hive table. Use this parameter only when the `table_type` is `TABLE`.	`hdfs:///user/hive/warehouse/table_a`
`db_owner`	Name of the database account that owns this table.	`alation`
`definition_sql`	`CREATE TABLE` statement which was used to create the table. Use this parameter only when the `table_type` is `TABLE`.	`CREATE TABLE schema_a.table_a(column1int);`
`view_sql`	`CREATE VIEW` statement which was used to create the view. Use this condition; parameter only when the `table_type` is `VIEW`.	`CREATE VIEW view_name AS SELECT column1, column2 FROM table_name WHERE`
`view_sql_expanded`	`CREATE VIEW` statement with fully qualified object references specific to the Hive database type. Note that the database names should use the escaping symbols of back quote. Use this parameter only when the `table_type` is `VIEW` in a Hive database.	CREATE VIEW view_name AS SELECT `test_table`.`column1`, `test_table`.`column2` FROM `schema`.`test_table` WHERE condition;
`constraint_text`	Constraint statements enforced by the DB. Use this parameter only when the `table_type` is `TABLE`.	`column_a UNIQUE`
`ts_created`	Timestamp at which the table or view was created. Ensure the timestamp is in UTC and follows the format: YYYY-MM-DDThh:mm:ssZ	`2018-03-13T22:09:33Z`
`ts_last_altered`	Timestamp of the last ALTER statement executed against this table. Ensure the timestamp is in UTC and follows the format: YYYY-MM-DDThh:mm:ssZ	`2018-03-13T22:09:33Z`
`partitioning_attributes`	An array of columns	`['column1', 'column2']`
`bucket_attributes`	An array of columns used to bucket the table.	`['column1', 'column2']`
`sort_attributes`	An array of columns used to sort the table.	`['column1', 'column2']`
`synonyms`	An array of other names that can be used to refer to this table. Each synonym is represented as a JSON comprising a schema_name and table_name.	`[{'schema_name':'schema_a', 'table_name':'table_a'}, {'schema_name':'schema_b', 'table_name': 'table_b'}]`
`skews_info`	A JSON of the skew column names to an array of their respective skewed column values that appear often.	`{'column1': ['column1_value1', 'column1_value2'], 'column2': ['column2_value1', 'column2_value2']}`
`table_comment`	A comment field that stores a description of the table which is ingested from the source system.	`Created by DB`

Note

You can have newline in the text values of the fields by adding a newline wherever needed. For example, newline can be inserted after AS:

CREATE VIEW view_name AS
SELECT column1, column2 FROM table_name WHERE condition;

For non-primitive data types, single quotes must be used to enclose the string in the data type. For example: ['column1', 'column2'].

Add Column Properties¶

For columns, you can specify the following properties, all of them optional:

Property	Description	Example
`column_type`	This property is optional but recommended and can be any string.	`map<int, string>`
`position`	Position of the column in the table which has it. This value needs to be a positive integer. If unspecified, the value defaults to the order in which it was uploaded.	`5`
`column_comment`	A comment field that stores a description of the column which is ingested from the source system.	`<has a default value>`
`nullable`	Field to indicate if the column can be nullable. Set this to `true` if the column is a nullable field, to `false` if otherwise. Default Value: `true`.	`true`

Add Indices¶

For indices, you can specify the following properties:

Property	Required	Description
`index_type`	yes The presence of this field identifies an index object in a column.	This property is an enum. The value for this property can be one of: `PRIMARY` `SECONDARY` `PARTITIONED_PRIMARY` `UNIQUE` `OTHER` For details, see Understand the Index Key Types and Icons When specifying an index, ensure that the corresponding table is already a part of the database metadata. Even for index upsert, this field is required. Example: Use `PRIMARY` index type to define a primary key.
`column_names`	yes	An array of column names on which the index is defined. If the index is composite, this array will have multiple column names. This cannot be an empty array. When specifying an index, make sure the columns it corresponds to are already a part of the database metadata. For index upsert, this field can be optional. The order in which the column names are specified is important because this implies the sequencing of the column forf composite indices. Example: `["column1"]`
`data_structure`	no	The underlying data structure used by the index. The value for this field can be one of: `BTREE` `HASH` `BITMAP` `DENSE` `SPARSE` `REVERSE` `OTHER` `NONE` Default: `NONE`
`index_type_detail`	no	A string having custom detailed information about the index. Example: `MULTI_COLUMN_STATISTICS`
`is_ascending`	no	Set this boolean to `true` if the index is created in ascending order, otherwise set `false`. This is not valid for the composite index.
`filter_condition`	no	Filter condition used while creating an index for part of the rows in the table. This is not valid for composite index. Example: `([filteredIndexCol]>(0))`
`is_foreign_key`	no	Set this boolean to `true` if the index is a foreign key. This allows Alation to recognize the index as a foreign key. When this is set to `true`, it is mandatory to provide the values for `foreign_key_table_name` and `foreign_key_column_names` fields.
`foreign_key_table_name`	yes if `is_foreign_key` is set to `true`	The key of the parent table object which the foreign index refers to. This is required only if `is_foreign_key` is set to `true`. Make sure the table it corresponds to is already a part of the database metadata. Example: `7.schema_a.table_a`
`foreign_key_column_names`	yes if `is_foreign_key` is set to `true`	An array of column names on the parent table object which the foreign index refers to. This is required only if `is_foreign_key` is set to `true`. Make sure the columns it corresponds to are already a part of the database metadata. The number of columns here should match the number of columns in `column_names` field. Example: `['column1']`

Sample CSV file:

"key","table_type","column_type","index_type","column_names"
"public","","","",""
"public.customers","VIEW","","",""
"public.parts","TABLE","","",""
"public.parts.id","","varchar(40)","",""
"public.parts.index","","","SECONDARY","['id']"

Understand the Index Key Types and Icons¶

You can use one of the following index types (index_type):

PRIMARY: Ensures the uniqueness of the primary key in a table.

SECONDARY: Specifies additional indices to speed up queries related to non-primary keys.

PARTITIONED_PRIMARY: Indicates a partitioned primary index for efficiently managing large tables.

UNIQUE: Enforces the uniqueness of values in a column or set of columns.

OTHER: Indicates custom or context-specific index types specific to certain database systems or applications.

Based on the value you set for the index_type property, one of the following icons are displayed on catalog pages next to Columns:

Icon	Description
	Primary key icon. This icon is displayed when `index_type` is set to `PRIMARY`. There will be only one primary key icon for a table.
	Foreign key icon. This icon is displayed when `index_type` is set to `SECONDARY`, `PARTITIONED_PRIMARY`, `UNIQUE`, or `OTHER` and `is_foreign_key` is set to `true`.
	Index key icon. This icon is displayed when `index_type` is set to any value other than `PRIMARY` (`SECONDARY`, `PARTITIONED_PRIMARY`, `UNIQUE`, or `OTHER`) and `is_foreign_key` is set to `false`.

Encoding and Special Characters¶

If the data has Unicode characters, make sure the file is encoded in UTF8. Alation will show a preview before you confirm importing.
If a property in the head does not apply to the key, leave it empty.
Alation does not support symbols ' " [ ] in the names loaded using CSV. If you must have these symbols in the names, you will have to use a complex escaping sequence.

Reserved Words¶

The following words are reserved and can’t be used anywhere in the CSV file, even in the middle of a string. The words are case-insensitive. If you use any of these words, the import will fail with the error message “Content scan failed for file {filename}.csv!”.

onafterprint
onbeforeprint
onbeforeunload
onerror
onhashchange
onload
onmessage
onoffline
ononline
onpagehide
onpageshow
onpopstate
onresize
onstorage
onunload
onblur
onchange
oncontextmenu
onfocus
oninput
oninvalid
onreset
onsearch
onselect
onsubmit
onkeydown
onkeypress
onkeyup
onclick
ondblclick
onmousedown
onmousemove
onmouseout
onmouseover
onmouseup
onmousewheel
onwheel
ondrag
ondragend
ondragenter
ondragleave
ondragover
ondragstart
ondrop
onscroll
oncopy
oncut
onpaste
onabort
oncanplay
oncanplaythrough
oncuechange
ondurationchange
onemptied
onended
onerror
onloadeddata
onloadedmetadata
onloadstart
onpause
onplay
onplaying
onprogress
onratechange
onseeked
onseeking
onstalled
onsuspend
ontimeupdate
onvolumechange
onwaiting
ontoggle

Add, Update, and Delete Properties per Technical Metadata Object¶

Initially, you can upload only some of the properties for a key and later update the same key with new properties or update or delete an existing property value. For example, you can do one upload to add view_sql to all VIEWs and another upload to add ts_created. This second upload will retain the previously added view_sql if you omit that column.

Add Properties¶

Let us discuss examples of importing a TABLE, COLUMN and INDEX objects and their properties. Suppose, the first import uploads only some of the properties as illustrated in the Sample CSV below:

Sample CSV:

"key","table_type","column_type","db_owner","definition_sql","constraint_text","ts_created","ts_last_altered","partitioning_attributes","synonyms","skews_info","table_comment","index_type","column_names","index_type_detail","is_ascending
"public","","","","","","","","","","","","","","",""
"public.parts","TABLE","","alation","create table schema_a.table_a(column1 int);","column_a UNIQUE","2018-03-13T22:09:33Z","2018-03-13T22:09:33Z","['column1', 'column2']","[{'schema_name': 'schema_a','table_name':'table_a'},{'schema_name': 'schema_b','table_name':'table_b'}]","{'column1': ['column1_value1', 'column1_value2'],'column2': ['column2_value1', 'column2_value2']}","Created by DB","","","","",""
"public.parts.id","","int","","","","","","","","","","","","","",""
"public.parts.index","","","","","","","","","","","","SECONDARY","['id']","BTREE","MULTI_COLUMN_STATISTICS","true"

You can specify the following properties individually:

partitioning_attributes
bucket_attributes
sort_attributes
synonyms

Sample CSV:

"key","table_type","bucket_attribute1","bucket_attribute2","sort_attribute1","partitioning_attribute1","synonym1","synonym2"
"public.parts","TABLE","bucket_column1","bucket_column2","sort_column1","partition_column1","{'schema_name':'schema_a','table_name':'table_a'}","{'schema_name': 'schema_b','table_name': 'table_b'}"

Update Properties of an Existing Object¶

While creating the above TABLE object, we have not added the data_location property. We will next add that property and update the value of the existing property db_owner. For the COLUMN object, we can also add position and update column_type. Similarly, for an INDEX, we can add filter_condition and update the existing property data_structure.

Sample update CSV:

"key","table_type","db_owner","data_location","position","index_type","filter_condition","data_structure"
"public.parts","TABLE","new_owner","hdfs:///user/hive/warehouse/parts","","","",""
"public.parts.id","","","","5","","",""
"public.parts.index","","","","","SECONDARY","([filteredIndexCol]>(0))","HASH"

All the properties mentioned while adding the object for the first time are retained. index_type and is_foreign_key cannot be changed after the index is created. foreign_key_table_name and foreign_key_column_names are required if is_foreign_key is part of the request. However, is_foreign_key can be skipped to retain the existing foreign key reference values.

Delete Properties of an Existing Object¶

For already uploaded properties of an object, you can delete the properties by uploading the same object with the property value set to null.

However, table_type, index_type, column_names, is_foreign_key, foreign_key_table_name, and foreign_key_column_names property values cannot be deleted.

Sample CSV:

"key","table_type","db_owner","data_location","position","index_type","data_structure","filter_condition"
"public.parts","","","","","",""
"public.parts.id","","","","","","",""
"public.parts.index","","","","","SECONDARY","",""

This upload removes the data_location and db_owner properties for the TABLE object, removes column_type and position properties for the COLUMN object, removes data_structure and filter_condition for the INDEX object.

Upload the CSV File¶

To import metadata from a CSV file:

On the catalog page of the NoSQL data source, on the upper right, click Settings to open the data source settings page.
Cick the Import Metadata tab.
Upload or drag and drop the source CSV file into the upload area.
Review the uploaded file. Any formatting errors will appear in the Error column of the preview table.
In the top left corner of the uploaded table, click Confirm to confirm upload.
Refresh Job History. When the import is completed, the corresponding completed metadata extraction (MDE) job will be listed in the Job History table.
Navigate to the catalog page of the virtual data source to view the imported metadata.

Virtual RDBMS Data Sources¶

Import Schemas, Tables, Columns, and Indices¶

Requirements for Head and Keys¶

Head¶

Key¶

Add Table and View Properties¶

Add Column Properties¶

Add Indices¶

Understand the Index Key Types and Icons¶

Encoding and Special Characters¶

Reserved Words¶

Add, Update, and Delete Properties per Technical Metadata Object¶

Add Properties¶

Update Properties of an Existing Object¶

Delete Properties of an Existing Object¶

Upload the CSV File¶

Alation User Documentation PDFs