Amazon DynamoDB OCF Connector: Install and Configure¶

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Core Connector Core connectors are included with all Alation platform tiers (subject to each tier’s connector limits) and are fully supported by Alation.

Network Connectivity¶

Open outbound TCP port 8000 to the DynamoDB server.

Service Account¶

You can choose from the given auth scheme and authenticate with the Service. For example: If you choose to authenticate using IAMRole, you can just pass your AccessKey, SecretKey, and ARN.

Permissions for Metadata Extraction¶

Use the predefined roles for Amazon DynamoDB:

AmazonDynamoDBReadOnlyAccess: Grants read-only access to DynamoDB resources through the AWS Management Console.

AmazonDynamoDBFullAccess: Grants full access to DynamoDB resources through the AWS Management Console.

Refer to the table below for minimum permissions required for IAM Role:

IAM Role

Description
dynamodb:ListTables
Required for getting a list of your DynamoDB tables. Used during metadata retrieval to dynamically determine the list of your tables. Note that this action does not support resource-level permissions and requires you to choose All resources (hence the * for “Resource”). In other words, the action dynamodb:ListTables needs a * Resource, and the other actions can be given permission to all the tables arn:aws:dynamodb:us-east-1:987654321098:table/* or to a list of tables:
"Resource": [

      "arn:aws:dynamodb:<region>:<account-id>:table/<table-name1>",
      "arn:aws:dynamodb:<region>:<account-id>:table/<table-name2>"
]
dynamodb:DescribeTable
Required for getting metadata about the selected table. Used during table metadata retrieval to dynamically determine the list of the columns. This action supports resource-level permissions, so you can specify the tables you want to get the metadata from. For example, for the table Customers and Orders in the region Northern Virginia us-east-1, for account 987654321098:
{
    "Effect": "Allow",
    "Action": [

        "dynamodb:DescribeTable"
    ],
    "Resource": [

         "arn:aws:dynamodb:<region>:<account-id>:table/<table-name1>",
         "arn:aws:dynamodb:<region>:<account-id>:table/<table-name2>"
    ]
 }
To give permissions to all the tables in the region you specified in the connection property AWSRegion, use an * instead of the table name: "Resource": "arn:aws:<region>:<account-id>:table/*"

Permissions for Sampling and Profiling¶

Use the predefined roles for Amazon DynamoDB:

AmazonDynamoDBReadOnlyAccess: Grants read-only access to DynamoDB resources through the AWS Management Console.

AmazonDynamoDBFullAccess: Grants full access to DynamoDB resources through the AWS Management Console.

Refer to the table below for minimum permissions required for IAM Role:

IAM Role

Description

dynamodb:Scan

Required for getting one or more items by accessing every item in the table. Used for most of the SELECT queries, for example, SELECT * FROM [Customers].

This action supports resource-level permissions, so you can specify the tables you want to get data from, similar to dynamodb:DescribeTable.

dynamodb:PartiQLSelect

Required for getting specific items from a table when using SELECT queries and filtering by the primary key column, for example, SELECT * FROM [Customers] WHERE id=1234.

This action supports resource-level permissions, so you can specify the tables you want to get data from, similar to dynamodb:DescribeTable.

Authentication¶

Basic Authentication¶

You’ll need the following credentials for basic authentication:

Username of the service account

Password of the service account

AWS Authentication¶

Select one of the following values in the Auth Scheme on the General Settings page to enable AWS authentication:

AWS Root Keys (Default)

AWS IAM Roles

AWS EC2 Roles

AWS MFA

AWS Cognito SRP

AWS Cognito Basic

SSL Authentication¶

Note

The SSL authentication is not supported in this release. However, you can specify the SSL certificate content for an SSL handshake if the cluster on DynamoDB is using SSL.

Select Use SSL in the Auth Scheme on the General Settings page to enable this.

You’ll need the following credentials and resources for authentication:

Username of the service account

Password of the service account

The SSL certificate file

SSO Authentication¶

Note

The SSO authentication is not supported in this release.

Select the Identity Provider (IdP) OKTA, ADFS, or PingFederate in SSO > Auth Scheme on the General Settings page to enable this.

You’ll need the following credentials and resources for authentication:

Username of the IdP

Password of the IdP

SSO Login URL

SSO Exchange URL

JDBC URI¶

Provide the JDBC URI in the JDBC URI field of Connector Settings section of General Settings tab.

Use the following format to build the JDBC URI:

Format:¶

amazondynamodb://AWSAccessKey=xxx;AWSSecretKey=xxx;Domain=amazonaws.com;AWS Region=OREGON;

Note

You can provide the parameter values in the individual fields in the Connector Settings section (preferred). However, the value provided in the JDBC URI field takes precedence over the same parameters configured in the corresponding field in the Connector Settings section.

Install the Connector¶

Alation On-Premise¶

Important

Installation of OCF connectors requires Alation Connector Manager to be installed as a prerequisite.

To install an OCF connector:

If this has not been done on your instance, install the Alation Connector Manager: Install Alation Connector Manager.
Ensure that the OCF connector Zip file is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.

Alation Cloud Service¶

Note

On Alation Cloud Service instances, Alation Connector Manager is available by default.

Depending on your network configuration, you may need to use Alation Agent to connect to databases.

Connection via Alation Agent¶

Ensure that Alation Agent is enabled on your Alation instance. If necessary, create a Support ticket with Alation for an Alation representative to enable the Alation Agent feature on your instance and to receive the Alation Agent installer.
Install the Alation Agent.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.

Connection Without Agent¶

To install an OCF connector:

Ensure that the OCF connector Zip file is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.

Create and Configure a New Data Source¶

In Alation, add a new data source:

Log in to Alation as a Server Admin.
Expand the Apps menu on the right of the main toolbar and select Sources.
On the Sources page, click +Add on the top right of the page and in the list that opens, click Data Source. This will open the Add a Data Source wizard.
On the first screen of the wizard, specify a name for your data source, assign additional Data Source Admins, if necessary, and click the Continue Setup button on the bottom. The Add a Data Source screen will open.
On the Add a Data Source screen, the only field you should populate is Database Type. From the Database Type dropdown, select the connector name. After that you will be navigated to the Settings page of your new data source.

Note

Agent-based connectors will have the Agent name appended to the connector name.

The name of this connector is Alation OCF Connector for DynamoDB.

Configuration¶

Access¶

On the Access tab, set the data source visibility using these options:

Public Data Source—The data source will be visible to all users of the catalog.

Private Data Source—The data source will be visible to the users allowed access to the data source by Data Source Admins.

You can add new Data Source Admin users in the Data Source Admins section.

General Settings¶

Note

The automatic lineage is not supported in this release.

Specify Application Settings if applicable. Click Save to save the changes after providing the information.

Parameter

Description

BI Connection Info

This parameter is used to generate lineage between the current data source and another source in the catalog, for example a BI source that retrieves data from the underlying . database

The parameter accepts host and port information of the corresponding BI data source connection.

Use the following format: host:port

You can provide multiple values as a comma-separated list:

10.13.71.216:1541,sever.com:1542

Find more details in BI Connection Info

Enable Automatic Lineage Generation

The Enable Automatic Lineage Generation toggle is ON by default, which means Alation generates lineage automatically during metadata extraction (MDE). You only need to use the toggle if you want to disable automatic lineage generation or re-enable it later.

To disable or enable automatic lineage generation:

Open your data source and go to Settings > General Settings.

Use the Enable Automatic Lineage Generation toggle:

Off: lineage is not generated automatically. You can use manual methods or the API to create lineage.

On: lineage is generated automatically during MDE.

Connector Settings¶

Populate the data source connection information and Save the values.

Data Source Connection¶

Parameter

Description

JDBC URI

Specify the JDBC URI in the required format.

Connection¶

Parameter

Description

Use Lake Formation

Select this to retrieve temporary credentials that enforces access policies against the user based on the configured IAM role. The service can be used when authenticating through OKTA, ADFS, AzureAD, PingFederate, while providing a SAML assertion.

AWS Authentication¶

Parameter

Description

Auth Scheme

Specify an authentication scheme that DynamoDB uses to authenticate the connection.

Available Values:

Auto: Set this to have the provider attempt to automatically resolve the proper authentication scheme to use based on the other connection properties specified.

TemporaryCredentials: Set this to leverage temporary security credentials alongside a session token to connect.

AwsRootKeys: (Default) Set this to use the root user access key and secret. Useful for quickly testing, but production use cases are encouraged to use something with narrowed permissions.

AwsIAMRoles: Set to use IAM Roles for the connection.

AwsEC2Roles: Set this to automatically use IAM Roles assigned to the EC2 machine the CData Alation OCF Connector for Amazon DynamoDB is currently running on.

AwsMFA: Set to use multi factor authentication.

Okta: Set to use a single sign on (SSO) connection with OKTA as the identity provider.

ADFS: Set to use a single sign on (SSO) connection with ADFS as the identity provider.

PingFederate: Set to use a single sign on (SSO) connection with PingFederate as the identity provider.

AwsCredentialsFile: Set to use a credential file for authentication.

AwsCognitoSrp: Set to use Cognito based authentication. This is recommended over AwsCognitoBasic because this option does NOT send the password to the server for authentication, instead it uses the SRP protocol.

AwsCognitoBasic: Set to use Cognito based authentication.

Domain

Specify your AWS domain name. You can optionally choose to associate the domain name with AWS.

Default: amazonaws.com

AWS Access Key

Specify your AWS account access key. This value is accessible from your AWS security credentials page.

AWS Secret Key

Specify your AWS account secret key. This value is accessible from your AWS security credentials page.

AWS Role ARN

Specify the Amazon Resource Name of the role to use when authenticating for a role based authentication.

Requires AWS Access Key and AWS Secret Key of an IAM user.

AWS Region

Specify the hosting region for your Amazon Web Services.

Default: NORTHERNVIRGINIA

Available Values:

OHIO, NORTHERNVIRGINIA, NORTHERNCALIFORNIA, OREGON, CAPETOWN, HONGKONG, JAKARTA, MUMBAI, OSAKA, SEOUL, SINGAPORE, SYDNEY, TOKYO, CENTRAL, BEIJING, NINGXIA, FRANKFURT, IRELAND, LONDON, MILAN, PARIS, STOCKHOLM, ZURICH, BAHRAIN, UAE, SAOPAULO, GOVCLOUDEAST, GOVCLOUDWEST

AWS Credentials File

Select the path to the AWS Credentials File to be used for authentication.

Note

AWS Credentials File is not supported in this release.

For more information, see Configure Files.

AWS Credentials File Profile

Specify the name of the profile to be used from the supplied AWS Credentials File.

Note

AWS Credentials File is not supported in this release.

AWS Session Token

Specify your AWS session token.

AWS External ID

Specify a unique identifier that is required when you assume a role in another account.

MFA Serial Number

Specify the MFA Serial Number.

You can find the device for an IAM user on the AWS Management Console and view the user’s security credentials.

For virtual devices, this is Amazon Resource Name, example: arn:aws:iam::123456789012:mfa/user

MFA Token

Specify the temporary token available on your MFA device.

This is used with the MFA Serial Number to retrieve temporary credentials for login. The temporary credentials available from AWS only lasts up to one hour by default.

Temporary Token Duration

Specify the time duration (in seconds) for a temporary token to last.

Default: 3600

For MFA connection, a new MFA Token must be specified in the connection to retrieve a new temporary token.

For Role based authentication, the minimum duration is 900 seconds (15 minutes) while the maximum is 3600 (1 hour) even if MFA is used.

For MFA authentication (using an IAM user or root), the minimum is 900 seconds (15 minutes), the maximum is 129600 (36 hours).

AWS Cognito Region

Specify the hosting region for AWS Cognito.

Default:

Available Values:

NORTHERNVIRGINIA, OHIO, NORTHERNVIRGINIA, NORTHERNCALIFORNIA, OREGON, CAPETOWN, HONGKONG, MUMBAI, OSAKA, SEOUL, SINGAPORE, SYDNEY, TOKYO, CENTRAL, BEIJING, NINGXIA, FRANKFURT, IRELAND, LONDON, MILAN, PARIS, STOCKHOLM, BAHRAIN, SAOPAULO, GOVCLOUDEAST, GOVCLOUDWEST

AWS User Pool Id

Specify the User Pool Id from AWS Cognito > Manage User Pools > select your user pool > General settings > Pool Id.

AWS User Pool Client App Id

Specify the User Pool Client App Id from AWS Cognito > Manage Identity Pools > select your user pool* > General settings > App clients > App client Id.

AWS User Pool Client App Secret

(Optional) Specify the User Pool Client App Secret from AWS Cognito > Manage Identity Pools > select your user pool > General settings > App clients > App client secret.

AWS Identity Pool Id

Specify the Identity Pool Id from AWS Cognito > Manage Identity Pools > select your identity pool > Edit identity pool > Identity Pool Id.

SSO¶

Note

The SSO authentication is not supported in this release.

Parameter

Description

User

Specify the IdP user name to authenticate the IdP (as set in the Auth Scheme) via SSO.

Password

Specify the password to authenticate the IdP (as set in the Auth Scheme) user via SSO.

SSO Login URL

Specify the identity provider’s (as set in the Auth Scheme) login URL.

SSO Properties

Specify the additional properties required to connect to the identity provider in a semicolon-separated list.

You can set and use OKTA, ADFS, or PingFederate in the Auth Scheme as identity provider for SSO.

ADFS example:

AuthScheme=ADFS; AWSRegion=Ireland; User=user@cdata.com; Password=CH8WerW121235647iCa6; SSOLoginURL=’https://adfs.domain.com’; AWSRoleArn=arn:aws:iam::1234:role/ADFS_SSO; AWSPrincipalArn=arn:aws:iam::1234:saml-provider/ADFSProvider; S3StagingDirectory=s3://athena/staging;

OKTA example:

AuthScheme=Okta; AWSRegion=Ireland; User=user@cdata.com; Password=CH8WerW121235647iCa6; SSOLoginURL=’https://cdata-us.okta.com/home/amazon_aws/0oa35m8arsAL5f5NrE6NdA356/272’; SSOProperties=’ApiToken=01230GGG2ceAnm_tPAf4MhiMELXZ0L0N1pAYrO1VR-hGQSf;’; AWSRoleArn=arn:aws:iam::1234:role/Okta_SSO; AWSPrincipalARN=arn:aws:iam::1234:saml-provider/OktaProvider; S3StagingDirectory=s3://athena/staging;

SSO Exchange Url

Specify the URL used for consuming the SAML response and exchanging it for service specific credentials.

SSL¶

Note

The SSL authentication is not supported in this release. However, you can specify the SSL certificate content for an SSL handshake if the cluster on DynamoDB is using SSL.

Parameter

Description

SSL Server Cert

Specify the TLS/SSL certificate to be accepted from the server.

Accepted Values:

A full PEM Certificate

A path to a local file containing the certificate

The public key

The MD5 Thumbprint (hex values can also be either space or colon separated)

The SHA1 Thumbprint (hex values can also be either space or colon separated)

If not specified, any certificate trusted by the machine is accepted.

Certificates are validated as trusted by the machine based on the system’s trust store. The trust store used is the ‘javax.net.ssl.trustStore’ value specified for the system.

If no value is specified for this property, Java’s default trust store is used

(for example, JAVA_HOME\lib\security\cacerts).

Use * to signify to accept all certificates. This is not recommended due to security concerns.

Firewall¶

Parameter

Description

Firewall Type

Specify the protocol used by the proxy-based firewall for traffic tunneling.

Available Options:

NONE: Default.

TUNNEL: Opens a connection to DynamoDB and traffic flows back and forth through the proxy. The default port is 80.

SOCKS4: Sends data through the SOCKSv4 proxy as specified in Firewall Server and Firewall Port. The default port is 1080.

SOCKS5: Sends data through the SOCKSv5 proxy as specified in Firewall Server and Firewall Port. The default port is 1080.

Firewall Server

Specify the host name, DNS name, or IP address of the proxy-based firewall.

Firewall Port

Specify the TCP port of the proxy-based firewall.

Firewall User

Specify the user name to authenticate with the proxy-based firewall.

Firewall Password

Specify the password to authenticate with the proxy-based firewall.

Proxy¶

Parameter	Description
Proxy Auto Detect	Select this to use the system proxy settings. Don’t select this if you’re using custom proxy settings. For SOCKS proxy, select the appropriate value in Firewall Type.
Proxy Server	Specify the hostname or IP address of a proxy to route HTTP traffic. For SOCKS proxy, select the appropriate value in Firewall Type.
Proxy Port	Specify the TCP port the ProxyServer proxy is running on. Default: 80
Proxy Auth Scheme	Specify the authentication type to use to authenticate to the proxy server. Available Values: BASIC: (Default) Enables HTTP basic authentication. DIGEST: Enables HTTP digest authentication. NONE: No proxy authentication. NEGOTIATE: Retrieves an NTLM or Kerberos token based on the applicable protocol for authentication. NTLM: Retrieves only NTLM token based on the applicable protocol for authentication. PROPRIETARY: Adds a custom token in the Authorization header of the HTTP request. It doesn’t generate NTLM or Kerberos token.
Proxy User	Specify the username to authenticate to the proxy server based on the chosen Proxy Auth Scheme. If you are using Windows or Kerberos authentication, set this property to a user name in one of the following formats: `user@domain` `domain\user`
Proxy Password	Specify the password to authenticate to the proxy server based on the chosen Proxy Auth Scheme.
Proxy SSL Type	Select the SSL type when connecting to the proxy server. Available Values: AUTO: (Default) If the URL is an HTTPS URL, the provider will use the TUNNEL option. If the URL is an HTTP URL, the component will use the NEVER option. ALWAYS: The connection is always SSL enabled. NEVER: The connection is not SSL enabled. TUNNEL: The connection is established through a tunneling proxy. The proxy server opens a connection to the remote host and traffic flows through the proxy.
Proxy Exceptions	Specify a semicolon separated list of destination hostnames or IPs that are exempt from connecting through the proxy server.

Logging¶

Parameter

Description

Verbosity

Specify the verbosity level between 1 to 5 to include details in the log file.

Log Modules

Includes the core modules in the log files. Add module names separated by a semi-colon.

By default, all modules are included.

Max Log File Count

Specify the maximum file count for log files.

After the limit, the log file is rolled over and time is appended at the end of the file. The oldest log file is deleted.

Maximum Value: 2

Default: -1. A negative or zero value indicates unlimited files.

Schema¶

Parameter

Description

Browsable Tables

Specify the schemas as subset of the available schemas in a comma separated list. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC.

Tables

Specify the fully qualified name of the table as a subset of the available tables in a comma separated list. For example, Tables=TableA,TableB,TableC.

Each table must be a valid SQL identifier that might contain special characters escaped using square brackets, double-quotes, or backticks. For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`.

Views

Specify the fully qualified name of the Views as a subset of the available tables in a comma separated list. For example, Views=ViewA,ViewB,ViewC.

Each table must be a valid SQL identifier that might contain special characters escaped using square brackets, double-quotes, or backticks. For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`.

Misc¶

Parameter	Description
Auto Detect Index	Select this to automatically detect the secondary index based on the query used.
Batch Size	Specify the maximum size of each batch operation. Default: 0
Connection Life Time	Specify the maximum limit for a connection to stay connected in seconds. Default: 0 indicates unlimited lifetime for a connection.
Flatten Arrays	Specify an arbitrary number to flatten the elements in a nested array into columns. By default, the nested arrays are returned as JSON strings. Set it to “-1” to flatten all the elements.
Flatten Objects	Select this to flatten the object properties in a nested array into columns. By default, the nested arrays are returned as JSON strings.
Flexible Schema	Set FlexibleSchema to true to scan for additional metadata on the query result set. Otherwise, the metadata will remain the same.
Generate Schema Files	Specify the preference when to generate and save the schemas. Available Values: Never: Doesn’t generate a schema file. OnUse: A schema file is generated the first time a table is referenced, provided the schema file for the table does not already exist. In SQL, the schemas are generated as you execute SELECT queries. OnStart: A schema file is generated at connection time for any tables that do not currently have a schema file. OnCreate: A schema file is generated when running a CREATE TABLE SQL query.
Ignore Types	Specify to remove support for the specified types. For example: Time. These types will then be reported as strings instead. Default: Datetime,Date,Time
Maximum Request Retries	Specify the maximum number of times to retry a request. Default: 4
Max Rows	Specify the limit for the number of rows returned if no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses.
Other	Specify the caching, integration, or formatting properties in a list format separated by a semicolon. Available Options: Caching Configuration: CachePartial=True: Caches only a subset of columns specified in the query. QueryPassthrough=True: Passes the specified query to the cache database instead of using the SQL parser of the provider. Integration and Formatting: DefaultColumnSize: Sets the default length of string fields when the data source does not provide column length in the metadata. The default value is 2000. ConvertDateTimeToGMT: Converts date-time values to GMT instead of the local time of the machine. RecordToFile=filename: Records the underlying socket data transfer to a specified file.
Page size	Specify the maximum number of results to return per page from Amazon DynamoDB. A higher value results in better performance but uses more memory.
Pool Idle Timeout	Specify the idle time for a connection in a pool. Default: 60 seconds
Pool Max Size	Specify the maximum number for connections in a pool. To disable, set the value to 0 or less. Default: 100
Pool Min Size	Specify the minimum number for connections in a pool. Default: 1
Pool Wait Time	Specify the maximum wait duration for a connection to become available. If a new connection request is in wait for an available connection but exceeds the time, an error is thrown. By default, new connection requests have a forever wait time for an available connection. Default: 60 seconds
Pseudo Columns	Specify the pseudo columns in the comma-separated list to be added as columns to the table. For example, `"Table1=Column1, Table1=Column2, Table2=Column3"`. Use the `` character to include all tables and columns in this format: `=*`
Read only	Select this to enforce only SELECT queries to work on Amazon DynamoDB.
Retry Wait Time	Specify the minimum number of milliseconds the provider needs to wait to retry a request. Default: 2000
Row Scan Depth	Specify the maximum number of rows to scan for the available columns in a table. Set it to `-1` to scan an arbitrary number of rows.
Separator Character	Specify the character or characters to denote hierarchy or separate columns. Default: `.` Note If your data has columns that use a period `.` within the attribute name, specify any other character.
Thread Count	Specify the number of threads to use when selecting data via a parallel scan. To disable parallel scans, set this to `1`.
Timeout	Specify the time limit in seconds after which the operation is canceled and an error is thrown. A value of 0 specifies that the operation never times out until completion or failure. Default: 60 seconds
Type Detection Scheme	Specify how to scan data to determine the fields and datatypes in a document collection. Available Values: None: Returns all columns as strings. Rowscan: Scans rows to heuristically determine the data type. Recent: Scans the rows to heuristically determine the data type for the recent documents in a collection.
Use Batch Write Item Operation	Select this to handle UPDATE or INSERT queries when inserting, updating binary, or binary-set data.
Use Connection Pooling	Select this to enable connection pooling.
Use Consistent Reads	Select this to always use Consistent Reads when querying Amazon DynamoDB.
User Defined Views	Specify the file path pointing to the JSON configuration file that contains custom views.
Use Simple Names	Select this to determine if simple names should be used for tables and columns.

Obfuscate Literals¶

Obfuscate Literals—Enable this toggle to hide actual values in the query statements that are ingested during query log ingestion or executed in Compose. This toggle is disabled by default.

Test Connection¶

Under Test Connection, click Test to validate network connectivity.

If the connection test fails, make sure the JDBC URI and service account credentials are correct.

Logging Configuration¶

Select the logging level for the connector logs and save the values by clicking Save in this section. The available log levels are based on the Log4j framework.

Parameter	Description
Log level	Select the log level to generate logs. The available options are INFO, DEBUG, WARN, TRACE, ERROR, FATAL, ALL.

Delete the Data Source¶

To delete the Data Source, refer Delete a Data Source.

Metadata Extraction¶

You can configure metadata extraction (MDE) for an OCF data source on the Metadata Extraction tab of the Settings page. Refer to Configure Metadata Extraction for OCF Data Sources for information about the available configuration options.

This connector supports default query-based MDE. Custom query-based extraction is not supported.

Compose¶

Not supported.

Sampling and Profiling¶

Sampling and profiling is supported. For details, see Configure Sampling and Profiling for OCF Data Sources.

Troubleshooting¶

Refer to Troubleshoot OCF Connectors for information about logs.

General Troubleshooting Tips¶

Verify the Driver Version Being Used¶

The driver version being used can be obtained from the log file. It is printed out whenever a connection is opened and is in the form:

Open Snowflake connection. Version: 20.0.7514.0. Edition: [JDBC Driver].

The version will help identify if there have been any issues that have been resolved in later builds in which case the customer will need to be updated. Identifying the version also ensures that any testing performed in house is done using the same build.

Identify Whether the Issue Stems from Product or Application or the Driver¶

Reviewing the error message should help you identify the source of the problem (For example, an error message you are familiar with or know stems from your product). Viewing the stack trace for the exception will help identify whether it has the CData class in it. In this case, the error is most likely coming from the driver.

Another useful tip is to verify the query being pushed down to the driver. Depending on the issue, you can distinguish whether the problem is specific to the product or the driver.

Identify Whether the Issue Stems from the Driver or the Data Source¶

One of the best ways to determine this is to look at the debug log for the driver. The log will display the order of operations performed (such as retrieving metadata, executing queries, etc.) along with the requests and responses sent to the server. The error message will be displayed in the log as well and will help identify at what point, call, or request the error occurred.

The request and response sequence can be reviewed to identify if there are any issues. The request can be verified to ensure that the format and data is correct based on the query being issued. The request itself will vary between drivers, so reviewing the data source specifications may help identify issues. If any issues are identified, such as a malformed request, this should be escalated to the CData team to investigate further.

The response can also be reviewed as it may contain additional information on errors (one good example are the SharePoint drivers which may just return an HTTP response that needs to be looked at). The response can also be reviewed for the data returned. This is useful for cases where customers see issues where data is not being returned when they expect it to. In these cases, the response can be reviewed to identify if the response from the server contains the data they expect. If it does not, then this means the data most likely doesn’t exist on the server or does not match for the query they entered. If it does exist in the response, then it may be a filtering issue in the driver. To troubleshoot this, you can try removing filters (e.g. WHERE clause) and identify if the data is returned. If it is, then this narrows down the issue to the filter. If the filter is pushed down to the driver (e.g. there isn’t an extra layer performing the filter), an issue can be raised with the CData team to investigate further.

Try to reproduce the issue in a local environment¶

In many cases, it is helpful to try to reproduce a customer’s issue in your local testing environment. This helps run various testing procedures and narrow down the issue. If the issue can be reproduced, the appropriate action can be taken to get it resolved. If the issue cannot be reproduced, then the customer’s behavior may be a result of a configuration issue or a data specific problem. In these cases, it is helpful to compare log files from your local test environment to the customer’s. Connection properties can be compared to ensure that the correct settings are specified. Additionally, the request/response sequence can be compared to identify if there are any differences there. For issues that seem to be data specific, it is helpful to try to import their data set or configure the same data (where applicable) and run tests locally to further troubleshoot and narrow down the issue. Take notice when the log file shows that the customer has set the location property in the connection string. This may indicate that the customer is using a custom schema file (.RSD), and it is often important to retrieve this schema file from them in order to reproduce the issue locally.

SSL Troubleshooting Tips¶

SSL Handshake Errors¶

Example: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException:
Server not trusted by the user

This particular error occurs when the certificate presented by the server is not trusted by the user/machine. The driver will attempt to validate the server certificate using the system’s trust store. The trust store will be either the javax.net.ssl.trustStore value (if specified) for the system or will map to java’s installed cacerts. If a value is specified for the SSL Server Cert property, this value will override the system trust store and will attempt to validate the certificate for the specified input.

The format of the input must be in one of the forms specified in the CData help documentation.

SSL Client Properties¶

These properties are specific for client authentication, they are not for validating a normal SSL connection for encrypted communication. These properties are specific for 2-way SSL where the connection is encrypted but the server requires the client to authenticate. This particular logic is configured on the server and in such cases the server will request that the client present their certificate to validate their identity and authenticate with the server.

In most cases, these properties should never be set. If a customer has set these, it is best to verify with them that the server requires client authentication via PKI (public key infrastructure) and if not to just remove them.

Amazon DynamoDB OCF Connector: Install and Configure¶

Network Connectivity¶

Service Account¶

Permissions for Metadata Extraction¶

Permissions for Sampling and Profiling¶

Authentication¶

Basic Authentication¶

AWS Authentication¶

SSL Authentication¶

SSO Authentication¶

JDBC URI¶

Format:¶

Install the Connector¶

Alation On-Premise¶

Alation Cloud Service¶

Connection via Alation Agent¶

Connection Without Agent¶

Create and Configure a New Data Source¶

Configuration¶

Access¶

General Settings¶

Connector Settings¶

Data Source Connection¶

Connection¶

AWS Authentication¶

SSO¶

SSL¶

Firewall¶

Proxy¶

Logging¶

Schema¶

Misc¶

Obfuscate Literals¶

Test Connection¶

Logging Configuration¶

Delete the Data Source¶

Metadata Extraction¶

Compose¶

Sampling and Profiling¶

Troubleshooting¶

General Troubleshooting Tips¶

Verify the Driver Version Being Used¶

Identify Whether the Issue Stems from Product or Application or the Driver¶

Identify Whether the Issue Stems from the Driver or the Data Source¶

Try to reproduce the issue in a local environment¶

SSL Troubleshooting Tips¶

SSL Handshake Errors¶

SSL Client Properties¶

Alation User Documentation PDFs