MongoDB OCF Connector: Install and Configure¶
Alation Cloud Service Applies to Alation Cloud Service instances of Alation
Customer Managed Applies to customer-managed instances of Alation
Network Connectivity¶
Open outbound TCP port 27017 to the MongoDB server.
Service Account¶
For more information about built-in roles and permissions, refer to the official MongoDB documentation about Built-In Roles and Create Users.
Authentication¶
Basic Authentication¶
You’ll need the following credentials for basic authentication:
Username of the service account
Password of the service account
Kerberos Authentication¶
Select GSSAPI in the Auth Scheme on the General Settings page to enable this.
You’ll need the following credentials and resources for kerberos authentication:
Username of the service account
Password of the service account
The Kerberos Key Distribution Center (KDC) service
The Kerberos realm
The Kerberos service realm
The service principal name (SPN) for Kerberos Domain Controller
The Kerberos service KDC
LDAP Authentication¶
Select PLAIN in the Auth Scheme on the General Settings page to enable this.
You’ll need the following credentials and resources for authentication:
Username of the service account
Password of the service account
SSL Authentication¶
Available from the connector version 23.0.8867 or later.
Select Use SSL in the Auth Scheme on the General Settings page to enable this.
You’ll need the following credentials and resources for authentication:
The SSL client certificate file
The SSL client certificate file password
(Optional) The SSL server certificate file
Important
SSL server certificate isn’t supported. If you use server certificate for connection, contact Alation Support.
JDBC URI¶
Important
We recommend that you provide the values in the corresponding fields on the General Settings page instead of the JDBC URI field in the Datasource Connection section. Leave this field empty if all the connection properties you need are available in the user interface.
Install the Connector¶
Alation On-Premise¶
Important
Installation of OCF connectors requires Alation Connector Manager to be installed as a prerequisite.
To install an OCF connector:
If this has not been done on your instance, install the Alation Connector Manager: Install Alation Connector Manager.
Ensure that the OCF connector Zip file is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
Alation Cloud Service¶
Note
On Alation Cloud Service instances, Alation Connector Manager is available by default.
Depending on your network configuration, you may need to use Alation Agent to connect to databases.
Connection via Alation Agent¶
Ensure that Alation Agent is enabled on your Alation instance. If necessary, create a Support ticket with Alation for an Alation representative to enable the Alation Agent feature on your instance and to receive the Alation Agent installer.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
Connection Without Agent¶
To install an OCF connector:
Ensure that the OCF connector Zip file is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
Create and Configure a New Data Source¶
In Alation, add a new data source:
Log in to Alation as a Server Admin.
Expand the Apps menu on the right of the main toolbar and select Sources.
On the Sources page, click +Add on the top right of the page and in the list that opens, click Data Source. This will open the Add a Data Source wizard.
On the first screen of the wizard, specify a name for your data source, assign additional Data Source Admins, if necessary, and click the Continue Setup button on the bottom. The Add a Data Source screen will open.
On the Add a Data Source screen, the only field you should populate is Database Type. From the Database Type dropdown, select the connector name. After that you will be navigated to the Settings page of your new data source.
Note
Agent-based connectors will have the Agent name appended to the connector name.
The name of this connector is Alation OCF Connector for MongoDB.
Configuration¶
Access¶
On the Access tab, set the data source visibility using these options:
Public Data Source—The data source will be visible to all users of the catalog.
Private Data Source—The data source will be visible to the users allowed access to the data source by Data Source Admins.
You can add new Data Source Admin users in the Data Source Admins section.
General Settings¶
Note
The automatic lineage is not supported in this release.
Specify Application Settings if applicable. Click Save to save the changes after providing the information.
Parameter |
Description |
---|---|
BI Connection Info |
This parameter is used to generate lineage between the current data source and another source in the catalog, for example a BI source that retrieves data from the underlying database. The parameter accepts host and port information of the corresponding BI data source connection. Use the following format: You can provide multiple values as a comma-separated list:
Find more details in BI Connection Info |
Disable Automatic Lineage Generation |
Select this checkbox to disable automatic lineage generation from QLI, MDE, and Compose queries. By default, automatic lineage generation is enabled. |
Connector Settings¶
Populate the data source connection information and Save the values.
Data Source Connection¶
Parameter |
Description |
---|---|
JDBC URI |
Specify the JDBC URI in the required format. |
Important
We recommend that you provide the values in the corresponding fields on the General Settings page instead of the JDBC URI field in the Datasource Connection section. Leave this field empty if all the connection properties you need are available in the user interface.
Authentication¶
Parameter |
Description |
---|---|
Auth Scheme |
Specify an authentication scheme that MongoDB uses to authenticate the connection. Available Options:
|
Server |
Specify the server that hosts the MongoDB database. If you choose to connect using DNS seed lists, use format: <”mongodb+srv://” + the name of the server running the MongoDB instance>. |
Port |
Specify the MongoDB database port number. |
User |
Specify the username of the service account. |
Password |
Specify the password of the service account. |
Database |
Specify the name of the MongoDB database. |
Use SSL |
Select this to enable SSL connection. This is used to notify the OCF MongoDB driver to use an SSL handshake to complete the authentication process. By default, MongoDB Compass and MongoDB Atlas instances do not need SSL to be enabled. This is required only when a cluster on MongoDB is using the SSL. |
Replica Set |
Specify a list of primary servers that the provider queries from in addition to the specified server and port. Use the format: <Server=localhost;Port=27017;ReplicaSet=localhost:27018,localhost:27019;..> |
DNS Server |
Specify the DNS server for a MongoDB database using a DNS-constructed seed list. For example, specify 8.8.8.8 for Google DNS server. |
Kerberos¶
Parameter |
Description |
---|---|
Kerberos KDC |
Specify the Kerberos Key Distribution Center (KDC) service name. You can locate it with the domain controller. If unspecified, the KRB5_CONFIG environment variable uses the default MIT location to fetch the KRB5 config file to get the Kerberos properties.
|
Kerberos Realm |
Specify the Kerberos realm as defined by the administrator or domain name. If unspecified, the KRB5_CONFIG environment variable uses the default MIT location to fetch the KRB5 config file to get the Kerberos properties.
|
Kerberos SPN |
Specify the service principal name (SPN) for the Kerberos Domain Controller. Set this if the URL that you are authenticating to is different. |
Kerberos Keytab File |
Specify the Keytab file name that contains the pairs of Kerberos principals and encrypted keys. |
Kerberos Service Realm |
Specify the service that Kerberos realm uses if you’re using cross-realm Kerberos authentication. |
Kerberos Service KDC |
Specify the service that Kerberos KDC if you’re using cross-realm Kerberos authentication. |
Kerberos Ticket Cache |
Specify the file path to an MIT Kerberos credential cache file. |
SSL¶
Available from the connector version 23.0.8867 or later.
Parameter |
Description |
---|---|
SSL Client Cert |
Specify the certificate store for the client certificate and select the appropriate file type from the options under SSL Client Cert Type. |
SSL Client Cert Type |
Specify the type of key store that contains the SSL client certificate. Available Options:
|
SSL Client Cert Password |
Specify the password for the client certificate. If the certificate store is of a type that requires a password, this property is used to specify that password to open the certificate store. |
SSL Client Cert Subject |
Specify the subject of the client certificate. The subject is a comma separated list of distinguished name fields and values. Consider the following points:
|
SSL Server Cert |
Specify the TLS/SSL certificate to be accepted from the server. Accepted Values:
Certificates are validated as trusted by the machine based on the system’s trust store. The trust store used is the
|
SSH¶
Parameter |
Description |
---|---|
Use SSH |
Select this to enable SSH authentication. |
SSH Auth Mode |
Specify the authentication method to establish an SSH Tunnel to the service. Available Options:
|
SSH Client Cert |
Specify the client certificate that contains a valid private key. If a public key is unavailable for authentication, it is autogenerated from the private key. The SSH Client Cert Type field specifies the type of the key store specified by SSH Client Cert. If the store is password protected, specify the password in SSH Client Cert Password. Some types of key stores are containers which may include multiple keys. By default the provider will select the first key in the store, but you can specify a specific key using SSH Client Cert Subject. |
SSH Client Cert Password |
Specify the password of the SSH Client Cert. Required only when authenticating to SFTP servers with SSH Auth Mode set to Public Key and SSH Client Cert set to a private key. |
SSH Client Cert Subject |
Specify the subject of the SSH client certificate to locate the certificate in the store in comma-separated values. For example, CN=www.server.com, OU=test, C=US, E=support@alation.com. If you specify ‘*’, the first certificate in the certificate store is picked. |
SSH Client Cert Type |
Specify the type of SSH Client Cert private key. Default: PEMKEY_FILE Available Options:
|
SSH Server |
Specify the SSH Server. |
SSH Port |
Specify the SSH Port. |
SSH User |
Specify the SSH username. |
SSH Password |
Specify the SSH password for the user. |
SSH Server Fingerprint |
Specify the SSH server fingerprint. |
Firewall¶
Parameter |
Description |
---|---|
Firewall Type |
Specify the protocol used by the proxy-based firewall for traffic tunneling. Available Options:
|
Firewall Server |
Specify the host name, DNS name, or IP address of the proxy-based firewall. |
Firewall Port |
Specify the TCP port of the proxy-based firewall. |
Firewall User |
Specify the user name to authenticate with the proxy-based firewall. |
Firewall Password |
Specify the password to authenticate with the proxy-based firewall. |
Logging¶
Parameter |
Description |
---|---|
Verbosity |
Specify the verbosity level between 1 to 5 to include details in the log file. |
Log Modules |
Includes the core modules in the log files. Add module names separated by a semi-colon. By default, all modules are included. |
Max Log File Count |
Specify the maximum file count for log files. After the limit, the log file is rolled over and time is appended at the end of the file. The oldest log file is deleted. Maximum Value: 2 Default: -1. A negative or zero value indicates unlimited files. |
Schema¶
Parameter |
Description |
---|---|
Browsable Tables |
Specify the schemas as subset of the available schemas in a comma separated list. For example, BrowsableSchemas=SchemaA,SchemaB,SchemaC |
Tables |
Specify the fully qualified name of the table as a subset of the available tables in a comma separated list. For example, Tables=TableA,TableB,TableC Each table must be a valid SQL identifier that might contain special characters escaped using square brackets, double-quotes, or backticks. For example, Tables=TableA,[TableB/WithSlash],WithCatalog.WithSchema.`TableC With Space`. |
Views |
Specify the fully qualified name of the Views as a subset of the available tables in a comma separated list. For example, Views=ViewA,ViewB,ViewC. Each table must be a valid SQL identifier that might contain special characters escaped using square brackets, double-quotes, or backticks. For example, Views=ViewA,[ViewB/WithSlash],WithCatalog.WithSchema.`ViewC With Space`. |
Misc¶
Parameter |
Description |
---|---|
Batch Size |
Specify the maximum size of each batch operation. Default: 0 |
Built In Column Mapping |
Specify a list of MongoDB built-in column names separated by comma to map them to new names. For example, _index=BuiltInIndex,P_id=Parent_Id |
Connection Life Time |
Specify the maximum limit for a connection to stay connected in seconds. Default: 0 indicates unlimited lifetime for a connection. |
Data Model |
Specify the data model to RELATIONAL to discover the metadata for a child table that extends to the root level. Default: DOCUMENT. |
Flatten Arrays |
Specify an arbitrary number to flatten the elements in a nested array into columns. By default, the nested arrays are returned as JSON strings. Set it to “-1” to flatten all the elements. |
Flatten Objects |
Select this to flatten the object properties in a nested array into columns. By default, the nested arrays are returned as JSON strings. |
Generate Schema Files |
Specify the preference when to generate and save the schemas. Available Options:
|
Max Rows |
Specify the limit for the number of rows returned if no aggregation or GROUP BY is used in the query. This takes precedence over LIMIT clauses. |
No Cursor Timeout |
Select this to prevent the timeout for an idle cursor that is inactive for more than 10 minutes. |
Other |
Specify the caching, integration, or formatting properties in a list format separated by a semicolon. Available Options:
|
Pagesize |
Specify the maximum number of results to return per page from MongoDB. A higher value results in better performance but uses more memory. |
Pool Idle Timeout |
Specify the idle time for a connection in a pool. Default: 60 seconds |
Pool Max Size |
Specify the maximum number for connections in a pool. To disable, set the value to 0 or less. Default: 100 |
Pool Min Size |
Specify the minimum number for connections in a pool. Default: 1 |
Pool Wait Time |
Specify the maximum wait duration for a connection to become available. If a new connection request is in wait for an available connection but exceeds the time, an error is thrown. By default, new connection requests have a forever wait time for an available connection. Default: 60 seconds |
Pseudo Columns |
Specify the pseudo columns in the comma-separated list to be added as columns to the table. For example, “Table1=Column1, Table1=Column2, Table2=Column3”. Use the ‘*’ character to include all tables and columns in this format: “=” |
Query Passthrough |
Select this to pass a query to MongoDB without any change. |
Readonly |
Select this to enforce only SELECT queries to work on MongoDB. |
Read Preference |
Specify to execute queries to a member in a replica set other than the primary member. Available Options:
|
Read Preference Tags |
Specify this to target a replica set member or members that are associated with tags. Use this when Read Preference is set to a value other than the default value. Available Options:
|
Row Scan Depth |
Specify the maximum number of rows to scan for the available columns in a table. Set it to “-1” to scan an arbitrary number of rows. |
Slave OK |
Select this to read from the secondary (slave) servers in a replica set. |
Timeout |
Specify the time limit in seconds after which the operation is canceled and an error is thrown. A value of 0 specifies that the operation never times out until completion or failure. Default: 60 seconds |
Type Detection Scheme |
Specify how to scan data to determine the fields and datatypes in a document collection. Available Options:
|
Update Scheme |
Specify how to modify the target document with the original when you use UPDATE query. Available Options:
|
Use Connection Pooling |
Select this to enable connection pooling. |
Use Find API |
Select this to execute MongoDB queries using db.collection.find(). |
User Defined Views |
Specify the file path pointing to the JSON configuration file that contains custom views. |
Write Concern |
Specify the level of acknowledgment requests that denote the write operation has propagated to the specified number of MongoDB instances. For more information, see the MongoDB documentation about Write Concern. Default: 0 |
Write Concern Journaled |
Select this to make Write Concern acknowledgement requests mandatory for MongoDB instances written on to the on-disk journal. |
Write Concern Timeout |
Specify a time limit in milliseconds for Write Concern. Default: 0 |
Write Scheme |
Specify how to determine object type for the inserted or updated objects. Available Options:
|
Obfuscate Literals¶
Obfuscate Literals—Enable this toggle to hide actual values in the query statements that are ingested during query log ingestion or executed in Compose. This toggle is disabled by default.
Test Connection¶
Under Test Connection, click Test to validate network connectivity.
If the connection test fails, make sure the JDBC URI and service account credentials are correct.
Logging Configuration¶
Select the logging level for the connector logs and save the values by clicking Save in this section. The available log levels are based on the Log4j framework.
Parameter |
Description |
---|---|
Log level |
Select the log level to generate logs. The available options are INFO, DEBUG, WARN, TRACE, ERROR, FATAL, ALL. |
Metadata Extraction¶
You can configure metadata extraction (MDE) for an OCF data source on the Metadata Extraction tab of the Settings page. Refer to Configure Metadata Extraction for OCF Data Sources for information about the available configuration options.
This connector supports default query-based MDE. Custom query-based extraction is not supported.
Compose¶
MongoDB OCF Connector supports the query service to perform the following tasks:
Create a connection
Perform SQL queries on the connection
Execute single and multi line queries
View error messages
Download Microsoft Excel live report
Enable Query Service¶
By default, the Query Service feature is disabled. Enable it by setting the alation.connector.query_service.enabled
feature flag.
alation_conf alation.connector.query_service.enabled -s True
alation_action restart_alation
For details about configuring the Compose tab of the Settings page, refer to Configure Compose for OCF Data Sources.
Sampling and Profiling¶
Sampling and profiling is supported. For details, see Configure Sampling and Profiling for OCF Data Sources.
Troubleshooting¶
Refer to Troubleshooting for information about logs.