MongoDB onboarding and profiling example video
You can onboard MongoDB data by connecting it to a Data Catalog data source, then triggering a Data Catalog profiling job. The following video shows how to set up a MongoDB data source and profile it in Data Catalog:
Steps used in MongoDB onboarding example video
You can use these instructions to follow along in the MongoDB example video.
Adding a data source
If your role has the Manage Data Sources privilege, perform the following steps to create data source definitions.
Specify MongoDB data source identifiers
Click Management in the left toolbar of the navigation pane. The Manage Your Environment page opens.
Click Data Source and then Add Data Source, or Add New and then Add Data Source. The Create Data Source page opens.
Specify the following basic information for the connection to your data source:
Field: Description
Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize. Note: Names must start with a letter and must contain only letters, digits, and underscores. White spaces in names are not supported.
Description: (Optional) Specify a description of your data source.
Agent: Select the Data Catalog agent that will service your data source. This agent is responsible for triggering and managing profiling jobs in Data Catalog for this data source.
Data Source Type: Select the database type of your source. You are then prompted to specify additional connection information based on the file system or database type you are trying to access.
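The naming rule in the note above can be checked before you enter a name on the Create Data Source page. The following is a minimal sketch in Python, not part of Data Catalog: the function name is_valid_data_source_name is hypothetical, and the pattern assumes that "letters" means ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters; Data Catalog itself may accept more.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_data_source_name(name: str) -> bool:
    """Return True if the name starts with a letter and contains only
    letters, digits, and underscores (so no white space)."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_data_source_name("mongo_sales_01") passes, while "1mongo" (starts with a digit) and "mongo sales" (contains a space) do not.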
Specify the following additional connection information based on the MongoDB resource you are trying to access:
Field: Description
Configuration Method: Select URI as the configuration method.
Source Path: Enter the MongoDB database path. For example, the default database path for MongoDB is
URL: Enter the MongoDB server URL, for example,
Username and password: Enter your username and password to connect to the MongoDB server.
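Before entering a value in the URL field, it can help to confirm that the string parses as a MongoDB URL. The sketch below uses only the Python standard library. The function name check_mongodb_url and the host db.example.com are hypothetical, and whether Data Catalog expects exactly the mongodb://host:port form is an assumption. The fallback port 27017 is MongoDB's standard default.

```python
from typing import Tuple
from urllib.parse import urlsplit

MONGODB_DEFAULT_PORT = 27017  # MongoDB's standard default port


def check_mongodb_url(url: str) -> Tuple[str, int]:
    """Parse a mongodb:// URL and return (host, port).

    Raises ValueError if the scheme is not mongodb or the host is missing.
    Assumption: only the plain mongodb:// scheme is handled here.
    """
    parts = urlsplit(url)
    if parts.scheme != "mongodb":
        raise ValueError("expected a URL starting with mongodb://")
    if not parts.hostname:
        raise ValueError("the URL does not contain a host name")
    # Fall back to the default port when the URL does not specify one.
    return parts.hostname, parts.port or MONGODB_DEFAULT_PORT


# Hypothetical host, for illustration only.
host, port = check_mongodb_url("mongodb://db.example.com:27017")
```

The check is purely syntactic. It does not contact the server, so Test Connection in Data Catalog remains the real verification.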
Test and add your data source
Click Test Connection to test your connection to the specified data source. If you are testing a MySQL connector and you get the following error, you need a more recent MySQL connector library:
java.sql.SQLException: Client does not support authentication protocol requested by server. plugin type was = 'caching_sha2_password'
- Go to MySQL :: Download Connector/J and select the Platform Independent option.
- Download the compressed (.zip) file, copy it to /opt/ldc/agent (your agent install directory), and unpack the file.
(Optional) Enter a Note for any information you need to share with others who might access this data source.
Click Create Data Source to establish your data source connection.
Job sequences are sequences of jobs in Lumada Data Catalog that can be executed by users who have job execution privileges.
Trigger a sequence job
Click Data Canvas in the left navigation menu. The Explore Your Data page opens.
Use the Navigation pane to drill down to the resource.
Click More actions and then select Process from the menu that displays. The Process Selected Items page opens.
Click the sequence that you want to use.
The sequence page opens.
Sequence: Description
Select Template: A template is a custom definition for a given process with a custom set of parameters.
Format Discovery: Identifies the format of data resources, marking the resources that can be further processed.
Schema Discovery: Applies format-specific algorithms to determine the structure of the data in each resource, producing a list of columns or fields for each resource's catalog entry.
Collection Discovery: Discovers collections of data elements that have the same schema.
Data Profiling: Applies data-specific logic to compute field-level statistics and patterns for each resource as unique fingerprints of the data.
Data Profiling Combo: Starts a combined sequence of processes to profile your data. Executes the format discovery, schema discovery, and data profiling processes.
Business Term Discovery: Compares and analyzes the computed fingerprints with any defined or seeded label signatures to discover possible matches. Note that users must have Run Term Discovery permissions to run this job.
Lineage Discovery: Shows relationships among resources in the form of a lineage graph. Data lineage identifies copies of the same data, merges between resources, and the horizontal and vertical subsets of these resources.
Data Rationalization: Finds redundant data copies and overlaps.
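To make the relationship between Data Profiling Combo and the individual sequences concrete, the following Python sketch models the sequence names from the table as an enumeration and expands the combo into the three processes it executes. This is an illustration only, not a Data Catalog API: the Sequence class and the expand function are hypothetical, and the execution order is assumed to follow the order in which the table lists the processes.

```python
from enum import Enum
from typing import List


class Sequence(Enum):
    """Sequence names as they appear in the table (hypothetical model)."""
    FORMAT_DISCOVERY = "Format Discovery"
    SCHEMA_DISCOVERY = "Schema Discovery"
    COLLECTION_DISCOVERY = "Collection Discovery"
    DATA_PROFILING = "Data Profiling"
    DATA_PROFILING_COMBO = "Data Profiling Combo"
    BUSINESS_TERM_DISCOVERY = "Business Term Discovery"
    LINEAGE_DISCOVERY = "Lineage Discovery"
    DATA_RATIONALIZATION = "Data Rationalization"


# Per the table, the combo runs these three processes.
# Assumption: they run in the order the table lists them.
COMBO_STEPS = [
    Sequence.FORMAT_DISCOVERY,
    Sequence.SCHEMA_DISCOVERY,
    Sequence.DATA_PROFILING,
]


def expand(sequence: Sequence) -> List[Sequence]:
    """Return the individual processes that a sequence stands for."""
    if sequence is Sequence.DATA_PROFILING_COMBO:
        return list(COMBO_STEPS)
    return [sequence]


# Print the names of the processes behind the combo sequence.
for step in expand(Sequence.DATA_PROFILING_COMBO):
    print(step.value)
```

Every sequence other than Data Profiling Combo expands to itself, which mirrors the table: only the combo is described as running several processes.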
Based on the resource, follow the workflow for the sequence.
Click Incremental Profiling if you want to use incremental processing.
Note: When you select Fast profiling mode in the Sequence flow, the default values for sample-splits and sample-rows are used as defined in the Agent component's configuration.
In the Enter Parameters field, enter any command line parameters for the sequence.
Click Start Now.