To customize Lumada Data Catalog for your environment, you can change specific configuration settings that scripts use when gathering catalog metadata, business terms, and term associations, and when performing other Data Catalog functions.
The available configuration categories are shown in the following example:
For the Lumada Data Catalog Application Server, you can update configuration settings for both the Security and Tools configuration groups. The following table shows an example of an available configuration setting in each configuration group.
|Security||Set the supported characters for validating job template arguments, for preventing an injection attack|
|Tools||Whether to enable or disable the lineage import-export feature|
For the local Lumada Data Catalog Agent, you can update configuration settings for discovery, the discovery profiler, miscellaneous settings, and the metadata service configuration groups. The following table shows an example of available configuration settings in a configuration group.
|Discovery||Whether or not to enable Hive support when initializing a Spark session|
|Discovery_Profiler||Set characters in a file name prefix that should be skipped during discovery|
|Misc||Set the URI for the discovery cache metadata files|
|MetadataService||Set the Hive version|
You can search the Configurations page for a specific setting that you want to update. If you do not know the configuration setting you want to modify, you can enter a keyword in the search box as in the image below. Results return a list of the configuration groups with settings that match your search term. In this example, there is only one configuration group with the prefix keyword, Discovery Profiler. Click the View Details (>) icon to view the list of matching settings.
Changing configuration settings
If you want to change the file name prefixes that are ignored during discovery, you would modify the LDC Selector Ignore Prefixes setting in the Discovery Profiler configuration group, as shown in the following image.
- Reset to the last saved value.
- Set to default: Set the value back to the default value.
- Save the value specified in the text box.
If you change the value of the setting, make sure you save the change. For some LDC Agent configuration settings, you need to restart the agent after changing the setting. If the Value text box for a configuration setting displays an asterisk (*), then an Agent restart is required. For more information, see Restart an agent.
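As a rough illustration of what an ignore-prefixes setting of this kind does, the following Python sketch filters out file names that start with any configured prefix. It is not Data Catalog's implementation: the function names, the prefix values, and the file names are all hypothetical and chosen only for the example.

```python
from typing import Iterable, List


def should_skip(file_name: str, ignore_prefixes: Iterable[str]) -> bool:
    """Return True when file_name starts with any non-empty ignored prefix."""
    return any(prefix and file_name.startswith(prefix) for prefix in ignore_prefixes)


def filter_discoverable(file_names: Iterable[str], ignore_prefixes: Iterable[str]) -> List[str]:
    """Keep only the file names that do not start with an ignored prefix."""
    prefixes = list(ignore_prefixes)
    return [name for name in file_names if not should_skip(name, prefixes)]


# Hypothetical prefix values and file names, for illustration only.
example_prefixes = ["_", "."]
example_files = ["_SUCCESS", ".hidden.csv", "sales_2021.csv", "part-00000.parquet"]

print(filter_discoverable(example_files, example_prefixes))
```

With these assumed values, files such as `_SUCCESS` and `.hidden.csv` are skipped, while the remaining files stay eligible for discovery.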
Restart an agent
After you change some configuration settings, you need to restart the LDC Agent to use the new values. If the Value text box for a property displays an asterisk (*), then an Agent restart is required.
- After you have updated your settings for the Agent, return to the Configurations page.
- Click Restart <agent>.
- Click Management, then click Agents to check the status of the LDC Agent.
The Connected column for the LDC Agent displays a green checkmark icon.
Large properties configuration example
In Data Catalog, the large properties location is where the agent stores metadata from profile jobs. You can set up a large properties location in a file system like HDFS, or on object storage like AWS S3. This example illustrates configuring the large properties settings to an AWS S3 bucket.
Navigate to Management, then click Configuration.
Locate the configuration section for the agent where you want the large properties location to be set up. In that section, click View Details for the MISC group to view all miscellaneous settings for that agent.
Expand the Attributes for discovery cache metadata store setting and provide the credentials and the endpoint for your AWS S3 bucket:
fs.s3a.access.key=<AWS S3 access key>
fs.s3a.secret.key=<AWS S3 secret key>
fs.s3a.endpoint=<AWS S3 endpoint>
fs.s3a.path.style.access=true
fs.s3a.threads.max=40
fs.s3a.connection.maximum=200
Click Save Change.
Open the Relative location for a large properties metadata store setting. In the Value text box, provide a folder location in your S3 bucket. For example.
Click Save Change.
Open the URI for discovery cache metadata store setting and provide the URI for the S3 bucket. For an AWS S3 URI, this setting is in one of two formats, depending on the agent for which you are configuring these settings:
- For a remote agent running on EMR: s3://<Bucket Name>
- Any other agent: s3a://<Bucket Name>
Click Save Change.
Navigate to the Configurations page and click the restart link of the applicable agent.
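The values entered in the steps above can be prepared ahead of time. The following Python sketch assembles the `fs.s3a.*` attribute lines and picks the URI scheme based on the agent type. It is only a helper for composing text to paste into the Configurations page, not a Data Catalog API: the function names and the sample key, endpoint, and bucket values are hypothetical, and placing one attribute per line is an assumption about how the setting accepts its value.

```python
def build_s3_attributes(access_key: str, secret_key: str, endpoint: str) -> str:
    """Compose the attribute lines for the discovery cache metadata store.

    The property names and fixed values come from the example in this
    section; one property per line is an assumption.
    """
    settings = {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.path.style.access": "true",
        "fs.s3a.threads.max": "40",
        "fs.s3a.connection.maximum": "200",
    }
    return "\n".join(f"{name}={value}" for name, value in settings.items())


def build_s3_uri(bucket_name: str, remote_agent_on_emr: bool) -> str:
    """Use s3:// for a remote agent running on EMR and s3a:// for any other agent."""
    scheme = "s3" if remote_agent_on_emr else "s3a"
    return f"{scheme}://{bucket_name}"


# Placeholder values for illustration only; never hard-code real credentials.
print(build_s3_attributes("EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY", "s3.example.com"))
print(build_s3_uri("example-bucket", remote_agent_on_emr=True))
print(build_s3_uri("example-bucket", remote_agent_on_emr=False))
```

Keeping the scheme choice in one function makes it harder to paste an `s3a://` URI into the settings of an EMR remote agent by mistake.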