Remote agent
An agent runs jobs against your data sources, a process that requires high bandwidth and low latency between the agent and the data sources. Because data sources cannot always be located near the cluster where the local agent resides, you can set up a remote agent closer to your data sources.
The script that creates a remote agent is included in the Lumada Data Catalog artifacts found on the Hitachi Vantara Lumada and Pentaho Support Portal.
To set up a remote agent, first review the Data Catalog system requirements for remote agent installation, supported distributions, and Kerberos environments, then use the following installation instructions.
Install a remote agent in a non-Kerberos environment
You can create a remote agent using a run file that is included with the Data Catalog artifacts. The run file guides you through the remote agent setup process, which varies according to whether or not your environment has Kerberos enabled.
Perform the following steps to create a remote agent in a non-Kerberos environment:
Procedure
If you have not already, download the Data Catalog artifacts from the Hitachi Vantara Lumada and Pentaho Support Portal.
The script for creating a remote agent is ldc-agent-<version>.run, where <version> is the version of Data Catalog you have.
Place the ldc-agent-<version>.run file into your environment and execute the file:
sudo sh ldc-agent-<version>.run
Respond to the script prompts according to your environment.
The following example shows sample script output, in which the remote agent ldc_example is created with the following parameters:
- The install location is /opt/ldc_example.
- The remote agent will be managed by the service user ldcuser.
- The remote agent will connect to a Data Catalog cluster that is accessible via NodePort on http://ldc_cluster:31080, where ldc_cluster is the host name and 31080 is the HTTP port.
Note: This is an example only, and you should customize your responses to your environment as necessary.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LUMADA DATA CATALOG AGENT INSTALLER
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Express Install (Requires superuser access)
2. Custom Install (Runs with non-sudo access)
3. Upgrade
4. Exit
Enter your choice [1-4]: 1
Enter the name of the Lumada Data Catalog service user [ldcuser]: ldcuser
Enter install location [/opt/ldc]: /opt/ldc_example
Enter log location [/var/log/ldc]: /opt/ldc_example/logs
Enter Appserver endpoint [http://localhost:3000]: http://ldc_cluster:31080
Enter the name of the agent: ldc_example
Enter HIVE version [3.1.2]: 3.1.2
Is Kerberos enabled? [y/N]: N
~~~~~~~~~~~~~~~~~~~~~~~
SELECTION SUMMARY
~~~~~~~~~~~~~~~~~~~~~~~
Lumada Data Catalog service user : ldcuser
Install location : /opt/ldc_example/ldc (will be created)
Log location : /opt/ldc_example/logs/ldc (will be created)
Kerberos enabled : false
AppServer endpoint : http://ldc_cluster:31080
Agent ID : ldc_example
Proceed? [Y/n]: Y
The script will then create the install and log locations. You can find the remote agent configuration in the install location you specified, under the ldc/agent folder.
Switch to the Data Catalog service user that was specified in the remote agent setup, and navigate to your remote agent’s configuration:
sudo su - ldcuser
cd /opt/ldc_example/ldc/agent
When you are in the ldc/agent directory, start the agent:
bin/agent start
Install a remote agent in a Kerberos environment
You can create a remote agent using a run file that is included with the Data Catalog artifacts. The run file guides you through the remote agent setup process, which varies according to whether or not your environment has Kerberos enabled.
In a Kerberos environment, the script requires the path to an existing keytab file on the server and the service principal.
Perform the following steps to create a remote agent in a Kerberos environment:
Procedure
If you have not already, download the Data Catalog artifacts from the Hitachi Vantara Lumada and Pentaho Support Portal.
The script for creating a remote agent is ldc-agent-<version>.run, where <version> is the version of Data Catalog you have.
Place the ldc-agent-<version>.run file into your environment and execute the file:
sudo sh ldc-agent-<version>.run
Respond to the script prompts according to your environment.
The following example shows sample script output, in which the remote agent ldc_example is created with the following parameters:
- The install location is /opt/ldc_example.
- The remote agent will be managed by the service user ldcuser.
- The remote agent will connect to a Data Catalog cluster that is accessible via NodePort on https://ldc_cluster:31083, where ldc_cluster is the host name and 31083 is the HTTPS port.
Note: This is an example only, and you should customize your responses to your environment as necessary.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LUMADA DATA CATALOG AGENT INSTALLER
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Express Install (Requires superuser access)
2. Custom Install (Runs with non-sudo access)
3. Upgrade
4. Exit
Enter your choice [1-4]: 1
Enter the name of the Lumada Data Catalog service user [ldcuser]: ldcuser
Enter install location [/opt/ldc]: /opt/ldc_example
Enter log location [/var/log/ldc]: /opt/ldc_example/logs
Enter Appserver endpoint [http://localhost:3000]: https://ldc_cluster:31083
Enter the name of the agent: ldc_example
Enter HIVE version [3.1.2]: 3.1.2
Is Kerberos enabled? [y/N]: Y
Full path to Lumada Data Catalog service user keytab: /home/ldcuser/ldcuser.keytab
Lumada Data Catalog service user’s fully qualified principal: ldcuser@<your company>.com
~~~~~~~~~~~~~~~~~~~~~~~
SELECTION SUMMARY
~~~~~~~~~~~~~~~~~~~~~~~
Lumada Data Catalog service user : ldcuser
Install location : /opt/ldc_example/ldc (will be created)
Log location : /opt/ldc_example/logs/ldc (will be created)
Kerberos enabled : true
Kerberos keytab path : /home/ldcuser/ldcuser.keytab
Kerberos principal : ldcuser@<your company>.com
AppServer endpoint : https://ldc_cluster:31083
Agent ID : ldc_example
Proceed? [Y/n]: Y
The script will then create the install and log locations. You can find the remote agent configuration in the install location you specified, under the ldc/agent folder.
Switch to the Data Catalog service user that was specified in the remote agent setup, and navigate to your remote agent’s configuration:
sudo su - ldcuser
cd /opt/ldc_example/ldc/agent
Use the keytab file that was created with the rest of your agent-related files during agent setup to obtain a Kerberos ticket for the service user's principal:
cd /opt/ldc_example/ldc/agent
kinit -kt keytab/ldcuser.keytab ldcuser@<your company>.com
Using the openssl command, fetch the certificate fingerprints, passing the host name (and if applicable, the port or path) of the Data Catalog cluster. In this example, the certificate fingerprints are retrieved from ldc_cluster, where the application is accessible on port 31083:
openssl s_client -connect ldc_cluster:31083 < /dev/null 2>/dev/null | openssl x509 -fingerprint -sha256 -noout -in /dev/stdin | cut -d'=' -f2 | tr -d : | tr '[:upper:]' '[:lower:]'
If executed successfully, the command returns the certificate fingerprints as a string of lowercase hexadecimal characters.
Use the certificate fingerprints to register the remote agent to the Data Catalog cluster:
bin/agent register --agent-token null --endpoint wss://ldc_cluster:31083/wsagent --agent-id ldc_example --cert-fingerprint <certificate fingerprints from openssl command>
This step starts the remote agent.
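The fingerprint clean-up and the register command from the steps above can be sketched as small shell functions. This is only an illustration: the function names are hypothetical and are not part of Data Catalog, the sample fingerprint is made up and shortened, and the length check assumes a SHA-256 fingerprint (64 hexadecimal characters once the colons are removed).

```shell
# Hypothetical helpers for illustration; they are not part of Data Catalog.

# Reads a line such as "SHA256 Fingerprint=AB:CD:..." on standard input and
# prints the value after "=" without colons, in lowercase.
normalize_fingerprint() {
    cut -d'=' -f2 | tr -d ':' | tr '[:upper:]' '[:lower:]'
}

# Succeeds only if $1 is exactly 64 lowercase hexadecimal characters,
# the length of a SHA-256 digest written in hex.
is_sha256_fingerprint() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{64}$'
}

# Prints (does not run) the register command used in the example above.
# Arguments: host:port, agent ID, fingerprint.
print_register_command() {
    printf 'bin/agent register --agent-token null --endpoint wss://%s/wsagent --agent-id %s --cert-fingerprint %s\n' "$1" "$2" "$3"
}

# Illustration with a made-up, shortened fingerprint line (not a real certificate).
fingerprint=$(printf '%s\n' 'SHA256 Fingerprint=AB:CD:EF:01' | normalize_fingerprint)
is_sha256_fingerprint "$fingerprint" || echo "warning: '$fingerprint' is not a full SHA-256 fingerprint" >&2
print_register_command ldc_cluster:31083 ldc_example "$fingerprint"
```

Printing the command first lets you review the endpoint, agent ID, and fingerprint before you run the registration yourself.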
Authorize a remote agent
Procedure
Once the remote agent has been started or registered, check the remote agent logs and confirm that the agent is running:
cd /opt/ldc_example/ldc/agent
bin/agent log -f
If set up correctly, the logs will include an agent authorization error that looks something like the following:
[WebSocketClient-SecureIO-1] INFO com.hitachivantara.datacatalog.remoteagent.socket_services.WebsocketSocketService - Disconnected: CloseReason: code [3403], reason [agent not authorized]
To authorize the agent, open your browser and log into the Data Catalog user interface, then navigate to Management and click Agents.
The configured remote agent should now appear in the list of available agents.
Click the Authorize button next to the remote agent.
Go back to your remote agent’s logs, where there should be logs confirming that the remote agent has been authorized. This includes a successful handshake between the agent and Data Catalog cluster, creation of configuration data and then a series of successful “pings” to the agent:
31 May 2022 14:08:43.784 [pool-10-thread-1] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.TokenHandler - Processed handshake from server registered agent: ldc_example
31 May 2022 14:08:45.750 [pool-10-thread-2] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.MetaHealthHandler - Agent: Hi there from mother ship, ping:1654006125726
[…]
31 May 2022 14:09:42.678 [pool-10-thread-5] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.MetaHealthHandler - Agent: ping to ldc_example:1654006182675
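As a rough aid for reading these logs, the two states described in this procedure can be told apart by searching for the sample messages shown above. The sketch below is an assumption-laden illustration: agent_auth_state is a hypothetical function name, and it matches only the exact phrases that appear in the sample log lines in this section, which may differ in other Data Catalog versions.

```shell
# Hypothetical helper for illustration; it is not part of Data Catalog.
# Reads agent log text on standard input and prints one of:
#   not-authorized  the last relevant message was the authorization error
#   authorized      the last relevant message was the processed handshake
#   unknown         neither sample message was found
agent_auth_state() {
    awk '
        /agent not authorized/                             { state = "not-authorized" }
        /Processed handshake from server registered agent/ { state = "authorized" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Illustration using the sample authorization error from this section.
agent_auth_state <<'EOF'
[WebSocketClient-SecureIO-1] INFO com.hitachivantara.datacatalog.remoteagent.socket_services.WebsocketSocketService - Disconnected: CloseReason: code [3403], reason [agent not authorized]
EOF
```

Because the function prints its result only at end of input, feed it log text that you have captured to a file rather than the continuous output of bin/agent log -f.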
Configure Data Catalog for CDP
If you are using the Cloudera Data Platform (CDP), you need to run some configuration steps after you install Data Catalog.
Use the following steps to configure Data Catalog for CDP:
Note: The example commands in this procedure use the parcel directory CDH-7.1.4-1.cdh7.1.4.p2.6981144. Modify this value according to your CDP version.
Procedure
Use a command similar to the example shown below to check for the latest version of the kotlin JAR file available in the system:
ls -ltr /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin*
The -ltr options sort the listing by modification time, oldest first, rather than by version number. Compare the version numbers in the file names to find the latest JAR file version.
Sample output:
-rw-r--r--. 1 root root 170934 Nov 10 2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-common-1.3.50.jar
-rw-r--r--. 1 root root 1326269 Nov 10 2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-1.3.50.jar
-rw-r--r--. 1 root root 1290549 Nov 10 2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-1.3.40.jar
-rw-r--r--. 1 root root 162009 Nov 10 2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-common-1.3.40.jar
-rw-r--r--. 1 root root 111800 Nov 10 2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-common-1.2.71.jar
-rw-r--r--. 1 root root 962149 Nov 10 2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-1.2.71.jar
In the output above, the latest version of the kotlin JAR file is 1.3.50, so we must download the kotlin-reflect-1.3.50 JAR file.
Use a command similar to the example shown below to download the JAR file you need. This example shows the command for the kotlin-reflect-1.3.50 JAR file (modify the Maven JAR file version based on the output of the command in Step 1):
wget -P /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/lib/hadoop/lib https://repo1.maven.org/maven2/org/jetbrains/kotlin/kotlin-reflect/1.3.50/kotlin-reflect-1.3.50.jar
Note: Run this command on the master node as well as on all the worker nodes.
Create a hive_warehouse_connector directory under the agent installation directory (the default location is /opt/ldc/agent if the path is not customized).
Use a command similar to the example shown below to copy the Hive warehouse connector JAR file from CDH into the hive_warehouse_connector directory.
cp /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.4.2-1.jar hive_warehouse_connector
Enable the Hive warehouse connector (SPARK_HIVE_CONNECTOR_JAR_DIR) JAR file in the ldc script to add it to the Spark classpath.
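The version check and download URL from the first two steps of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions: latest_kotlin_version and kotlin_reflect_url are hypothetical function names, sort -V (version sort) assumes GNU coreutils, and the URL pattern is copied from the wget example in this section.

```shell
# Hypothetical helpers for illustration; they are not part of Data Catalog or CDP.

# Reads JAR paths (or ls -l lines ending in a path) on standard input and
# prints the highest kotlin-stdlib version found in the file names.
# kotlin-stdlib-common-*.jar files are skipped because the pattern requires
# a digit immediately after "kotlin-stdlib-".
latest_kotlin_version() {
    sed -n 's/.*kotlin-stdlib-\([0-9][0-9.]*\)\.jar$/\1/p' | sort -V | tail -n 1
}

# Prints the Maven Central URL of the kotlin-reflect JAR for version $1.
kotlin_reflect_url() {
    printf 'https://repo1.maven.org/maven2/org/jetbrains/kotlin/kotlin-reflect/%s/kotlin-reflect-%s.jar\n' "$1" "$1"
}

# Illustration using file names from the sample output in this section.
version=$(printf '%s\n' \
    kotlin-stdlib-common-1.3.50.jar \
    kotlin-stdlib-1.3.50.jar \
    kotlin-stdlib-1.3.40.jar \
    kotlin-stdlib-1.2.71.jar | latest_kotlin_version)
kotlin_reflect_url "$version"
```

On a CDP node, you could pipe the output of the ls command from Step 1 into latest_kotlin_version instead of the hard-coded names, then pass the printed URL to wget as in the download step.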
Results