Remote agent

An agent runs jobs against your data sources, a process that requires high bandwidth and low latency between the agent and the data sources. Because data sources cannot always be located near the cluster where the local agent resides, you can set up a remote agent closer to your data sources.

The script that creates a remote agent is included in the Lumada Data Catalog artifacts found on the Hitachi Vantara Lumada and Pentaho Support Portal.

To set up a remote agent, first review the Data Catalog system requirements for remote agent installation, supported distributions, and Kerberos environments, then follow the installation instructions below.

Install a remote agent in a non-Kerberos environment

You can create a remote agent using a run file that is included with the Data Catalog artifacts. The run file guides you through the remote agent setup process, which varies depending on whether your environment has Kerberos enabled.

Perform the following steps to create a remote agent in a non-Kerberos environment:

Procedure

  1. If you have not already, download the Data Catalog artifacts from the Hitachi Vantara Lumada and Pentaho Support Portal.

    The script for creating a remote agent is ldc-agent-<version>.run, where <version> is the version of Data Catalog you have.
  2. Place the ldc-agent-<version>.run file into your environment and execute the file:

    1. sudo sh ldc-agent-<version>.run

    2. Respond to the script prompts according to your environment.

    The following example shows sample script output, in which the remote agent ldc_example is created with the following parameters:

    • The install location is /opt/ldc_example.
    • The remote agent will be managed by the service user ldcuser.
    • The remote agent will connect to a Data Catalog cluster that is accessible via NodePort on http://ldc_cluster:31080, where ldc_cluster is the host name and 31080 is the HTTP port.
    Note: This is an example only, and you should customize your responses to your environment as necessary.
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
                LUMADA DATA CATALOG AGENT INSTALLER  
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    1. Express Install (Requires superuser access)
    2. Custom Install (Runs with non-sudo access)
    3. Upgrade
    4. Exit
    
    Enter your choice [1-4]: 1
    Enter the name of the Lumada Data Catalog service user [ldcuser]: ldcuser
    Enter install location [/opt/ldc]: /opt/ldc_example
    Enter log location [/var/log/ldc]: /opt/ldc_example/logs
    Enter Appserver endpoint [http://localhost:3000]: http://ldc_cluster:31080
    Enter the name of the agent: ldc_example
    Enter HIVE version [3.1.2]: 3.1.2
    Is Kerberos enabled? [y/N]: N
    ~~~~~~~~~~~~~~~~~~~~~~~
    SELECTION SUMMARY
    ~~~~~~~~~~~~~~~~~~~~~~~
    Lumada Data Catalog service user : ldcuser
    Install location : /opt/ldc_example/ldc (will be created)
    Log location : /opt/ldc_example/logs/ldc (will be created)
    Kerberos enabled : false
    AppServer endpoint : http://ldc_cluster:31080
    Agent ID : ldc_example
    Proceed? [Y/n]: Y

    The script will then create the install and log locations. You can find the remote agent configuration in the install location you specified, under the ldc/agent folder.

  3. Switch to the Data Catalog service user that was specified in the remote agent setup, and navigate to your remote agent’s configuration:

    sudo su - ldcuser
    cd /opt/ldc_example/ldc/agent
  4. When you are in the ldc/agent directory, start the agent:

    bin/agent start
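The sample summary above suggests that the installer appends an ldc folder to both the install location and the log location you enter. The following is a minimal sketch of that path layout, assuming it holds for your version. The function names are hypothetical helpers and are not part of the Data Catalog tooling.

```shell
# Hypothetical helpers: derive the directories shown in the sample
# SELECTION SUMMARY from the locations entered at the installer prompts.
# A single trailing slash on the input is tolerated.

# Install location entered at the prompt -> directory the installer creates.
ldc_install_dir() { printf '%s/ldc\n' "${1%/}"; }

# Install location entered at the prompt -> remote agent configuration folder.
ldc_agent_dir() { printf '%s/ldc/agent\n' "${1%/}"; }

# Log location entered at the prompt -> directory the installer creates.
ldc_log_dir() { printf '%s/ldc\n' "${1%/}"; }

# Example usage (commented out so that sourcing this file has no side effects):
#   cd "$(ldc_agent_dir /opt/ldc_example)" && bin/agent start
```

With the sample values, ldc_agent_dir /opt/ldc_example prints /opt/ldc_example/ldc/agent, which is the directory used in steps 3 and 4.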

Install a remote agent in a Kerberos environment

You can create a remote agent using a run file that is included with the Data Catalog artifacts. The run file guides you through the remote agent setup process, which varies depending on whether your environment has Kerberos enabled.

In a Kerberos environment, the script requires the path to an existing keytab file on the server and the service principal.
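Because the script needs both values, it can help to confirm them before running the installer. The following is a minimal pre-check sketch, not part of the Data Catalog tooling. The function name check_keytab is hypothetical, and the principal lookup assumes the MIT Kerberos klist command, whose -k option lists keytab entries. The lookup is skipped if klist is not installed.

```shell
# Hypothetical pre-check: confirm the keytab is readable and, when klist is
# available, that it contains the expected principal.
# Returns 0 on success, 1 if the keytab is unreadable, 2 if the principal
# is not listed in the keytab.
check_keytab() {
  keytab=$1
  principal=$2

  if [ ! -r "$keytab" ]; then
    echo "keytab not readable: $keytab" >&2
    return 1
  fi

  # Assumption: MIT Kerberos klist; "klist -k <file>" lists keytab entries.
  if command -v klist >/dev/null 2>&1; then
    if ! klist -k "$keytab" 2>/dev/null | grep -qF "$principal"; then
      echo "principal not found in keytab: $principal" >&2
      return 2
    fi
  fi
  return 0
}

# Example usage (the principal is a placeholder; commented out on purpose):
#   check_keytab /home/ldcuser/ldcuser.keytab ldcuser@EXAMPLE.COM
```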

Perform the following steps to create a remote agent in a Kerberos environment:

Procedure

  1. If you have not already, download the Data Catalog artifacts from the Hitachi Vantara Lumada and Pentaho Support Portal.

    The script for creating a remote agent is ldc-agent-<version>.run, where <version> is the version of Data Catalog you have.
  2. Place the ldc-agent-<version>.run file into your environment and execute the file:

    1. sudo sh ldc-agent-<version>.run

    2. Respond to the script prompts according to your environment.

    The following example shows sample script output, in which the remote agent ldc_example is created with the following parameters:

    • The install location is /opt/ldc_example.
    • The remote agent will be managed by the service user ldcuser.
    • The remote agent will connect to a Data Catalog cluster that is accessible via NodePort on https://ldc_cluster:31083, where ldc_cluster is the host name and 31083 is the HTTPS port.
    Note: This is an example only, and you should customize your responses to your environment as necessary.
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
                LUMADA DATA CATALOG AGENT INSTALLER  
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
    1. Express Install (Requires superuser access)
    2. Custom Install (Runs with non-sudo access)
    3. Upgrade
    4. Exit
    
    Enter your choice [1-4]: 1
    Enter the name of the Lumada Data Catalog service user [ldcuser]: ldcuser
    Enter install location [/opt/ldc]: /opt/ldc_example
    Enter log location [/var/log/ldc]: /opt/ldc_example/logs
    Enter Appserver endpoint [http://localhost:3000]: https://ldc_cluster:31083
    Enter the name of the agent: ldc_example
    Enter HIVE version [3.1.2]: 3.1.2
    Is Kerberos enabled? [y/N]: Y
    Full path to Lumada Data Catalog service user keytab: /home/ldcuser/ldcuser.keytab
    Lumada Data Catalog service user’s fully qualified principal: ldcuser@<your company>.com
    ~~~~~~~~~~~~~~~~~~~~~~~
    SELECTION SUMMARY
    ~~~~~~~~~~~~~~~~~~~~~~~
    Lumada Data Catalog service user : ldcuser
    Install location : /opt/ldc_example/ldc (will be created)
    Log location : /opt/ldc_example/logs/ldc (will be created)
    Kerberos enabled : true
    Kerberos keytab path : /home/ldcuser/ldcuser.keytab
    Kerberos principal : ldcuser@<your company>.com
    AppServer endpoint : https://ldc_cluster:31083
    Agent ID : ldc_example
    Proceed? [Y/n]: Y

    The script will then create the install and log locations. You can find the remote agent configuration in the install location you specified, under the ldc/agent folder.

  3. Switch to the Data Catalog service user that was specified in the remote agent setup, and navigate to your remote agent’s configuration:

    sudo su - ldcuser
    cd /opt/ldc_example/ldc/agent
  4. Use the keytab file that was created with the rest of your agent-related files during agent setup to obtain a Kerberos ticket for the service user's principal:

    cd /opt/ldc_example/ldc/agent
    kinit -kt keytab/ldcuser.keytab ldcuser@<your company>.com
  5. Using the openssl command, fetch certificate fingerprints, passing the hostname (and if applicable, the port or path) of the Data Catalog cluster. In this example, the certificate fingerprints are retrieved from ldc_cluster, where the application is accessible on port 31083:

    openssl s_client -connect ldc_cluster:31083 < /dev/null 2>/dev/null | openssl x509 -fingerprint -sha256 -noout -in /dev/stdin | cut -d'=' -f2 | tr -d ':' | tr '[:upper:]' '[:lower:]'
    If executed successfully, the command returns the certificate fingerprints as a single lowercase hexadecimal string with the colons removed.
  6. Use the certificate fingerprints to register the remote agent to the Data Catalog cluster:

    bin/agent register --agent-token null --endpoint wss://ldc_cluster:31083/wsagent --agent-id ldc_example --cert-fingerprint <certificate fingerprints from openssl command>
    This step starts the remote agent.
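Steps 5 and 6 transform two values: the raw openssl fingerprint line becomes a lowercase string without colons, and the Appserver endpoint becomes the websocket endpoint passed to --endpoint. The following is a minimal sketch of both transformations. The function names are hypothetical. The https to wss mapping and the /wsagent path follow the sample command above, while the http to ws mapping is an assumption.

```shell
# Hypothetical helper: read an openssl fingerprint line on stdin, for example
# "SHA256 Fingerprint=AB:CD:...", and print it as lowercase hex without colons.
normalize_fingerprint() {
  cut -d'=' -f2 | tr -d ':' | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: derive the websocket endpoint from the Appserver endpoint.
# https:// -> wss:// matches the sample register command;
# http:// -> ws:// is an assumption not confirmed by the sample.
agent_ws_endpoint() {
  endpoint=${1%/}   # tolerate a single trailing slash
  case $endpoint in
    https://*) printf 'wss://%s/wsagent\n' "${endpoint#https://}" ;;
    http://*)  printf 'ws://%s/wsagent\n' "${endpoint#http://}" ;;
    *) echo "unsupported endpoint: $1" >&2; return 1 ;;
  esac
}

# Example usage (commented out; requires network access to the cluster):
#   fp=$(openssl s_client -connect ldc_cluster:31083 < /dev/null 2>/dev/null \
#        | openssl x509 -fingerprint -sha256 -noout | normalize_fingerprint)
#   bin/agent register --agent-token null \
#     --endpoint "$(agent_ws_endpoint https://ldc_cluster:31083)" \
#     --agent-id ldc_example --cert-fingerprint "$fp"
```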

Authorize a remote agent

Perform the following steps to authorize a remote agent:

Procedure

  1. Once the remote agent has been started or registered, check the remote agent logs and confirm that the agent is running.

    cd /opt/ldc_example/ldc/agent
    bin/agent log -f

    If set up correctly, the logs will include an agent authorization error that looks something like the following:

    [WebSocketClient-SecureIO-1] INFO com.hitachivantara.datacatalog.remoteagent.socket_services.WebsocketSocketService - Disconnected: CloseReason: code [3403], reason [agent not authorized]
  2. To authorize the agent, open your browser and log into the Data Catalog user interface, then navigate to Management and click Agents.

    The configured remote agent should now appear in the list of available agents.
  3. Click the Authorize button next to the remote agent.

  4. Return to your remote agent’s logs, which should now contain entries confirming that the remote agent has been authorized. These include a successful handshake between the agent and the Data Catalog cluster, the creation of configuration data, and then a series of successful “pings” to the agent:

    31 May 2022 14:08:43.784 [pool-10-thread-1] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.TokenHandler - Processed handshake from server registered agent: ldc_example
    31 May 2022 14:08:45.750 [pool-10-thread-2] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.MetaHealthHandler - Agent: Hi there from mother ship, ping:1654006125726
    […]
    31 May 2022 14:09:42.678 [pool-10-thread-5] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.MetaHealthHandler - Agent: ping to ldc_example:1654006182675
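The two log patterns above can also be checked with a small filter instead of by eye. The following is a minimal sketch that assumes the log text is available on standard input, for example from a saved copy of the log. The function name agent_auth_state is hypothetical, and the match strings are taken from the sample log lines in this section, so they may differ in other versions.

```shell
# Hypothetical helper: read agent log text on stdin and report the state
# implied by the LAST matching line:
#   "unauthorized" -> the close reason "agent not authorized" (code 3403)
#   "authorized"   -> the handshake message from a registered agent
#   "unknown"      -> neither pattern was seen
agent_auth_state() {
  awk '
    /agent not authorized/                              { state = "unauthorized" }
    /Processed handshake from server registered agent/  { state = "authorized" }
    END { print (state == "" ? "unknown" : state) }
  '
}

# Example usage (agent.log is a placeholder file name; commented out):
#   agent_auth_state < agent.log
```

Because the last matching line wins, a log that shows the authorization error followed by the handshake is reported as authorized.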

Configure Data Catalog for CDP

If you are using the Cloudera Data Platform (CDP), you need to run some configuration steps after you install Data Catalog.

Use the following steps to configure Data Catalog for CDP:

Note: These steps are an example only. In the steps below, modify CDH-7.1.4-1.cdh7.1.4.p2.6981144 according to your CDP version.

Procedure

  1. Use a command similar to the example shown below to check for the latest version of the kotlin JAR file available in the system:

    ls -ltr /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin*

    The -ltr options sort the listing by modification time rather than by version number, so compare the version numbers in the file names to find the latest JAR file version.

    Sample output:

    -rw-r--r--. 1 root root  170934 Nov 10  2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-common-1.3.50.jar
    -rw-r--r--. 1 root root 1326269 Nov 10  2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-1.3.50.jar
    -rw-r--r--. 1 root root 1290549 Nov 10  2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-1.3.40.jar
    -rw-r--r--. 1 root root  162009 Nov 10  2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-common-1.3.40.jar
    -rw-r--r--. 1 root root  111800 Nov 10  2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-common-1.2.71.jar
    -rw-r--r--. 1 root root  962149 Nov 10  2020 /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/jars/kotlin-stdlib-1.2.71.jar

    In the output above, the latest version of the kotlin JAR file is 1.3.50, so you must download the matching kotlin-reflect-1.3.50 JAR file.

  2. Use a command similar to the example shown below to download the JAR file you need. This example shows the command for the kotlin-reflect-1.3.50 JAR file (modify the Maven JAR file version based on the output of the command in Step 1):

    wget -P /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/lib/hadoop/lib https://repo1.maven.org/maven2/org/jetbrains/kotlin/kotlin-reflect/1.3.50/kotlin-reflect-1.3.50.jar
    Note: Run this command on the master node as well as on all the worker nodes.
  3. Create a hive_warehouse_connector directory under the agent installation directory (the default location is /opt/ldc/agent if the path is not customized).

  4. Use a command similar to the example shown below to copy the Hive warehouse connector JAR file from CDH into the hive_warehouse_connector directory.

    cp /opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.4.2-1.jar hive_warehouse_connector
  5. In the ldc script, enable the Hive warehouse connector JAR file setting (SPARK_HIVE_CONNECTOR_JAR_DIR) so that the JAR file is added to the Spark classpath.

Results

Data Catalog is configured for CDP.
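Steps 1 and 2 of the procedure above can be sketched as two small helpers: one picks the highest kotlin-stdlib version from a list of JAR paths, and one builds the matching kotlin-reflect download URL. The function names are hypothetical. The version comparison assumes a sort command that supports the -V (version sort) option, as GNU coreutils does. The URL pattern is the one used in the wget example in step 2.

```shell
# Hypothetical helper: read JAR paths (or "ls -l" lines) on stdin and print
# the highest kotlin-stdlib version. kotlin-stdlib-common-* files are skipped
# because the pattern requires a digit right after "kotlin-stdlib-".
# Assumption: "sort -V" (version sort) is available, as in GNU coreutils.
latest_kotlin_stdlib_version() {
  sed -n 's/.*kotlin-stdlib-\([0-9][0-9.]*\)\.jar.*/\1/p' | sort -V | tail -n 1
}

# Hypothetical helper: build the Maven Central URL for the kotlin-reflect JAR
# of a given version, following the pattern of the wget example in step 2.
kotlin_reflect_url() {
  printf 'https://repo1.maven.org/maven2/org/jetbrains/kotlin/kotlin-reflect/%s/kotlin-reflect-%s.jar\n' "$1" "$1"
}

# Example usage (commented out; adjust the parcel directory to your CDP version):
#   parcel=/opt/cloudera/parcels/CDH-7.1.4-1.cdh7.1.4.p2.6981144
#   version=$(ls "$parcel"/jars/kotlin* | latest_kotlin_stdlib_version)
#   wget -P "$parcel/lib/hadoop/lib" "$(kotlin_reflect_url "$version")"
```

As with the wget command in step 2, the download would still need to be run on the master node and on all the worker nodes.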