Hitachi Vantara Lumada and Pentaho Documentation

Remote Agent

For remote agents, use the following requirements lists and installation instructions.

Requirements

View the requirements for the remote agent installation, distributions, and Kerberos environments.

General
Category        Description
Hardware        • 8 cores
                • 64 GB RAM
                • 100 GB storage
Miscellaneous   • Data Catalog has already been set up on a Kubernetes cluster.
                • The server hosting your remote agent can connect to the Data Catalog server.
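
The hardware minimums above can be checked before running the installer. The following is a minimal sketch, assuming a Linux host where nproc, awk, df, and /proc/meminfo are available. The function names check_min and preflight, the default /opt path, and the reading of "100 GB storage" as free space are assumptions for illustration, not part of the Data Catalog tooling.

```shell
#!/bin/sh
# Minimums taken from the General requirements table.
REQ_CORES=8
REQ_RAM_GB=64
REQ_DISK_GB=100

# check_min LABEL ACTUAL REQUIRED
# Prints one status line; returns 1 when ACTUAL is below REQUIRED.
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "OK   $1: $2 (minimum $3)"
    else
        echo "LOW  $1: $2 (minimum $3)"
        return 1
    fi
}

# preflight [DIR]
# Reads core count, total RAM, and free space under DIR (assumed default: /opt)
# and compares each against the minimums. Values are truncated to whole GiB,
# and MemTotal is usually a little below installed RAM, so treat a near miss
# on RAM as a prompt to look closer rather than a hard failure.
preflight() {
    dir="${1:-/opt}"
    cores=$(nproc)
    ram_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1024 / 1024 }' /proc/meminfo)
    disk_gb=$(df -Pk "$dir" | awk 'NR == 2 { printf "%d", $4 / 1024 / 1024 }')
    rc=0
    check_min "CPU cores" "$cores" "$REQ_CORES" || rc=1
    check_min "RAM (GiB)" "$ram_gb" "$REQ_RAM_GB" || rc=1
    check_min "Free disk under $dir (GiB)" "$disk_gb" "$REQ_DISK_GB" || rc=1
    return "$rc"
}
```

After sourcing the file, running preflight /opt on the intended server prints one line per requirement and returns a non-zero status if any value falls short.
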
Distributions

Remote agent setup supports several distributions, each with its own requirements. Follow the requirements for the distribution that best matches your Data Catalog setup.

Category                          Description
Amazon Elastic MapReduce (EMR)    • EMR version 6.0.0+
                                  • Spark 2.4.4+
                                  • Hive 3.1.2+
                                  • Hadoop distribution: Amazon 3.2.1
                                  Note: When prompted by the remote agent script, you must set up the remote agent using the Lumada Data Catalog service user hadoop.
Cloudera Data Platform (CDP)      CDP version 7.1.3+
Hortonworks Data Platform (HDP)   HDP version 3.1.0+
Kerberos Environments

Additionally, you can enable Kerberos on your remote agent’s server. Because Kerberos-enabled environments add extra security between your remote agent and the Data Catalog cluster, some extra configuration is required.

Category        Description
Miscellaneous   • Your Hadoop admin has created a service user in your environment.
                • A keytab file has already been set up for your service user on the Kerberos machine.
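
Before running the installer, the keytab can be inspected to confirm that it holds an entry for the service user. The following is a minimal sketch, assuming the MIT Kerberos client tools (klist) are installed; check_keytab is a hypothetical helper and not part of the Data Catalog tooling.

```shell
#!/bin/sh
# check_keytab KEYTAB PRINCIPAL
# Returns 0 only if KEYTAB is readable and klist lists PRINCIPAL in it.
check_keytab() {
    keytab="$1"
    principal="$2"
    if [ ! -r "$keytab" ]; then
        echo "keytab not readable: $keytab" >&2
        return 1
    fi
    if ! command -v klist >/dev/null 2>&1; then
        echo "klist not found; install the Kerberos client tools" >&2
        return 1
    fi
    # klist -k lists the entries stored in a keytab; -t adds timestamps.
    if klist -kt "$keytab" | grep -F -q -- "$principal"; then
        echo "found $principal in $keytab"
    else
        echo "no entry for $principal in $keytab" >&2
        return 1
    fi
}
```

With the example values used later on this page, the call would be check_keytab /home/ldcuser/ldcuser.keytab 'ldcuser@<your company>.com'.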

Remote agent installation

The binary agent file (RUN file) is included in the Data Catalog artifacts found on the Hitachi Vantara Lumada and Pentaho Support Portal.

Place the run file into your environment and execute the file:

sudo sh ldc-agent-<version>.run

The run file guides you through the remote agent setup. The setup differs depending on whether your environment is Kerberos-enabled.

Non-Kerberos

In this example, the remote agent ldc_example is created, where:

  • The install location will be /opt/ldc_example.
  • The remote agent will be managed by the service user ldcuser.
  • The remote agent will connect to a Data Catalog cluster that is accessible via NodePort on http://ldc_cluster:31080, where ldc_cluster is the host name and 31080 is the HTTP port.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
            LUMADA DATA CATALOG AGENT INSTALLER  
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Express Install (Requires superuser access)
2. Custom Install (Runs with non-sudo access)
3. Upgrade
4. Exit

Enter your choice [1-4]: 1
Enter the name of the Lumada Data Catalog service user [ldcuser]: ldcuser
Enter install location [/opt/ldc]: /opt/ldc_example
Enter log location [/var/log/ldc]: /opt/ldc_example/logs
Enter Appserver endpoint [http://localhost:3000]: http://ldc_cluster:31080
Enter the name of the agent: ldc_example
Enter HIVE version [3.1.2]: 3.1.2
Is Kerberos enabled? [y/N]: N
~~~~~~~~~~~~~~~~~~~~~~~
SELECTION SUMMARY
~~~~~~~~~~~~~~~~~~~~~~~
Lumada Data Catalog service user : ldcuser
Install location : /opt/ldc_example/ldc (will be created)
Log location : /opt/ldc_example/logs/ldc (will be created)
Kerberos enabled : false
AppServer endpoint : http://ldc_cluster:31080
Agent ID : ldc_example
Proceed? [Y/n]: Y

The script will then create the install and log locations. Remote agent configuration can be found in the install location you specified, under the ldc/agent folder.

  1. Switch to the Data Catalog service user that was specified in the remote agent setup, and navigate to your remote agent’s configuration:
    sudo su - ldcuser
    cd /opt/ldc_example/ldc/agent
  2. Once you are in the ldc/agent directory, start the agent:
    bin/agent start

Kerberos

For setup in a Kerberos environment, the script requires two additional values: the path to an existing keytab file on the server and the service user principal.

In this example, the remote agent ldc_example is created, where:

  • The install location will be /opt/ldc_example.
  • The remote agent will be managed by the service user ldcuser.
  • The remote agent will connect to a Data Catalog cluster that is accessible via NodePort on https://ldc_cluster:31083, where ldc_cluster is the host name and 31083 is the HTTPS port.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
            LUMADA DATA CATALOG AGENT INSTALLER  
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
1. Express Install (Requires superuser access)
2. Custom Install (Runs with non-sudo access)
3. Upgrade
4. Exit

Enter your choice [1-4]: 1
Enter the name of the Lumada Data Catalog service user [ldcuser]: ldcuser
Enter install location [/opt/ldc]: /opt/ldc_example
Enter log location [/var/log/ldc]: /opt/ldc_example/logs
Enter Appserver endpoint [http://localhost:3000]: https://ldc_cluster:31083
Enter the name of the agent: ldc_example
Enter HIVE version [3.1.2]: 3.1.2
Is Kerberos enabled? [y/N]: Y
Full path to Lumada Data Catalog service user keytab: /home/ldcuser/ldcuser.keytab
Lumada Data Catalog service user’s fully qualified principal: ldcuser@<your company>.com
~~~~~~~~~~~~~~~~~~~~~~~
SELECTION SUMMARY
~~~~~~~~~~~~~~~~~~~~~~~
Lumada Data Catalog service user : ldcuser
Install location : /opt/ldc_example/ldc (will be created)
Log location : /opt/ldc_example/logs/ldc (will be created)
Kerberos enabled : true
Kerberos keytab path : /home/ldcuser/ldcuser.keytab
Kerberos principal : ldcuser@<your company>.com
AppServer endpoint : https://ldc_cluster:31083
Agent ID : ldc_example
Proceed? [Y/n]: Y

The script will then create the install and log locations. Remote agent configuration can be found in the install location you specified, under the ldc/agent folder.

  1. Switch to the Data Catalog service user that was specified in the remote agent setup, and navigate to your remote agent’s configuration:
    sudo su - ldcuser
    cd /opt/ldc_example/ldc/agent
  2. During agent setup, a folder named keytab was created alongside the rest of your agent-related files. This folder contains the keytab file that you will use to obtain a Kerberos ticket for the service user principal:
    cd /opt/ldc_example/ldc/agent
    kinit -kt keytab/ldcuser.keytab ldcuser@<your company>.com
  3. Using the openssl command, fetch certificate fingerprints, passing the hostname (and if applicable, the port or path) of the Data Catalog cluster. In this example, the certificate fingerprints will be retrieved from ldc_cluster, where the application is accessible on port 31083:
    openssl s_client -connect ldc_cluster:31083 < /dev/null 2>/dev/null | openssl x509 -fingerprint -sha256 -noout -in /dev/stdin | cut -d'=' -f2 | tr -d ':' | tr '[:upper:]' '[:lower:]'
  4. If executed successfully, the command will return certificate fingerprints in the form of an alpha-numeric string. Use the certificate fingerprints to register the remote agent to the Data Catalog cluster:
    bin/agent register --agent-token null --endpoint wss://ldc_cluster:31083/wsagent --agent-id ldc_example --cert-fingerprint <certificate fingerprints from openssl command>
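
Steps 3 and 4 can be combined so that the fingerprint is validated before it is passed to bin/agent register. The following is a minimal sketch, assuming openssl is on the PATH and that the script is run from the ldc/agent directory; the helper names are illustrative and not part of the agent tooling. A SHA-256 fingerprint is 32 bytes, so the normalized value should be exactly 64 lowercase hexadecimal characters.

```shell
#!/bin/sh
# normalize_fingerprint: read a "... Fingerprint=AB:CD:..." line on stdin
# and print the bare lowercase hex string.
normalize_fingerprint() {
    cut -d'=' -f2 | tr -d ':' | tr '[:upper:]' '[:lower:]'
}

# is_sha256_hex VALUE: succeed only for exactly 64 lowercase hex characters.
is_sha256_hex() {
    case "$1" in
        ''|*[!0123456789abcdef]*) return 1 ;;
    esac
    [ "${#1}" -eq 64 ]
}

# fetch_fingerprint HOST:PORT: same pipeline as step 3, with openssl
# errors suppressed so that a failure yields an empty string.
fetch_fingerprint() {
    openssl s_client -connect "$1" </dev/null 2>/dev/null \
        | openssl x509 -fingerprint -sha256 -noout 2>/dev/null \
        | normalize_fingerprint
}

# register_agent HOST:PORT AGENT_ID (hypothetical wrapper around step 4).
register_agent() {
    fp=$(fetch_fingerprint "$1")
    if ! is_sha256_hex "$fp"; then
        echo "could not read a SHA-256 fingerprint from $1" >&2
        return 1
    fi
    bin/agent register --agent-token null \
        --endpoint "wss://$1/wsagent" \
        --agent-id "$2" \
        --cert-fingerprint "$fp"
}
```

With the example values above, the call would be register_agent ldc_cluster:31083 ldc_example. The validation step guards against passing an empty or truncated value when the cluster is unreachable or the TLS handshake fails.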

Authorize remote agents

  1. Once the remote agent has been started or registered, check the remote agent logs and confirm that the agent is running.

    cd /opt/ldc_example/ldc/agent
    bin/agent log -f

    If set up correctly, the logs will include an agent authorization error that will look something like the following:

    [WebSocketClient-SecureIO-1] INFO com.hitachivantara.datacatalog.remoteagent.socket_services.WebsocketSocketService - Disconnected: CloseReason: code [3403], reason [agent not authorized]
  2. To authorize the agent, open your browser and log in to the Data Catalog UI, then navigate to Management and click Agents.

    The configured remote agent should now appear in the list of available agents. Click the Authorize button for the remote agent.

  3. Return to your remote agent’s logs, which should now confirm that the remote agent has been authorized. Expect a successful handshake between the agent and the Data Catalog cluster, the creation of configuration data, and then a series of successful “pings” to the agent:

    31 May 2022 14:08:43.784 [pool-10-thread-1] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.TokenHandler - Processed handshake from server registered agent: ldc_example
    31 May 2022 14:08:45.750 [pool-10-thread-2] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.MetaHealthHandler - Agent: Hi there from mother ship, ping:1654006125726
    […]
    31 May 2022 14:09:42.678 [pool-10-thread-5] INFO com.hitachivantara.datacatalog.remoteagent.messagehandlers.MetaHealthHandler - Agent: ping to ldc_example:1654006182675
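
The two log states described above can be told apart mechanically. The following is a minimal sketch that relies only on the message text shown in the excerpts on this page (close code 3403 before authorization, a "Processed handshake" line after it). agent_auth_state is a hypothetical helper, and other agent versions may word these messages differently.

```shell
#!/bin/sh
# agent_auth_state: read agent log lines on stdin and print the most
# recent state seen: "unauthorized", "authorized", or "unknown".
agent_auth_state() {
    awk '
        # Close code 3403 appears while the agent is awaiting authorization.
        /code \[3403\]/ { state = "unauthorized" }
        # The handshake line appears once the agent has been authorized.
        /Processed handshake from server registered agent/ { state = "authorized" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

Pipe a captured excerpt of the agent log into the function. Because bin/agent log -f follows the log and does not exit on its own, one option is to bound it, for example timeout 10 bin/agent log -f | agent_auth_state, assuming GNU timeout is installed.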