Hitachi Vantara Lumada and Pentaho Documentation

Manual and advanced secure impersonation configuration

Secure impersonation can be implemented when you connect to a Hadoop cluster with the PDI client, depending on the options you select. This article explains optional manual and advanced configurations for secure impersonation on the Pentaho Server. For an overview of secure impersonation, refer to Setting Up Big Data Security.

The following sections guide you through the optional manual setup and advanced configurations:

  • Prerequisites
  • Manually configuring secure impersonation parameters
  • Configuring MapReduce jobs (Windows-only)
  • Connecting to a Cloudera Impala database (Cloudera-only)
  • Next Steps

Prerequisites

The following requirements must be met to use secure impersonation:

  • The cluster must be secured with Kerberos, and the Kerberos server used by the cluster must be accessible to the Pentaho Server.
  • The Pentaho computer must have Kerberos installed and configured. See Set Up Kerberos for Pentaho for instructions.
Note: If your system has version 8 of the Java Runtime Environment (JRE) or the Java Development Kit (JDK) installed, you do not need to install the Kerberos client, because it is included in the Java installation. You still need to modify the Kerberos configuration file, krb5.conf, as specified in the Set Up Kerberos for Pentaho article.
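As a quick sanity check of the first prerequisite, the following minimal sketch tests whether a TCP connection can be opened to the Kerberos server from the Pentaho Server machine. It assumes the Kerberos server listens on the standard Kerberos port 88 over TCP, which may not hold for every site. The function name kdc_reachable is hypothetical and is not part of Pentaho.

```python
import socket


def kdc_reachable(host, port=88, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Assumption: the Kerberos server accepts TCP on port 88 (the standard
    Kerberos port). A True result only shows that the port is reachable,
    not that Kerberos authentication will succeed.
    """
    try:
        # create_connection resolves the host name and attempts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling kdc_reachable with the host name of your Kerberos server should return True when the server is accessible from the machine running the check.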

Configuring MapReduce jobs

To run MapReduce jobs with secure impersonation on a Windows system, you must modify the mapred-site.xml file.

Perform the following steps to modify the mapred-site.xml file for secure impersonation:

Procedure

  1. Navigate to the <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/<user-defined connection name> directory and open the mapred-site.xml file with a text editor.

  2. Add the following two properties to the mapred-site.xml file:

    <property>
      <name>mapreduce.app-submission.cross-platform</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
  3. Save and close the file.
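If you prefer to script this edit rather than use a text editor, the procedure above can be sketched as follows. This is only an illustration: it assumes the mapred-site.xml file has the usual Hadoop layout of property elements under a single root element, and the function names ensure_properties and update_file are hypothetical. Back up the file first, because the standard Python XML parser discards comments when the file is rewritten.

```python
import xml.etree.ElementTree as ET

# The two properties named in step 2 of the procedure.
REQUIRED = {
    "mapreduce.app-submission.cross-platform": "true",
    "mapreduce.framework.name": "yarn",
}


def ensure_properties(root, required=REQUIRED):
    """Add each required property under root, or update its value if present."""
    existing = {p.findtext("name"): p for p in root.findall("property")}
    for name, value in required.items():
        prop = existing.get(name)
        if prop is None:
            # Property is missing: create <property><name/><value/></property>.
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = value
        else:
            # Property exists: overwrite (or create) its <value> element.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
    return root


def update_file(path):
    """Parse the file at path, apply the required properties, and save it."""
    tree = ET.parse(path)
    ensure_properties(tree.getroot())
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Call update_file with the full path to the mapred-site.xml file from step 1. Running it more than once is harmless, because existing properties are updated rather than duplicated.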

Connecting to a Cloudera Impala database

If you are trying to establish secure impersonation with a Cloudera Hadoop cluster and you are connecting to a secure Cloudera Impala database, you must update security-specific settings on the PDI database connection.

Perform the following steps to update your connection to the secure Cloudera Impala database:

Procedure

  1. Download the Cloudera Impala JDBC driver for your operating system from the Cloudera website: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html

    Note: Secure impersonation with Impala is only supported with the Cloudera Impala JDBC driver. You may have to create an account with Cloudera to download the driver file.
  2. Extract only the ImpalaJDBC41.jar file from the downloaded zip file into the <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/cdp71/lib folder. No other file from the download is needed.

  3. Connect to a secure CDP cloud instance.

  4. Start the PDI client and choose File > New > Transformation to add a new transformation.

  5. Click the View tab, then right-click Database Connections and choose New.

  6. In the Database Connection dialog box, enter the values from the following table:

    Field              Value
    Connection Name    User-defined name
    Connection Type    Cloudera Impala
    Host Name          Hostname
    Database Name      default
    Port Number        443
  7. Click Options in the left pane of the Database Connection dialog box and enter the parameter values as shown in the following table:

    Parameter         Value
    KrbHostFQDN       The fully qualified domain name of the Impala host
    KrbServiceName    The service principal name of the Impala server
    KrbRealm          The Kerberos realm used by the cluster
  8. After you have entered your settings, click Test.

Results

A success message appears if everything was entered correctly.
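PDI assembles the JDBC connection string for you, so nothing below needs to be typed into the dialog box. The sketch is only meant to show how the values from the two tables relate to each other, assuming the Cloudera Impala JDBC driver's URL layout of jdbc:impala://host:port/schema followed by semicolon-separated Key=Value options. The function name build_impala_url and the host, service, and realm values are hypothetical placeholders. Other settings a real connection may require, such as the authentication mechanism, are deliberately left out.

```python
def build_impala_url(host, port, database, options):
    """Build an Impala JDBC URL with ';Key=Value' options appended.

    Assumption: the URL follows the Cloudera Impala JDBC driver layout
    jdbc:impala://host:port/schema;Key1=Value1;Key2=Value2
    """
    base = f"jdbc:impala://{host}:{port}/{database}"
    # Options keep their insertion order (dict ordering, Python 3.7+).
    suffix = "".join(f";{key}={value}" for key, value in options.items())
    return base + suffix


# Hypothetical example values; substitute the ones for your cluster.
example_url = build_impala_url(
    host="impala.example.com",
    port=443,
    database="default",
    options={
        "KrbHostFQDN": "impala.example.com",  # FQDN of the Impala host
        "KrbServiceName": "impala",           # service principal name
        "KrbRealm": "EXAMPLE.COM",            # Kerberos realm of the cluster
    },
)
```

If the Test button reports a failure, comparing each Krb parameter against the values configured on the cluster is a reasonable first step.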

Next steps

After you save your changes in the repository, and once your Hadoop cluster is connected to the Pentaho Server, you can use secure impersonation to run your transformations and jobs from the Pentaho Server.

Note: Secure impersonation from the PDI client is not currently supported.

See Set up the Pentaho Server to connect to a Hadoop cluster for instructions on any further advanced configurations you may need to perform to connect your Hadoop cluster to the Pentaho Server.