Manual and advanced secure impersonation configuration

Secure impersonation can be implemented when you connect to a Hadoop cluster with the PDI client, depending on the options you select. This article explains optional manual and advanced configurations for secure impersonation on the Pentaho Server. For an overview of secure impersonation, refer to Setting Up Big Data Security.

The following sections guide you through the optional manual setup and advanced configurations:

  • Prerequisites
  • Manually configuring secure impersonation parameters
  • Configuring MapReduce jobs (Windows-only)
  • Connecting to a Cloudera Impala database (Cloudera-only)
  • Next steps

Prerequisites

The following requirements must be met to use secure impersonation:

  • The cluster must be secured with Kerberos, and the Kerberos server used by the cluster must be accessible to the Pentaho Server.
  • The Pentaho computer must have Kerberos installed and configured. See Set Up Kerberos for Pentaho for instructions.
Note: If your system has version 8 of the Java Runtime Environment (JRE) or the Java Development Kit (JDK) installed, you do not need to install the Kerberos client separately, because it is included in the Java installation. You still need to modify the Kerberos configuration file, krb5.conf, as specified in the Set Up Kerberos for Pentaho article.

Manually configuring secure impersonation parameters

If you prefer an automated setup and authentication configuration of secure impersonation, you can use the security options while creating a named connection in the PDI client. See Add security to cluster connections for instructions. This section explains how to manually configure secure impersonation if you are not using the PDI client or need more advanced configurations.

The mapping type value in the config.properties file (the pentaho.authentication.default.mapping.impersonation.type parameter) turns secure impersonation on or off. The Pentaho Server supports two mapping types: disabled and simple. When the value is disabled or left blank, the Pentaho Server does not use authentication. When the value is simple, Pentaho users can connect to the Hadoop cluster as a proxy user.

Note: If you are using these instructions to manually configure secure impersonation by editing the config.properties file, you do not need to follow the instructions in the "Edit config.properties (Secured Clusters)" sections of the Hadoop distribution articles listed in Set up the Pentaho Server to connect to a Hadoop cluster.

Perform the following steps to manually set up secure impersonation for your Hadoop cluster and PDI:

Procedure

  1. Stop the Pentaho Server.

  2. Navigate to the <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/<user-defined connection name> directory and open the config.properties file with a text editor.

    Note: This file path and the config.properties file are created when you set up your named connection. See Connecting to a Hadoop cluster with the PDI client for instructions.
  3. Modify the config.properties file with the values in the following table:

    Parameter | Value
    pentaho.authentication.default.kerberos.principal | exampleUser@EXAMPLE.COM
    pentaho.authentication.default.kerberos.keytabLocation | Set the Kerberos keytab. You only need to set the password or the keytab, not both.
    pentaho.authentication.default.kerberos.password | Set the Kerberos password. You only need to set the password or the keytab, not both.
    pentaho.authentication.default.mapping.impersonation.type | simple
    pentaho.authentication.default.mapping.server.credentials.kerberos.principal | exampleUser@EXAMPLE.COM
    pentaho.authentication.default.mapping.server.credentials.kerberos.keytabLocation | You only need to set the password or the keytab, not both.
    pentaho.authentication.default.mapping.server.credentials.kerberos.password | You only need to set the password or the keytab, not both.
    pentaho.oozie.proxy.user | Add the proxy user's name if you plan to access the Oozie service through a proxy. Otherwise, leave it set to oozie.

    In this table, exampleUser@EXAMPLE.COM is provided as a sample of how you would specify your proxy user. If you have key-value pairs in your existing config.properties file that are not security related, merge those settings into the file.

  4. Save and close the config.properties file.

  5. Restart the Pentaho Server.
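
As a rough illustration of step 3, a filled-in config.properties file might look like the following sketch. The keytab path is a hypothetical placeholder and does not come from this article. The principal exampleUser@EXAMPLE.COM is the sample value from the table. The password lines are left blank here on the assumption that a keytab is used, since only one of the two needs to be set.

```properties
# Sketch only: replace the sample principal and the hypothetical keytab path with your own values.
pentaho.authentication.default.kerberos.principal=exampleUser@EXAMPLE.COM
# Set either the keytab location or the password, not both.
pentaho.authentication.default.kerberos.keytabLocation=/path/to/exampleUser.keytab
pentaho.authentication.default.kerberos.password=

# "simple" turns secure impersonation on; "disabled" or blank turns it off.
pentaho.authentication.default.mapping.impersonation.type=simple

# Server credentials used for the proxy user (hypothetical keytab path again).
pentaho.authentication.default.mapping.server.credentials.kerberos.principal=exampleUser@EXAMPLE.COM
pentaho.authentication.default.mapping.server.credentials.kerberos.keytabLocation=/path/to/exampleUser.keytab
pentaho.authentication.default.mapping.server.credentials.kerberos.password=

# Leave as oozie unless you access the Oozie service through a different proxy user.
pentaho.oozie.proxy.user=oozie
```

Any existing key-value pairs in the file that are not security related would stay alongside these entries.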

Configuring MapReduce jobs

If you are establishing secure impersonation on a Windows system, you must modify the mapred-site.xml file before you can run MapReduce jobs with secure impersonation.

Perform the following steps to modify the mapred-site.xml file for secure impersonation:

Procedure

  1. Navigate to the <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/<user-defined connection name> directory and open the mapred-site.xml file with a text editor.

  2. Add the following two properties to the mapred-site.xml file:

    <property>
      <name>mapreduce.app-submission.cross-platform</name>
      <value>true</value>
    </property>
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
  3. Save and close the file.

Connecting to a Cloudera Impala database

If you are trying to establish secure impersonation with a Cloudera Hadoop cluster and you are connecting to a secure Cloudera Impala database, you must update security-specific settings on the PDI database connection.

Perform the following steps to update your connection to the secure Cloudera Impala database:

Procedure

  1. Download the Cloudera Impala JDBC driver for your operating system from the Cloudera website: https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html

    Note: Secure impersonation with Impala is only supported with the Cloudera Impala JDBC driver. You may have to create an account with Cloudera to download the driver file.
  2. Extract the ImpalaJDBC41.jar file from the downloaded zip file into the folder <username>/.pentaho/metastore/pentaho/NamedCluster/Configs/cdh61/lib. The ImpalaJDBC41.jar file is the only file to extract from the downloaded file.

  3. Connect to a secure CDH cluster.

    If you have not set up a secure cluster, complete the procedure in the article Set up Pentaho to Connect to a Cloudera Cluster to set up a secure cluster.
  4. Start the PDI client and choose File > New > Transformation to add a new transformation.

  5. Click the View tab, then right-click Database Connections and choose New.

  6. In the Database Connection dialog box, enter the values from the following table:

    Field | Value
    Connection Name | User-defined name
    Connection Type | Cloudera Impala
    Host Name | Hostname
    Database Name | default
    Port Number | 21050
  7. Click Options in the left pane of the Database Connection dialog box and enter the parameter values as shown in the following table:

    Parameter | Value
    KrbHostFQDN | The fully qualified domain name of the Impala host
    KrbServiceName | The service principal name of the Impala server
    KrbRealm | The Kerberos realm used by the cluster
  8. After you enter your settings, click Test.

Results

A success message appears if everything was entered correctly.
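
For illustration, the three parameters entered on the Options page in step 7 might be filled in as follows. The host name, service name, and realm shown here are hypothetical placeholders, not values from this article. Use the values from your own Impala and Kerberos setup.

```properties
# Hypothetical example values for the Options page of the Database Connection dialog box.
# Fully qualified domain name of the Impala host (assumed host name).
KrbHostFQDN=impala-host.example.com
# Service principal name of the Impala server (assumed to be "impala" here).
KrbServiceName=impala
# Kerberos realm used by the cluster (assumed realm).
KrbRealm=EXAMPLE.COM
```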

Next steps

After you save your changes in the repository and your Hadoop cluster is connected to the Pentaho Server, you are ready to use secure impersonation to run your transformations and jobs from the Pentaho Server.

Note: Secure impersonation from the PDI client is not currently supported.

See Set up the Pentaho Server to connect to a Hadoop cluster for instructions on any further advanced configurations you may need to perform to connect your Hadoop cluster to the Pentaho Server.