Use Secure Impersonation with Cloudera

This article explains how to configure the Pentaho Server to connect to a Cloudera Hadoop 5.9 cluster with secure impersonation. For an overview of secure impersonation, refer to Setting Up Big Data Security. The following sections guide you through the setup and configuration process:

  • Prerequisites
  • Parameter Configuration
  • Configuring MapReduce Jobs (Windows-only)
  • Connecting to a Cloudera Impala Database
  • Next Steps

Prerequisites

The following requirements must be met to use secure impersonation:

  • The cluster must be secured with Kerberos, and the Kerberos server used by the cluster must be accessible to the Pentaho Server.
  • The Pentaho computer must have Kerberos installed and configured as explained in Set Up Kerberos for Pentaho.
Note: If your system has version 8 of the Java Runtime Environment (JRE) or the Java Development Kit (JDK) installed, you do not need to install the Kerberos client, since it is included in the Java installation. You still need to modify the Kerberos configuration file, krb5.conf, as specified in the Set Up Kerberos for Pentaho article.
Note: Follow the instructions below for editing the config.properties file instead of the instructions in the "Edit config.properties (Secured Clusters)" section of the Set up Pentaho to Connect to a Cloudera Cluster article.
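
For reference, a minimal krb5.conf might look like the following sketch. The realm and KDC host names are placeholders for this example; refer to Set Up Kerberos for Pentaho for the settings that apply to your environment.

    [libdefaults]
      default_realm = EXAMPLE.COM

    [realms]
      EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
      }

    [domain_realm]
      .example.com = EXAMPLE.COM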

Parameter configuration

The impersonation mapping type value in the config.properties file turns secure impersonation on or off. The mapping types supported by the Pentaho Server are disabled and simple. When the value is set to disabled or left blank, the Pentaho Server does not use authentication. When it is set to simple, Pentaho users can connect to the Hadoop cluster as a proxy user.
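
For example, the following entry in config.properties turns secure impersonation on:

    pentaho.authentication.default.mapping.impersonation.type=simple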

To configure the cluster for secure impersonation, stop the Pentaho Server and complete the following steps:

Procedure

  1. Navigate to the pentaho-server\pentaho-solutions\system\kettle\plugins\pentaho-big-data-plugin\hadoop-configurations\cdh59 folder and open the config.properties file with a text editor.

  2. Modify the config.properties file with the values in the following table:

    Parameter | Value
    pentaho.authentication.default.kerberos.principal | exampleUser@EXAMPLE.COM
    pentaho.authentication.default.kerberos.keytabLocation | Set the location of the Kerberos keytab file. You only need to set the password or the keytab, not both.
    pentaho.authentication.default.kerberos.password | Set the Kerberos password. You only need to set the password or the keytab, not both.
    pentaho.authentication.default.mapping.impersonation.type | simple
    pentaho.authentication.default.mapping.server.credentials.kerberos.principal | exampleUser@EXAMPLE.COM
    pentaho.authentication.default.mapping.server.credentials.kerberos.keytabLocation | Set the location of the Kerberos keytab file. You only need to set the password or the keytab, not both.
    pentaho.authentication.default.mapping.server.credentials.kerberos.password | Set the Kerberos password. You only need to set the password or the keytab, not both.
    pentaho.oozie.proxy.user | Add the proxy user's name if you plan to access the Oozie service through a proxy. Otherwise, leave it set to oozie.

    In this table, exampleUser@EXAMPLE.COM is provided as a sample of how you would specify your proxy user. If your existing config.properties file contains key-value pairs that are not security related, merge those settings into the file. A complete sample of the security-related entries is shown after this procedure.

  3. Save and close the config.properties file.

  4. Copy the config.properties file to the following folders:

    design-tools/report-designer/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh59/config.properties
    design-tools/metadata-editor/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh59/config.properties
    design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh59/config.properties

  5. Restart the Pentaho Server.
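
The following sketch shows what the security-related entries in a completed config.properties file might look like. The principal, keytab path, and proxy user values are placeholders for this example; remember to set either the password or the keytab location for each credential, not both.

    # Kerberos credentials used by the Pentaho Server
    # (set either the password or the keytab location, not both)
    pentaho.authentication.default.kerberos.principal=exampleUser@EXAMPLE.COM
    # The keytab path below is a placeholder for this example
    pentaho.authentication.default.kerberos.keytabLocation=/opt/pentaho/keytabs/exampleUser.keytab
    pentaho.authentication.default.kerberos.password=

    # Turn on secure impersonation
    pentaho.authentication.default.mapping.impersonation.type=simple

    # Credentials the server uses when impersonating users
    pentaho.authentication.default.mapping.server.credentials.kerberos.principal=exampleUser@EXAMPLE.COM
    pentaho.authentication.default.mapping.server.credentials.kerberos.keytabLocation=/opt/pentaho/keytabs/exampleUser.keytab
    pentaho.authentication.default.mapping.server.credentials.kerberos.password=

    # Oozie proxy user (leave as oozie unless you access Oozie through a proxy)
    pentaho.oozie.proxy.user=oozie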

Configuring MapReduce jobs (Windows only)

For Windows systems, you must modify the mapred-site.xml files to run MapReduce jobs with secure impersonation. Complete the following steps to modify the files:

Procedure

  1. Navigate to the design-tools\data-integration\plugins\pentaho-big-data-plugin\hadoop-configurations\cdh59 folder and open the mapred-site.xml file with a text editor.

  2. Navigate to the pentaho-server\pentaho-solutions\system\kettle\plugins\pentaho-big-data-plugin\hadoop-configurations\cdh59 folder and open the mapred-site.xml file with a text editor.

  3. Add the following two properties to the two mapred-site.xml files:

    <!-- Allow jobs submitted from a Windows client to run on a Linux-based cluster -->
    <property>
      <name>mapreduce.app-submission.cross-platform</name>
      <value>true</value>
    </property>
    <!-- Run MapReduce jobs on the YARN framework -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
  4. Save and close the files.

Connecting to a Cloudera Impala database

Complete the following steps to connect to a secure Cloudera Impala database:

Procedure

  1. Download the Cloudera Impala JDBC driver for your operating system from the Cloudera website: http://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-29.html.

    Note: Secure impersonation with Impala is only supported with the Cloudera Impala JDBC driver. You may have to create an account with Cloudera to download the driver file.
  2. Extract the ImpalaJDBC41.jar file from the downloaded zip file into the pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh59/lib folder. The ImpalaJDBC41.jar file is the only file you need to extract from the downloaded file.

  3. Connect to a secure CDH cluster.

    If you have not set up a secure cluster, complete the procedure in the article Set up Pentaho to Connect to a Cloudera Cluster to set up a secure cluster.
  4. Start the PDI client and choose File > New > Transformation to add a new transformation.

  5. Click the View tab, then right-click Database Connections and choose New.

  6. In the Database Connection dialog box, enter the values from the following table:

    Field | Value
    Connection Name | A user-defined name for the connection
    Connection Type | Cloudera Impala
    Host Name | The hostname of the Impala server
    Database Name | default
    Port Number | 21050
  7. Click Options in the left pane of the Database Connection dialog box and enter the parameter values as shown in the following table:

    Parameter | Value
    KrbHostFQDN | The fully qualified domain name of the Impala host
    KrbServiceName | The service principal name of the Impala server
    KrbRealm | The Kerberos realm used by the cluster
  8. After you enter your settings, click Test.

Results

A success message appears if everything was entered correctly.
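
For reference, the Options parameters above map to properties in the Impala JDBC connection URL. A Kerberos-secured connection string for the Cloudera driver typically takes the following form; the host name and realm are placeholders for this example, and AuthMech=1 selects Kerberos authentication:

    jdbc:impala://impala-host.example.com:21050/default;AuthMech=1;KrbHostFQDN=impala-host.example.com;KrbServiceName=impala;KrbRealm=EXAMPLE.COM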

Next steps

Once you have saved your changes to the repository and connected your Hadoop cluster to the Pentaho Server, you are ready to use secure impersonation to run your transformations and jobs from the Pentaho Server.

Note: Secure impersonation from the PDI client is not currently supported.

If you have not yet connected your Hadoop cluster to the Pentaho Server, continue to the "Edit hbase-site.xml" section in Edit the Shim Configuration Files.