Hitachi Vantara Lumada and Pentaho Documentation

Use Knox to access CDP

Apache Knox is a gateway security tool that provides perimeter security for the Cloudera Data Platform (CDP). Knox gives you secure access to the CDP components on a cluster through a single point of access, so you do not need to connect to each CDP service separately. If your system administrator has implemented Apache Ranger on the cluster, Pentaho respects the Ranger policies your administrator has set up.

As an example of a Knox deployment, the PDI client connects to Knox using a user ID and password that are registered in LDAP. Knox then authenticates to the Kerberos Key Distribution Center (KDC) with the PDI client user ID and password. Finally, Knox authorizes the request with Ranger and submits it to the cluster.
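In the first hop of this flow, the PDI client sends its LDAP user ID and password to the Knox gateway. The sketch below assumes the gateway accepts these credentials through HTTP Basic authentication over HTTPS, which is a common Knox setup but an assumption here. It shows only how the `Authorization` header value is formed. The class name `KnoxBasicAuth` and the sample credentials are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the HTTP Basic Authorization header value a client
// could send to a Knox gateway, ASSUMING the gateway uses LDAP-backed Basic auth.
// Basic credentials are only Base64-encoded, not encrypted, so send them over HTTPS only.
class KnoxBasicAuth {
    static String authorizationHeader(String userId, String password) {
        // Basic auth encodes "userId:password" as Base64 (RFC 7617).
        String pair = userId + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Sample credentials for illustration only.
        System.out.println(authorizationHeader("pdiuser", "secret"));
    }
}
```

Knox then handles the LDAP check, the Kerberos step, and Ranger authorization on the server side, so the client never needs Kerberos credentials of its own in this flow.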

[Figure: Knox environment]

Setup requirements for Knox with Pentaho

As a system or cluster administrator, you must obtain the following information and provide it to your Pentaho users:

  • Credentials

    The cluster name, gateway URL, username, and password.

  • SSL certificate

    The Knox gateway URL is a secure (HTTPS) URL, so an SSL certificate must be installed before you can perform operations through the Knox gateway. See Configure SSL (HTTPS) in the Pentaho User Console and Server for information on SSL.

  • LDAP directory server

    Knox authenticates users against an LDAP directory server, so each Pentaho user must be able to authenticate to that LDAP server. For more information, see the articles Switch to LDAP and LDAP Properties.
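The checklist above can be captured in a small value class so that missing or insecure settings are caught before a connection is attempted. This is a minimal sketch: the class `KnoxConnectionInfo`, its field names, and the sample URL are hypothetical and are not part of any Pentaho API. It enforces only what the checklist states: the gateway URL must be HTTPS, and the cluster name, username, and password are required.

```java
import java.net.URI;

// Hypothetical value class for the details an administrator gives Pentaho users.
// Class and field names are illustrative only; they are not a Pentaho API.
class KnoxConnectionInfo {
    final String clusterName;
    final URI gatewayUrl;
    final String username;
    final String password;

    KnoxConnectionInfo(String clusterName, String gatewayUrl,
                       String username, String password) {
        // URI.create throws IllegalArgumentException for malformed URLs.
        URI uri = URI.create(gatewayUrl);
        // The Knox URL is a secure URL, so anything other than https is rejected.
        if (!"https".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException(
                    "Knox gateway URL must use https: " + gatewayUrl);
        }
        // Every credential on the checklist is required.
        for (String value : new String[] {clusterName, username, password}) {
            if (value == null || value.isBlank()) {
                throw new IllegalArgumentException(
                        "cluster name, username, and password are required");
            }
        }
        this.clusterName = clusterName;
        this.gatewayUrl = uri;
        this.username = username;
        this.password = password;
    }

    public static void main(String[] args) {
        // Placeholder values; substitute the details from your administrator.
        KnoxConnectionInfo info = new KnoxConnectionInfo(
                "datahub_cluster_name",
                "https://knox.example.com/datahub_cluster_name/cdp-proxy-api/",
                "pdiuser", "secret");
        System.out.println(info.gatewayUrl.getHost());
    }
}
```

The SSL certificate and LDAP account cannot be checked this way; they still have to be set up on the server and cluster side.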

Hive configuration with Knox

You can configure your Hive database connection to access Hive through the Knox gateway.

Procedure

  1. Open your existing Hive database connection. If you have not set one up yet, see the article Set Up a Database Connection for instructions.

  2. In the Database Connection dialog box, select Options in the page panel on the left to display the Parameters panel.

  3. In the Parameters panel, enter the following parameters and values:

    Parameter         Definition                   Value
    httpPath          Path to the database         datahub_cluster_name/cdp-proxy-api/hive, where datahub_cluster_name and cdp-proxy-api depend on your environment
    knox (optional)   Option to use Knox           true
    transportMode     Connection protocol to use   http
    ssl               Option to use SSL            true

  4. On the General tab, enter 443 for the Port Number, and then click OK.
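The transportMode, httpPath, and ssl values, together with port 443, correspond to session parameters in a HiveServer2 JDBC URL. The sketch below assembles such a URL, assuming the standard `jdbc:hive2://host:port/database;key=value` format used by the Apache Hive JDBC driver. The gateway host `knox.example.com`, the `default` database, and the class name `KnoxHiveUrl` are placeholders. The optional `knox` parameter is left out because this sketch assumes it is a PDI connection option rather than a Hive JDBC URL parameter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: assemble a HiveServer2 JDBC URL that goes through a Knox gateway,
// using the same values as the Database Connection dialog.
// Host, database, and cluster name are placeholders for your environment.
class KnoxHiveUrl {
    static String build(String gatewayHost, int port, String database, String httpPath) {
        // Session parameters, kept in insertion order for readability.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("transportMode", "http"); // Knox proxies HiveServer2 over HTTP
        params.put("httpPath", httpPath);    // e.g. datahub_cluster_name/cdp-proxy-api/hive
        params.put("ssl", "true");           // the Knox URL is HTTPS

        // Each parameter is prefixed with ';', as in ";transportMode=http;ssl=true".
        StringJoiner joiner = new StringJoiner(";", ";", "");
        params.forEach((key, value) -> joiner.add(key + "=" + value));
        return "jdbc:hive2://" + gatewayHost + ":" + port + "/" + database + joiner;
    }

    public static void main(String[] args) {
        String url = build("knox.example.com", 443, "default",
                "datahub_cluster_name/cdp-proxy-api/hive");
        System.out.println(url);
    }
}
```

With the Apache Hive JDBC driver on the classpath, a URL of this shape could be passed to `java.sql.DriverManager.getConnection(url, user, password)` together with the LDAP credentials. In PDI, the Database Connection dialog collects the same values for you.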

Results

You are now ready to use this connection for any Hive steps.