Using Pan and Kitchen with a Hadoop cluster

To use Pan or Kitchen with a Hadoop cluster, you must configure Pentaho to run transformations and jobs with either the PDI client or the Pentaho Server. This configuration is not needed if your PDI client is connected to the Pentaho Repository. To run Pan or Kitchen against a repository directly on the Pentaho Server, you must create the named cluster definition in the server's repository. See Connect to a Hadoop cluster with the PDI client for information on creating that connection.
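
For reference, a repository-based run from the command line takes the following general form. This is a sketch, assuming a repository connection named PentahoRepository, a user admin, and a job named load_to_hadoop stored in the repository folder /public; substitute the values for your environment.

    ./kitchen.sh -rep=PentahoRepository -user=admin -pass=password -dir=/public -job=load_to_hadoop

Pan is invoked the same way for transformations, with -trans in place of -job.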

Note: If a user starts the PDI client and the Pentaho Server on the same platform, the cluster configuration files in the /home/<user>/.pentaho/metastore directory are overwritten. To avoid this issue, use the same cluster connection names on both the PDI client host and the Pentaho Server host.

Using the PDI client

Perform the following steps to configure the PDI client host machine to run jobs or transformations from the command line interface (CLI).

Procedure

  1. Create a connection to the Hadoop cluster where you want to run your job or transformation.

  2. Create and test the job or transformation in the PDI client to verify it works as expected.

  3. Navigate to the design-tools/data-integration/plugins/pentaho-big-data-plugin directory and open the plugin.properties file with any text editor.

  4. Set the value of the hadoop.configurations.path property to the location of the metastore directory, such as hadoop.configurations.path=/home/<user>/.pentaho (see the example after this procedure).

    The metastore directory is created when you set up a named connection to the Hadoop cluster. The default metastore location for the PDI client is /home/<user>/.pentaho/metastore.
  5. Save and close the plugin.properties file.
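
For example, with a user named pentaho (a placeholder for your own account), the edited line in plugin.properties would read:

    # Location of the directory that holds the metastore created by the PDI client
    hadoop.configurations.path=/home/pentaho/.pentaho

You can then run the job or transformation locally from the CLI, for example with Pan and a hypothetical transformation file hadoop_etl.ktr:

    ./pan.sh -file=/home/pentaho/transformations/hadoop_etl.ktr -level=Basic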

Using the Pentaho Server

To run Pan or Kitchen on your Hadoop cluster, the Pentaho Server must have access to the metastore where the Hadoop connections are stored.

Perform the following steps to configure the Pentaho Server to run jobs or transformations from the command line interface (CLI).

Procedure

  1. If the server is on a different host than the PDI client, copy the metastore directory and its contents from the PDI client to a location accessible to the server (see the example after this procedure).

    The metastore directory is created when you set up a named connection to the Hadoop cluster. The default metastore location for the PDI client is /home/<user>/.pentaho/metastore.
  2. Navigate to the pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin directory and open the plugin.properties file with any text editor.

  3. Set the value of the hadoop.configurations.path property to the location of the copied metastore directory.

  4. Save and close the plugin.properties file.
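
As an illustration, suppose the PDI client runs under the user pentaho and the server runs on a host named pentaho-server (both names are placeholders). The copy in step 1 and the property in step 3 might then look like this:

    # Copy the metastore from the PDI client host to the server host
    scp -r /home/pentaho/.pentaho/metastore pentaho-server:/home/pentaho/.pentaho/

    # In pentaho-server/pentaho-solutions/system/kettle/plugins/pentaho-big-data-plugin/plugin.properties
    hadoop.configurations.path=/home/pentaho/.pentaho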