Hitachi Vantara Lumada and Pentaho Documentation

PDI Hadoop Job Workflow

PDI enables you to execute a Java class from within a PDI/Spoon job to perform operations on Hadoop data. The approach is similar to that for any other PDI job. The job entry designed to handle the Java class is Hadoop Job Executor. In this illustration it is used in the WordCount - Advanced entry.


The Hadoop Job Executor dialog box enables you to configure the entry with a jar file that contains the Java class.
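The Java class packaged in the jar typically implements MapReduce logic such as word counting. As an illustrative sketch only (not the actual Pentaho sample class), the core map (tokenize) and reduce (sum) phases can be shown without Hadoop dependencies; a real class submitted through Hadoop Job Executor would implement Hadoop's `Mapper` and `Reducer` interfaces instead:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical word-count sketch mirroring what a WordCount class in the jar
// might do. A production class would use org.apache.hadoop.mapreduce.Mapper
// and Reducer; here the same two phases run in-memory for clarity.
public class WordCount {

    // "Map" phase: split each input line into words.
    // "Reduce" phase: sum the occurrences of each word.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new HashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    totals.merge(word, 1, Integer::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Sample input standing in for lines read from HDFS.
        Map<String, Integer> counts = count(List.of("hello world", "hello hadoop"));
        System.out.println(counts);
    }
}
```

In an actual job, the input and output would be HDFS paths supplied through the Hadoop Job Executor dialog rather than in-memory strings.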


If you are using the Amazon Elastic MapReduce (EMR) service, you can use the Amazon EMR Job Executor job entry to execute the Java class. This entry differs from the standard Hadoop Job Executor in that it contains connection information for Amazon S3 and configuration options for EMR.