Hitachi Vantara Lumada and Pentaho Documentation

Hadoop to PDI Data Type Conversion

The Hadoop Job Executor and Pentaho MapReduce steps have an advanced configuration mode that lets you specify data types for the job's input and output. PDI cannot detect foreign data types on its own, so you must specify the input and output data types in the Job Setup tab. The following table shows the relationship between Hadoop data types and their PDI equivalents.

PDI (Kettle) Data Type               Apache Hadoop Data Type
org.apache.hadoop.io.IntWritable     java.lang.Long
java.lang.String                     org.apache.hadoop.io.IntWritable
org.apache.hadoop.io.LongWritable    org.apache.hadoop.io.Text
org.apache.hadoop.io.LongWritable    java.lang.Long
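
As a rough illustration, the type pairs listed above can be kept in a small lookup and checked before you enter them in the Job Setup tab. This is only a sketch: the class HadoopPdiTypePairs and its methods are hypothetical helpers, not part of PDI or Hadoop, and the rows are the pairs as read from the table above. Class names are stored as plain strings so the example compiles without Hadoop or PDI on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: a lookup over the type pairs listed in the table. */
public final class HadoopPdiTypePairs {

    // Each row is {type in the PDI column, type in the Hadoop column},
    // copied from the table as read. Not an official or complete list.
    private static final String[][] PAIRS = {
        {"org.apache.hadoop.io.IntWritable", "java.lang.Long"},
        {"java.lang.String", "org.apache.hadoop.io.IntWritable"},
        {"org.apache.hadoop.io.LongWritable", "org.apache.hadoop.io.Text"},
        {"org.apache.hadoop.io.LongWritable", "java.lang.Long"},
    };

    private HadoopPdiTypePairs() {}

    /** Returns every Hadoop-column type paired with the given PDI-column type, in table order. */
    public static List<String> hadoopColumnTypesFor(String pdiColumnType) {
        List<String> matches = new ArrayList<>();
        for (String[] pair : PAIRS) {
            if (pair[0].equals(pdiColumnType)) {
                matches.add(pair[1]);
            }
        }
        return matches;
    }

    /** Returns true only if the table lists this exact pair. */
    public static boolean isListed(String pdiColumnType, String hadoopColumnType) {
        for (String[] pair : PAIRS) {
            if (pair[0].equals(pdiColumnType) && pair[1].equals(hadoopColumnType)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, HadoopPdiTypePairs.isListed("java.lang.String", "org.apache.hadoop.io.IntWritable") returns true, while a pair that does not appear in the table returns false.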

For more information on configuring Pentaho MapReduce to convert to additional data types, see