Hitachi Vantara Lumada and Pentaho Documentation

Work with Rows

A row in PDI is represented by a Java object array, Object[]. Each field value is stored at an index in the row. While the array representation is efficient for passing data around, the array alone does not tell you which field names and types go with its values, because the row array does not carry this metadata. In addition, an object array representing a row usually has empty slots toward its end so that the row can accommodate additional fields efficiently. Consequently, the length of the row array does not equal the number of fields in the row. The following sections explain how to safely access fields in a row array.
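As a rough illustration of the point about spare slots, the following plain-Java sketch keeps the field names in a separate list and uses that list, not the array length, to decide how many values to read. RowLayoutSketch, its methods, and the sample field names are hypothetical and made up for this example; they are not part of PDI.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; none of these names are PDI classes.
final class RowLayoutSketch {

    // The number of fields comes from the metadata (here, a list of names),
    // never from the length of the row array.
    static int fieldCount(List<String> fieldNames) {
        return fieldNames.size();
    }

    // Builds a row array holding the given values plus empty trailing slots.
    static Object[] rowWithSpareSlots(Object[] values, int spareSlots) {
        return Arrays.copyOf(values, values.length + spareSlots);
    }

    public static void main(String[] args) {
        List<String> fieldNames = Arrays.asList("id", "name");
        Object[] row = rowWithSpareSlots(new Object[] {42L, "Alice"}, 3);

        System.out.println("array length: " + row.length);
        System.out.println("field count:  " + fieldCount(fieldNames));

        // Iterate over the fields described by the metadata, not over the whole array.
        for (int i = 0; i < fieldCount(fieldNames); i++) {
            System.out.println(fieldNames.get(i) + " = " + row[i]);
        }
    }
}
```

Looping up to the array length instead would also visit the empty trailing slots, which hold no field values.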

PDI uses internal objects that implement RowMetaInterface to describe and manipulate row structure. Inside processRow(), a step can retrieve the structure of incoming rows by calling getInputRowMeta(), which is provided by the BaseStep class. The step clones the RowMetaInterface object and passes the clone to getFields() of its meta class, which applies any changes in row structure caused by the step itself. The step then has RowMetaInterface objects describing both the input and output rows, and it can use them to inspect row structure.
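The clone-then-modify pattern described above can be sketched without the PDI classes. In this hypothetical stand-in, a row structure is just a list of field names, and appendStepFields() plays the role that getFields() plays in a step's meta class. The class name, method names, and the field name "total" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in; a real step would work with RowMetaInterface objects instead.
final class RowStructureSketch {

    // Plays the role of getFields(): modifies the structure it is given in place.
    static void appendStepFields(List<String> rowStructure) {
        rowStructure.add("total"); // hypothetical field added by the step
    }

    // Copies the input structure first so the input description stays untouched.
    static List<String> deriveOutputStructure(List<String> inputStructure) {
        List<String> output = new ArrayList<>(inputStructure);
        appendStepFields(output);
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("id", "amount");
        List<String> output = deriveOutputStructure(input);
        System.out.println("input:  " + input);
        System.out.println("output: " + output);
    }
}
```

Working on a copy matters because the step still needs the unmodified input description to read incoming rows.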

There is a similar object that holds information about individual row fields. PDI uses internal objects that implement ValueMetaInterface to describe and manipulate field information, such as field name, data type, format mask, and the like.

A step typically looks up the indexes and types of the relevant fields the first time processRow() is executed. The following methods of RowMetaInterface are useful for this purpose.

| Method | Purpose |
| --- | --- |
| indexOfValue(String valueName) | Given a field name, determine the index of the field in the row. |
| getFieldNames() | Returns an array of field names. The index of a field name matches the field index in the row array. |
| searchValueMeta(String valueName) | Given a field name, determine the metadata for the field. |
| getValueMeta(int index) | Given a field index, determine the metadata for the field. |
| getValueMetaList() | Returns a list of all field descriptions. The index of the field description matches the field index in the row array. |
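The idea of resolving a field index once and reusing it for every later row can be sketched as follows. SimpleRowMeta is a hypothetical stand-in written for this example, not the PDI RowMetaInterface; its indexOfValue() is assumed to return -1 for an unknown field name, and the field name "amount" is made up.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of "look the index up once, then reuse it".
final class FieldLookupSketch {

    // Hypothetical stand-in for row metadata: an ordered list of field names.
    static final class SimpleRowMeta {
        private final List<String> fieldNames;

        SimpleRowMeta(String... names) {
            this.fieldNames = Arrays.asList(names);
        }

        // Returns the position of the field in the row array, or -1 if absent.
        int indexOfValue(String valueName) {
            return fieldNames.indexOf(valueName);
        }
    }

    private boolean first = true;
    private int amountIndex = -1;

    // Reads the "amount" field; the name-to-index lookup happens only on the first call.
    Object readAmount(SimpleRowMeta meta, Object[] row) {
        if (first) {
            int index = meta.indexOfValue("amount");
            if (index < 0) {
                throw new IllegalStateException("Field 'amount' not found in input row");
            }
            amountIndex = index;
            first = false;
        }
        return row[amountIndex];
    }

    public static void main(String[] args) {
        SimpleRowMeta meta = new SimpleRowMeta("id", "amount");
        FieldLookupSketch step = new FieldLookupSketch();
        System.out.println(step.readAmount(meta, new Object[] {1L, 9.5, null, null}));
        System.out.println(step.readAmount(meta, new Object[] {2L, 3.0, null, null}));
    }
}
```

Caching the index avoids a name lookup per row, and failing early on a missing field keeps the error close to its cause.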

If a step needs to create copies of rows, use the cloneRow() methods of RowMetaInterface to create proper copies. If a step needs to add or remove fields in the row array, use the static helper methods of RowDataUtil. For example, if a step is adding a field to the row, call resizeArray() to make room for the field. If the array has enough slots, the original array is returned as is. If the array does not have enough slots, a resized copy of the array is returned. If a step needs to create new rows from scratch, use allocateRowData(), which returns a somewhat over-allocated object array to fit the desired number of fields.
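The resize and allocation behavior described above can be approximated in plain Java as shown below. This is a sketch of the described behavior and not the RowDataUtil source: the class and method names are hypothetical, and the number of spare slots (10) is an assumption chosen for the example.

```java
// Plain-Java sketch of over-allocated row arrays; not the PDI implementation.
final class RowArraySketch {

    // Assumed amount of slack added whenever a new array is created.
    static final int SPARE_SLOTS = 10;

    // Creates a new row array with room for the fields plus some slack.
    static Object[] allocateRow(int fieldCount) {
        return new Object[fieldCount + SPARE_SLOTS];
    }

    // Returns the same array if it is large enough, otherwise a larger copy.
    static Object[] resizeRow(Object[] row, int neededSize) {
        if (row != null && row.length >= neededSize) {
            return row;
        }
        Object[] bigger = new Object[neededSize + SPARE_SLOTS];
        if (row != null) {
            System.arraycopy(row, 0, bigger, 0, row.length);
        }
        return bigger;
    }

    // Appends one value after the existing fields, growing the array only if required.
    static Object[] addField(Object[] row, int currentFieldCount, Object value) {
        Object[] out = resizeRow(row, currentFieldCount + 1);
        out[currentFieldCount] = value;
        return out;
    }

    public static void main(String[] args) {
        Object[] roomy = allocateRow(2);
        System.out.println("same array reused: " + (addField(roomy, 2, "x") == roomy));

        Object[] tight = new Object[] {1, 2};
        System.out.println("same array reused: " + (addField(tight, 2, "x") == tight));
    }
}
```

Because the returned array may or may not be the one passed in, callers should always continue with the returned reference.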

Summary Table of Classes and Interfaces for Row Processing

| Class/Interface | Purpose |
| --- | --- |
| RowMetaInterface | Describes and manipulates row structure |
| ValueMetaInterface | Describes and manipulates field types and formats |
| RowDataUtil | Allocates space in row array |