Hitachi Vantara Lumada and Pentaho Documentation

Group By

This step groups rows from a source based on a specified field or collection of fields. One new row is generated for each group. The step can also generate one or more aggregate values for each group. Common uses are calculating the average sales per product and counting how many of each item you have in stock.
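
The grouping-and-aggregation behavior described above can be illustrated outside of PDI with a minimal Python sketch. This is not PDI code: the function name `group_by`, the field names `product` and `sales`, and the sample rows are all hypothetical, and the input is assumed to be already sorted by the group field.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Hypothetical sample rows, already sorted by the group field "product".
rows = [
    {"product": "apple", "sales": 10.0},
    {"product": "apple", "sales": 20.0},
    {"product": "pear", "sales": 5.0},
]


def group_by(rows, key_field, value_field):
    """Emit one output row per run of consecutive rows sharing key_field."""
    output = []
    for key, group in groupby(rows, key=itemgetter(key_field)):
        values = [row[value_field] for row in group]
        output.append(
            {
                key_field: key,
                "count": len(values),     # number of rows in the group
                "average": mean(values),  # aggregate value for the group
            }
        )
    return output


result = group_by(rows, "product", "sales")
for row in result:
    print(row)
```

Each output row carries the group field plus the aggregates, which mirrors the "one new row per group" behavior of the step.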

The Group By step is designed for sorted inputs. If your input is not sorted, only consecutive rows with identical values in the group fields are grouped together, so the same group value can appear in more than one output row. If you sort the data outside of PDI, the case sensitivity of the data in the fields may produce unexpected grouping results.

You can use the Memory Group By step to handle non-sorted input.
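
To see why sort order matters, the following Python sketch contrasts grouping over consecutive rows with grouping held in memory. It is only an analogy under stated assumptions: `consecutive_group_counts` and `in_memory_group_counts` are hypothetical helper names, and the dictionary-based version is one plausible way to handle non-sorted input, not a description of how the Memory Group By step is implemented.

```python
from itertools import groupby


def consecutive_group_counts(keys):
    """Count rows per run of consecutive identical keys (needs sorted input)."""
    return [(key, sum(1 for _ in run)) for key, run in groupby(keys)]


def in_memory_group_counts(keys):
    """Count rows per key using a dictionary, so input order does not matter."""
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return list(counts.items())


unsorted_keys = ["A", "B", "A", "A"]

# Unsorted input: "A" is split into two separate groups.
split_groups = consecutive_group_counts(unsorted_keys)

# Sorting first gives one group per distinct key.
sorted_groups = consecutive_group_counts(sorted(unsorted_keys))

# Dictionary-based grouping gives one group per key without sorting.
memory_groups = in_memory_group_counts(unsorted_keys)

# A case-insensitive sort done elsewhere can leave "a" and "A" interleaved,
# so a case-sensitive consecutive comparison still splits the groups.
mixed_case = sorted(["a", "A", "a"], key=str.lower)
mixed_groups = consecutive_group_counts(mixed_case)

print(split_groups, sorted_groups, memory_groups, mixed_groups)
```

The trade-off in this sketch is that the consecutive approach only needs to hold one group at a time, while the dictionary approach keeps every distinct key in memory.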

Select an engine

You can run the Group By step on the Pentaho engine or on the Spark engine. Depending on your selected engine, the transformation will run differently. Select one of the following options to view how to set up the Group By step for your selected engine.

For instructions on selecting an engine for your transformation, see Run configurations.