Hitachi Vantara Lumada and Pentaho Documentation

Unique Rows

The Unique Rows step removes duplicate rows from the input stream, so that only unique rows are passed on from the step.

Select an engine

You can run the Unique Rows step on the Pentaho engine or on the Spark engine.

The input stream must be sorted in a step prior to the Unique Rows step; otherwise, only consecutive duplicate rows are detected and filtered, and duplicates separated by other rows remain in the stream. However, the rows do not have to be pre-sorted if you use the Unique Rows (HashSet) step, or if you use the Spark engine to run the transformation.
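
The effect of the sorting requirement can be illustrated with a minimal Python sketch. This is not Pentaho code: the function names `unique_consecutive` and `unique_hashset` and the sample rows are hypothetical, and the sketch only models the general behavior described above (comparing each row with the previous one versus remembering every row already seen).

```python
from itertools import groupby


def unique_consecutive(rows):
    """Keep one row from each run of identical adjacent rows.

    Models a step that only compares a row with the row before it,
    so duplicates are caught only when they sit next to each other.
    """
    return [key for key, _ in groupby(rows)]


def unique_hashset(rows):
    """Keep the first occurrence of every row, regardless of order.

    Models a hash-set based approach: every row seen so far is
    remembered, so the input does not need to be sorted.
    """
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical sample data: ("a", 1) appears three times, but not all adjacent.
rows = [("a", 1), ("b", 2), ("a", 1), ("a", 1)]

unsorted_result = unique_consecutive(rows)         # one duplicate survives
sorted_result = unique_consecutive(sorted(rows))   # all duplicates removed
hashset_result = unique_hashset(rows)              # all duplicates removed

print(unsorted_result)
print(sorted_result)
print(hashset_result)
```

In this sketch the unsorted input still contains `("a", 1)` twice after consecutive filtering, while sorting first or using the hash-set variant leaves each row exactly once. The trade-off in the sketch is memory: the hash-set variant keeps every distinct row it has seen, whereas the consecutive variant only needs the previous row.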

Depending on your selected engine, the transformation runs differently. Select one of the following options to view how to set up the Unique Rows step for your selected engine:

For instructions on selecting an engine for your transformation, see Run configurations.