Skip to main content

Pentaho+ documentation has moved!

The new product documentation portal is here. Check it out now at docs.hitachivantara.com

 

Hitachi Vantara Lumada and Pentaho Documentation

Memory Group By

Parent article

The Memory Group By step groups rows in memory from a source step. The resulting rows are grouped based on a specified field or collection of fields. A new row is generated for each group. This step differs from the Group By step by processing all rows in memory, and is designed to handle non-sorted input. If the number of rows you want to group is too large to fit into memory, you must use a combination of the Sort rows and Group By steps.

General

Enter the information in the transformation step name field as shown in the following table:

OptionDefinition
Step NameSpecify the unique name of the Memory Group by step on the canvas. You can customize the name or leave it as the default.
Always give back a result rowSelect to return a result row, even when there is no input row. If no input rows exist, this option returns a count of zero (0).

Memory Group By dialog

The fields that make up the Group Table

Use this table to specify the fields you want to group. Click Get Fields to add all fields from the PDI input stream. Right-click a row in the table to edit that single row or all rows in the table.

NoteLeave this table blank to calculate aggregate functions over the entire dataset.

Aggregates table

Use this table to specify the group you want to aggregate, the aggregation method, and the name of the resulting new field. Click Get lookup fields to specify the fields to include in a group. See the Group By step for more details.

ColumnDescription
NameThe name of the aggregate field.
SubjectThe subject on which you want to use an aggregation method.
Type

The type of aggregation method for grouping the rows. The following methods are available:

  • Sum
  • Average (Mean)
  • Median
  • Percentile
  • Minimum
  • Maximum
  • Number of values (N)
  • Concatenate strings separated by , (comma)
  • First non-null value
  • Last non-null value
  • First value (including null)
  • Last value (including null)
  • Standard deviation
  • Concatenate strings that are separated by the character specified in the Value column
  • Number of distinct values
  • Number of rows (without field argument).
ValueThe aggregate value.

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.