Using Merge rows (diff) on the Spark engine

Last updated
Save as PDF

You can set up the Merge rows (diff) step to run on the Spark engine. On the Spark engine, Merge rows (diff) compares and merges data from two source datasets into two rows of data, based on key comparison matching. Sorting is not required to produce correct results. Additionally, output from the Merge rows (diff) step is not automatically sorted. When Spark is processing the transformation, the rows are sorted by the group fields. The field names cannot contain spaces, dashes, or special characters. Each field name must start with a letter.

General

Enter the following information for the step:

Step name: Specify the unique name of the transformation on the canvas. You can customize the name or leave it as the default.

Options

Options for Merge rows (diff)

The Merge rows (diff) step contains the following options.

Option	Description
Reference rows origin	From the drop-down list, select the input source for the original reference rows you want to compare. The input source will be a previous step in the transformation.
Compare rows origin	From the drop-down list, select the input source for the compare rows. The input source will be a previous step in the transformation.
Flag fieldname	Specify a field name that will contain the flag indicating how the values were compared and merged in the output row.
Keys to match	Specify the field names that contain the keys on which to generate a match.
Values to compare	Specify the field names that contain the values on which to generate a comparison.
Get key fields	Click the Get key fields button to populate the Keys to match table with all the fields in the Reference rows origin step.
Get value fields	Click the Get value fields button to populate the Values to compare table with all the fields in the Compare rows origin step.

Examples

The Merge rows – mergs 2 streams of data and add a flag.ktr transformation located in the data-integration/samples/transformations directory illustrates how to use the Merge rows (diff) step.

Metadata injection support

All fields of this step support metadata injection. You can use this step with ETL metadata injection to pass metadata to your transformation at runtime.

Pentaho+ documentation has moved!

The new product documentation portal is here. Check it out now at docs.hitachivantara.com.

General

Options

Examples

Metadata injection support