Hitachi Vantara Lumada and Pentaho Documentation

Work with Transformations

In the PDI client (Spoon), you can develop transformations, which are data workflows representing your ETL activities. The steps used in your transformations define the individual ETL activities (building blocks). The transformations containing your steps are stored in .ktr files. You can access these .ktr files through the PDI client.
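
Because a .ktr file is an XML document, its steps and hops can also be inspected outside the PDI client. The following is a minimal sketch that reads a heavily trimmed, hypothetical .ktr fragment. The element names (`info/name`, `step`, `order/hop`) and the step types are assumptions about the file layout, and `summarize_ktr` is an illustrative helper, not part of PDI.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed .ktr content. Real files contain many more
# elements; the element names used here are assumptions about the layout.
SAMPLE_KTR = """<transformation>
  <info><name>demo_transformation</name></info>
  <order>
    <hop><from>Read input</from><to>Write output</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Read input</name><type>CsvInput</type></step>
  <step><name>Write output</name><type>TextFileOutput</type></step>
</transformation>"""


def summarize_ktr(xml_text):
    """Return the transformation name, its steps, and its hops."""
    root = ET.fromstring(xml_text)
    name = root.findtext("info/name")
    # Each step is reported as a (name, type) pair.
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    # Each hop is reported as a (from step, to step) pair.
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return name, steps, hops


name, steps, hops = summarize_ktr(SAMPLE_KTR)
print(name)
for step_name, step_type in steps:
    print(f"step: {step_name} ({step_type})")
for source, target in hops:
    print(f"hop: {source} -> {target}")
```

To try this on a real file, read its contents into a string and pass that string to `summarize_ktr`; any element the sketch does not look up is ignored.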

Create a Transformation

Follow these instructions to create your transformation.

  1. Perform one of the following actions:
  • Click File > New > Transformation.
  • Click the New file icon in the toolbar and select Transformation.
  • Press CTRL+N.
  2. Go to the Design tab. Expand the folders or use the Steps field to search for a specific step.
  3. Either drag or double-click a step to place it on the PDI client canvas.
  4. Double-click the step on the PDI client canvas to open its properties window. For help on filling out the window, click the Help button that is available with each step.
  5. To add another step, either drag or double-click the step in the Design tab to place it on the PDI client canvas.
  • If you drag the step to the canvas, you can add a hop by pressing the SHIFT key and drawing a hop from one step to the other.
  • If you double-click it, the step appears on the canvas with a hop already connected to your previous step.
  6. When finished, save the transformation.

Open a Transformation

The way you open an existing transformation depends on whether you are using PDI locally on your machine or are connected to a repository. If you are connected to a repository, you are remotely accessing your file on the Pentaho Server. Another option is to open a transformation using HTTP with the Virtual File System (VFS) browser.

If you get a message indicating that a plugin is missing, see the Troubleshooting Transformation Steps and Job Entries section for more details.

If you recently had a file open, you can also use File > Open Recent.

On Your Local Machine

Follow these instructions to open a transformation on your local machine.

  1. In the PDI client, perform one of the following actions:
  • Select File > Open.
  • Click the Open file icon in the toolbar.
  • Press CTRL+O.
  2. Select the file from the Open window, then click Open.

The Open window closes when your transformation appears in the canvas.

In the Pentaho Repository

Follow these instructions to access a transformation in the Pentaho Repository.

  1. Make sure you are connected to a repository.
  2. In the PDI client, perform one of the following actions to access the Open repository browser window:
  • Select File > Open.
  • Click the Open file icon in the toolbar.
  • Press CTRL+O.
  3. If you recently opened a file, use Recents to navigate to your transformation.
  4. Either use the search box to find your transformation, or use the left panel to navigate to the repository folder that contains it.
  5. Perform one of the following actions:
  • Double-click on your transformation.
  • Select it and press the Enter key.
  • Select it and click Open.

The Open window closes when your transformation appears in the canvas.

If you select a folder or file in the Open window, you can click on it again to rename it.

With the VFS Browser

Select File > Open URL to access files using HTTP with the VFS browser. The URL you specify identifies the protocol to use in the browser.
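
The protocol is the scheme portion of the URL, the part before `://`. As a small illustration, the sketch below splits a URL with Python's standard library. The URL and the `describe_url` helper are hypothetical examples and are not part of PDI.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts that identify the protocol and the file."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # for example "http"
        "host": parts.hostname,    # server that holds the file
        "path": parts.path,        # location of the file on that server
    }


# Hypothetical URL used only for illustration.
info = describe_url("http://example.com/transformations/sample.ktr")
print(info)
```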


Save a Transformation

The way you save a transformation depends on whether you are using PDI locally on your machine or are connected to a repository. If you are connected to a repository, you are remotely saving your file on the Pentaho Server.

On Your Local Machine

Follow these instructions to save a transformation on your local machine.

  1. In the PDI client, perform one of the following actions:
  • Select File > Save.
  • Click the Save current file icon in the toolbar.
  • Press CTRL+S.

If you are saving your transformation for the first time, the Save As window appears.

  2. Specify the transformation's name in the Save As window and select the location.
  3. Either press the Enter key or click Save.

The Save window closes when your transformation is saved.

In the Pentaho Repository

Follow these instructions to save a transformation to the Pentaho Repository.

  1. Make sure you are connected to a repository.
  2. In the PDI client, perform one of the following actions:
  • Click File > Save.
  • Click the Save current file icon in the toolbar.
  • Press CTRL+S.

If you are saving your transformation for the first time, the Save repository browser window appears.

  3. Navigate to the repository folder where you want to save your transformation.
  4. Specify the transformation's name in the File name field.
  5. Either press the Enter key or click Save.

The Save window closes when your transformation is saved.

Adjust Transformation Properties

You can adjust the parameters, logging options, dates, dependencies, monitoring, settings, and data services for transformations. To view the transformation properties, press CTRL+T or right-click the canvas and select Properties from the menu that appears.

Use the Transformation Menu

Right-click any step in the transformation canvas to view the transformation menu.

Menu Item - Description
New Hop - Creates a new hop.
Edit - Shows the configuration window for the step.
Description - Allows you to add a description to the step.
Open Referenced Object - Allows you to map a sub-transformation. Mapping a sub-transformation is covered in detail in Reusing Transformation Mapping Flows Between Steps.
Inspect Data

Allows you to inspect the data stream of a step once the transformation has run. 

This option runs your transformation only if it was not previously executed.

Run and Inspect Data - Runs the transformation up to the selected step, then lets you inspect your data.
Data Movement - Describes the way data moves through the transformation when there is more than one hop. The following options are available:
  • Round Robin - Partitions the output stream and sends a portion of all output records down each hop.
  • Load Balance - Checks the output row sets to see how much room is left in each buffer and selects the one that is most empty. If the rows are distributed to steps that take very little processing time per row (or that each take exactly the same amount of time to process a row), Load Balance behaves identically to Round Robin. If rows are sent down one path that takes a long time to process, such as Sort or Group By, and down another path that processes rows more quickly, the "quick path" will likely receive more rows, because it empties its buffer before the "slow path" has a chance to. This option is typically used for clustered transformations, where the same processing occurs on different nodes. By default, the row buffer is set to 10000. To change the row buffer size, open the Transformation Settings window, then select Nr of rows in rowset on the Miscellaneous tab.
  • Copy Data to Next Steps - Copies the data to subsequent steps.
Change Number of Copies to Start - Starts several instances of a step in parallel.
Copy - Copies selected items to the clipboard.
Duplicate - Makes a copy of the selected items, then pastes them to the canvas.
Delete - Deletes selected items from the canvas.
Hide

Hides the step from the PDI client canvas.

Caution: If you hide a step, you must open the transformation or job XML file and edit it by hand to make the step visible again. For more details, see the troubleshooting section.

Detach - Detaches the step or entry from the transformation or job.
Input Fields - Shows metadata, like the field name and type, for fields that come into the step.
Output Fields - Shows metadata, like the field name and type, for fields that go out of the step.
Sniff Test During Execution

The sniff test displays data as it travels from one step to another in the stream. To use this, right-click a step in the transformation as it runs and select Sniff Test During Execution. The following options are available:

  • Sniff test input rows - Shows the rows coming into the step.
  • Sniff test output rows - Shows the rows going out of the step.
  • Sniff test error handling - Shows error handling data.

For more information on how to use this tool, see the Sniff Test Tool article.

Check Selected Step(s)

Checks transformation steps for problems that could interfere with successfully running the transformation. Right-click the transformation step that you want to check and click Check Selected Step(s). Warnings and errors appear in the Results of transformation checks window.

Error Handling - Indicates how to apply error handling for a step. When this is selected, the Step error handling settings window appears.
Preview - Allows you to preview the results of the transformation. Launches the Transformation Debug Dialog.
Align/Distribute

Arranges steps on the canvas so that they are aligned properly or distributed evenly. This helps create a visually pleasing transformation that is easier to read and digest.

Align refers to where the steps are positioned along the x (horizontal) or y (vertical) axis. Distribute makes the horizontal and vertical spacing between steps consistent. Typically, you turn on the grid, then move the different steps or entries on the canvas so that they form a pattern, such as a straight or branching line.

You select steps and apply the following options, as needed:

  • Align Left - Positions all steps so their left sides start on the same "x" (horizontal) coordinate as the left-most step. Once applied, steps are arranged in a straight vertical line. No changes are made to the spaces between steps.
  • Align Right - Positions all steps so their right sides end on the same "x" (horizontal) coordinate as the right-most step. Once applied, steps are arranged in a straight vertical line. No changes are made to the spaces between steps.
  • Align Top - Positions all steps so their top sides start on the same "y" (vertical) coordinate as the step positioned closest to the top of the canvas. Once applied, steps are arranged in a straight horizontal line. No changes are made to the spaces between steps.
  • Align Bottom - Positions all steps so their bottom sides end on the same "y" (vertical) coordinate as the step positioned closest to the bottom of the canvas. Once applied, steps are arranged in a straight horizontal line. No changes are made to the spaces between steps.
  • Distribute Horizontally - Positions all steps so that they are evenly spaced horizontally. No changes are made to the alignment.
  • Distribute Vertically - Positions all steps so that they are evenly spaced vertically. No changes are made to the alignment.
  • Snap to Grid - Aligns steps on the canvas to the grid. If grid markers do not appear on the canvas, select Tools > Options > Look & Feel > Show Canvas Grid. See Customize PDI Client Options for more information on how to customize the PDI client.
Data Services

Create, edit, delete, or test a Pentaho Data Service. A Pentaho Data Service allows others to obtain the results of a transformation, even if they do not have the PDI client or Pentaho Server installed. Pentaho Data Services are discussed in detail in Use Pentaho Data Services.

Mapping

Provides a way for you to map target fields from the step to source columns in a database. When selected, the Mapping window appears containing the following fields:

  • Source Fields - Lists the field names from the incoming stream.
  • Target Fields - Lists the column names in a target table.
  • Auto Target Selection - Automatically selects a matching table column if the target field is selected.
  • Auto Source Select - Automatically selects a matching target field if the table column is selected.
  • Add - Allows you to move the mapped target and source information to the mapping grid.
  • Guess - Makes mappings based on a computer algorithm.
  • Hide assigned source fields and Hide assigned target fields - Removes fields from the Source Fields and Target Fields lists once those fields are added to the mapping grid.
  • Delete - Removes mappings from the mapping grid so that the fields reappear in the Target Fields and Source Fields lists.

When you click OK, the Mapping window closes and a Select / Rename Values step appears on the canvas. It is usually named after the step that you right-clicked. The Select / Rename Values window contains the mappings. If you are not able to make mappings, the step still appears, but the properties are blank.

Partitions - Partitions split data into subsets according to a rule that is applied on a row of data. Partitions are discussed in detail in the Partitioning Data article.
Clusters - Clusters allow you to create Carte Clusters. For more information, see Using Carte Clusters.
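
Of the menu items above, the Round Robin and Load Balance options under Data Movement can be contrasted with a small simulation. The sketch below is a toy model and not PDI's implementation. It assumes that one row arrives per tick and that each downstream step drains its buffer at a fixed rate per tick. The names `round_robin`, `load_balance`, and `drain_rates` are hypothetical.

```python
def round_robin(row_count, hop_count):
    """Deal rows to the hops in strict rotation; return rows sent per hop."""
    sent = [0] * hop_count
    for i in range(row_count):
        sent[i % hop_count] += 1
    return sent


def load_balance(row_count, drain_rates, buffer_size=10000):
    """Send each row to the hop whose buffer is emptiest; return rows sent per hop.

    drain_rates[i] is the number of rows the step behind hop i consumes per
    tick (an assumption of this toy model). One row arrives per tick.
    """
    if any(rate <= 0 for rate in drain_rates):
        raise ValueError("every drain rate must be positive")
    buffered = [0.0] * len(drain_rates)
    sent = [0] * len(drain_rates)
    for _ in range(row_count):
        while True:
            # One tick passes: every downstream step consumes from its buffer.
            buffered = [max(0.0, b - r) for b, r in zip(buffered, drain_rates)]
            # Pick the emptiest buffer; ties go to the first hop.
            target = min(range(len(buffered)), key=buffered.__getitem__)
            if buffered[target] <= buffer_size - 1:
                break  # room for one more row; otherwise wait another tick
        buffered[target] += 1
        sent[target] += 1
    return sent


equal = load_balance(100, [0.5, 0.5])       # both paths equally fast
skewed = load_balance(100, [0.25, 0.75])    # first path slow, second path quick
print("round robin:", round_robin(100, 2))
print("load balance, equal paths:", equal)
print("load balance, slow/quick paths:", skewed)
```

In this model, equal drain rates produce the same split as round robin, while unequal rates send more rows down the quicker path. The `buffer_size` default mirrors the 10000-row default mentioned above, although these small runs never fill a buffer.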

Run Your Transformation

You can validate and debug your transformation by running it in the PDI client.


If you are using a Hadoop cluster for big data transformations, see Adaptive Execution Layer (AEL) for how to use AEL to run your transformations with a Spark engine.