Work with Transformations
In the PDI client (Spoon), you can develop transformations, which are data workflows representing your ETL activities. The steps used in your transformations define the individual ETL activities (building blocks). The transformations containing your steps are stored in .ktr files. You can access these .ktr files through the PDI client.
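Because a .ktr file is an XML document, you can also inspect its steps and hops outside the PDI client. The following is a minimal sketch, not an official tool. It assumes the common layout in which steps appear as `step` elements with `name` and `type` children and hops appear under `order/hop` with `from` and `to` children. The sample content, step names, and the `summarize_ktr` helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed .ktr content used only for illustration.
# A real file saved by the PDI client contains many more elements.
SAMPLE_KTR = """<transformation>
  <info><name>sample_transformation</name></info>
  <order>
    <hop><from>Read input</from><to>Write output</to><enabled>Y</enabled></hop>
  </order>
  <step><name>Read input</name><type>CsvInput</type></step>
  <step><name>Write output</name><type>TextFileOutput</type></step>
</transformation>"""


def summarize_ktr(xml_text):
    """Return (transformation name, [(step name, step type)], [(from, to)])."""
    root = ET.fromstring(xml_text)
    name = root.findtext("info/name")
    # Each step element is assumed to carry its name and type as child elements.
    steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]
    # Each hop is assumed to name the step it leaves and the step it enters.
    hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
    return name, steps, hops


if __name__ == "__main__":
    name, steps, hops = summarize_ktr(SAMPLE_KTR)
    print("Transformation:", name)
    for step_name, step_type in steps:
        print("Step:", step_name, "of type", step_type)
    for source, target in hops:
        print("Hop:", source, "->", target)
```

To try this on a real file, read its text with `open(path, encoding="utf-8").read()` and pass it to `summarize_ktr`. Treat the result as a quick overview only; the PDI client remains the authoritative way to view a transformation.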
Create a Transformation
Follow these instructions to create your transformation.
- Perform one of the following actions:
- Click File > New > Transformation.
- Click the New file icon in the toolbar and select Transformation.
- Hold down the CTRL+N keys.
- Go to the Design tab. Expand the folders or use the Steps field to search for a specific step.
- Either drag or double-click a step to place it on the PDI client canvas.
- Double-click the step in the PDI client canvas to open its properties window. For help on filling out the window, click the Help button that is available with each step.
- To add another step, either drag or double-click the step in the Design tab to place it on the PDI client canvas.
- If you drag the step to the canvas, you can add a hop by pressing the SHIFT key and drawing a hop from one step to the other.
- If you double-click it, the step appears on the canvas with a hop already connected to your previous step.
- When finished, save the transformation.
Open a Transformation
The way you open an existing transformation depends on whether you are using PDI locally on your machine or if you are connected to a repository. If you are connected to a repository, you are remotely accessing your file on the Pentaho Server. Another option is to open a transformation using HTTP with the Visual File System (VFS) Browser.
If you get a message indicating that a plugin is missing, see the Troubleshooting Transformation Steps and Job Entries section for more details.
If you recently had a file open, you can also use File > Open Recent.
On Your Local Machine
Follow these instructions to open a transformation on your local machine.
- In the PDI client, perform one of the following actions:
- Select File > Open.
- Click the Open file icon in the toolbar.
- Hold down the CTRL+O keys.
- Select the file from the Open window, then click Open.
The Open window closes when your transformation appears in the canvas.
In the Pentaho Repository
Follow these instructions to access a transformation in the Pentaho Repository.
- Make sure you are connected to a repository.
- In the PDI client, perform one of the following actions to access the Open repository browser window:
- Select File > Open.
- Click the Open file icon in the toolbar.
- Hold down the CTRL+O keys.
- If you recently opened a file, use Recents to navigate to your transformation.
- Use either the search box to find your transformation, or use the left panel to navigate to a repository folder containing your transformation.
- Perform one of the following actions:
- Double-click on your transformation.
- Select it and press the Enter key.
- Select it and click Open.
The Open window closes when your transformation appears in the canvas.
If you select a folder or file in the Open window, you can click on it again to rename it.
With the VFS Browser
Select File > Open URL to access files using HTTP with the VFS browser. The URL you specify identifies the protocol to use in the browser.
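To illustrate what "the URL identifies the protocol" means, the sketch below reads the scheme portion of a URL with Python's standard library. It shows only how a protocol is encoded in a URL in general and makes no claim about how the VFS browser parses URLs internally. The example URL and the `url_protocol` helper are hypothetical.

```python
from urllib.parse import urlparse


def url_protocol(url):
    """Return the scheme (protocol) portion of a URL, for example 'http'."""
    return urlparse(url).scheme


if __name__ == "__main__":
    # Hypothetical location of a transformation served over HTTP.
    print(url_protocol("http://example.com/etl/sample.ktr"))
```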
Save a Transformation
The way you save a transformation depends on whether you are using PDI locally on your machine or if you are connected to a repository. If you are connected to a repository, you are remotely saving your file on the Pentaho Server.
On Your Local Machine
Follow these instructions to save a transformation on your local machine.
- In the PDI client, perform one of the following actions:
- Select File > Save.
- Click the Save current file icon in the toolbar.
- Hold down the CTRL+S keys.
If you are saving your transformation for the first time, the Save As window appears.
- Specify the transformation's name in the Save As window and select the location.
- Either press the Enter key or click Save. The transformation is saved.
The Save window closes when your transformation is saved.
In the Pentaho Repository
Follow these instructions to save a transformation in the Pentaho Repository.
- Make sure you are connected to a repository.
- In the PDI client, perform one of the following actions:
- Click File > Save.
- Click the Save current file icon in the toolbar.
- Hold down the CTRL+S keys.
If you are saving your transformation for the first time, the Save repository browser window appears.
- Navigate to the repository folder where you want to save your transformation.
- Specify the transformation's name in the File name field.
- Either press the Enter key or click Save.
The Save window closes when your transformation is saved.
Use the Transformation Menu
Right-click any step in the transformation canvas to view the transformation menu.
Menu Item | Description |
---|---|
New Hop | Creates a new hop. |
Edit | Shows the configuration window for the step. |
Description | Allows you to add a description to the step. |
Open Referenced Object | Allows you to map a sub-transformation. Mapping a sub-transformation is covered in detail in Reusing Transformation Mapping Flows Between Steps. |
Inspect Data | Allows you to inspect the data stream of a step once the transformation has run. This option runs your transformation only if it was not previously executed. |
Run and Inspect Data | Runs the transformation up to the selected step, then lets you inspect your data. |
Data Movement | Describes the way data moves through the transformation when there is more than one hop. The following options are available: |
Change Number of Copies to Start | Starts several instances of a step in parallel. |
Copy | Copies selected items to the clipboard. |
Duplicate | Makes a copy of the selected items, then pastes them to the canvas. |
Delete | Deletes selected items from the canvas. |
Hide | Hides the step from the PDI client canvas. Caution: If you hide the step, you will need to open the transformation or job XML file and edit it by hand to view the step again. For more details, see the troubleshooting section. |
Detach | Detaches the step or entry from the transformation or job. |
Input Fields | Shows metadata, like the field name and type, for fields that come into the step. |
Output Fields | Shows metadata, like the field name and type, for fields that go out of the step. |
Sniff Test During Execution | The sniff test displays data as it travels from one step to another in the stream. To use this, right-click a step in the transformation as it runs and select Sniff Test During Execution. The following options are available: For more information on how to use this tool, see the Sniff Test Tool article. |
Check Selected Step(s) | Checks transformation steps for problems that could interfere with successfully running the transformation. Right-click the transformation step that you want to check and click Check Selected Step(s). Warnings and errors appear in the Results of transformation checks window. |
Error Handling | Indicates how to apply error handling for a step. When this is selected, the Step error handling settings window appears. |
Preview | Allows you to preview the results of the transformation. Launches the Transformation Debug Dialog. |
Align/Distribute | Arranges steps on the canvas so that they are aligned properly or distributed evenly. This helps create a visually pleasing transformation that is easier to read and digest. Align refers to where the steps are permitted along the x (horizontal) or y (vertical) axis. Distribute makes the horizontal and vertical spacing between steps consistent. Typically, you turn on the grid, then move the different steps or entries on the canvas so that they form some sort of pattern, like a straight or branching line. You select steps and apply the following options, as needed: |
Data Services | Create, edit, delete, or test a Pentaho Data Service. The Pentaho Data Service allows others to obtain the results of a transformation, even if the person does not have the PDI client or Pentaho Server installed. The Pentaho Data Service is discussed in detail in Use Pentaho Data Services. |
Mapping | Provides a way for you to map target fields from the step to source columns in a database. When selected, the Mapping window appears containing the following fields: When you click OK, the Mapping window closes and a Select/Rename Values step appears on the canvas. It is usually named after the step that you right-clicked. The Select/Rename Values window contains the mappings. If you are not able to make mappings, the step still appears, but the properties are blank. |
Partitions | Partitions split data into subsets according to a rule that is applied on a row of data. Partitions are discussed in detail in the Partitioning Data article. |
Clusters | Clusters allow you to create Carte Clusters. For more information, see Using Carte Clusters. |
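The Hide item above notes that a hidden step can only be restored by editing the transformation XML. The following is a minimal sketch of that edit, not an official procedure. It assumes that each `step` element stores its canvas settings in a `GUI` child and that a `draw` value of `N` marks the step as hidden; check this against your own file and the troubleshooting section before relying on it. The sample XML and the `unhide_steps` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed .ktr content: one visible step and one hidden step.
SAMPLE_KTR = """<transformation>
  <step><name>Visible step</name><GUI><xloc>100</xloc><yloc>100</yloc><draw>Y</draw></GUI></step>
  <step><name>Hidden step</name><GUI><xloc>250</xloc><yloc>100</yloc><draw>N</draw></GUI></step>
</transformation>"""


def unhide_steps(xml_text):
    """Set draw to 'Y' on every hidden step; return (new XML, restored step names)."""
    root = ET.fromstring(xml_text)
    restored = []
    for step in root.findall("step"):
        draw = step.find("GUI/draw")
        # Assumption: 'N' means the step is not drawn on the canvas.
        if draw is not None and draw.text == "N":
            draw.text = "Y"
            restored.append(step.findtext("name"))
    return ET.tostring(root, encoding="unicode"), restored


if __name__ == "__main__":
    new_xml, restored = unhide_steps(SAMPLE_KTR)
    print("Restored steps:", restored)
```

Work on a backup copy of the file, and close the transformation in the PDI client before you overwrite it, so that a later save from the client does not discard the change.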
Run Your Transformation
You can also validate and debug your transformation by running it in the PDI client.
If you are using a Hadoop cluster for big data transformations, see Adaptive Execution Layer (AEL) for how to use AEL to run your transformations with a Spark engine.