Variables

You can use PDI variables in both transformation steps and job entries. You define variables with the Set Variable and Set Session Variables steps in a transformation, by hand through the kettle.properties file, or through the Set Environment Variables dialog box in the Edit menu.

The Get Variable and Get Session Variables steps can explicitly retrieve a value from a variable, or you can use the variable in any PDI text field that has the diamond dollar sign icon next to it by entering a metadata string in either the Unix or Windows format:

${VARIABLE}
%%VARIABLE%%

Both formats can be used and even mixed. In fact, you can create variable recursion by alternating between the Unix and Windows syntax. For example, to resolve a variable whose name is stored in another variable, you could write ${%%inner_var%%}.
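The recursion described above can be illustrated with a minimal Python sketch. It is not PDI's implementation: it assumes that the Windows-style %%NAME%% references are resolved before the Unix-style ${NAME} references and that unknown names are left untouched. The function name substitute and the variables inner_var and TARGET_DIR are hypothetical.

```python
import re


def substitute(text, variables):
    """Resolve %%NAME%% first, then ${NAME} (assumed order, for illustration).

    Names that are not defined are left in the text unchanged.
    """
    def lookup(match):
        name = match.group(1)
        return variables.get(name, match.group(0))

    text = re.sub(r"%%([^%]+)%%", lookup, text)     # Windows format
    text = re.sub(r"\$\{([^}]+)\}", lookup, text)   # Unix format
    return text


# Hypothetical variables: inner_var holds the name of another variable.
variables = {"inner_var": "TARGET_DIR", "TARGET_DIR": "/data/out"}

print(substitute("${%%inner_var%%}/report.csv", variables))
# prints /data/out/report.csv
```

In this sketch the inner %%inner_var%% reference is replaced with TARGET_DIR in the first pass, which leaves ${TARGET_DIR} for the second pass to resolve.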

Note: If a variable name collides with the name of a parameter or argument, the variable defers to the parameter or argument.

You can also use ASCII or hexadecimal character codes in place of variables, using the same format: $[hex value]. This makes it possible to escape the variable syntax when you need to put variable-like text into a variable. For instance, if you want to use the literal text ${foobar} in your data stream, you can escape it like this: $[24]{foobar}. PDI replaces $[24] with a $ without resolving the result as a variable.
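The escape described above can be sketched in Python as follows. This is an illustration only, not PDI's code: it assumes that each $[..] group holds a single two-digit hexadecimal code, and the function name substitute_hex is hypothetical.

```python
import re


def substitute_hex(text):
    """Replace each $[hh] group with the character whose code is hex hh."""
    def decode(match):
        return chr(int(match.group(1), 16))

    # Assumption: exactly one two-digit hexadecimal code per $[..] group.
    return re.sub(r"\$\[([0-9A-Fa-f]{2})\]", decode, text)


print(substitute_hex("$[24]{foobar}"))
# prints ${foobar}
```

Because the hexadecimal code is decoded in its own pass and the result is not scanned again for variables, the output keeps the literal text ${foobar}.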

Environment variables

This is the traditional variable type in PDI. You define an environment variable through the Set Environment Variables dialog box in the Edit menu, or by hand by passing it as an option to the Java Virtual Machine (JVM) with the -D flag.

Environment variables are an easy way to specify the location of temporary files in a platform-independent way. For example, the ${java.io.tmpdir} variable points to the /tmp/ directory on Unix/Linux/OS X and to the C:\Documents and Settings\<username>\Local Settings\Temp\ directory on Windows.

The only problem with environment variables is that they cannot be used dynamically. Changes to environment variables are visible to all software running on the virtual machine, so if you run two or more transformations or jobs at the same time on the same application server, you may get conflicts.

Kettle variables

Kettle variables provide a way to store small pieces of information dynamically in a narrower scope than environment variables. A Kettle variable is local to Kettle, and can be scoped down to the job or transformation in which it is set, or up to a related job. The Set Variable and Set Session Variables steps in a transformation allow you to specify the related job that you want to limit the scope to (for example, the parent job, grandparent job, or the root job).
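The scoping idea can be pictured with a small conceptual model in Python. This is a simplification under stated assumptions, not how PDI stores variables: the class VariableSpace, its levels_up argument, and the variable SAMPLE_DIR are all hypothetical, and looking a value up through parent objects is only a stand-in for however PDI shares values between a job and its children.

```python
class VariableSpace:
    """Hypothetical stand-in for a job or transformation that holds variables."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.values = {}

    def get(self, key, default=None):
        # Look locally first, then walk up through the parent spaces.
        space = self
        while space is not None:
            if key in space.values:
                return space.values[key]
            space = space.parent
        return default

    def set(self, key, value, levels_up=0):
        # Set the variable here and in up to `levels_up` ancestors
        # (0 = this space only, 1 = also the parent job, and so on).
        space = self
        for _ in range(levels_up + 1):
            if space is None:
                break
            space.values[key] = value
            space = space.parent


root = VariableSpace("root job")
trans_a = VariableSpace("transformation A", parent=root)
trans_b = VariableSpace("transformation B", parent=root)

trans_a.set("SAMPLE_DIR", "/data/a")               # visible in A only
trans_b.set("SAMPLE_DIR", "/data/b", levels_up=1)  # visible in B and its parent job

print(trans_a.get("SAMPLE_DIR"))  # prints /data/a
print(root.get("SAMPLE_DIR"))     # prints /data/b
```

Unlike a change to an environment variable, the value set in transformation A stays local to it, so the two transformations do not overwrite each other.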

Kettle variables configure various PDI-specific options such as the location of the shared object file for transformations and jobs or the log size limit. You can set Kettle variables using two methods:

Set Kettle variables in the PDI client
Set Kettle variables manually

If you are running a Pentaho MapReduce job, you can also set Kettle and environment variables in the Pentaho MapReduce job entry.

Set Kettle variables in the PDI client

To set Kettle variables in the PDI client (Spoon), complete these steps.

Procedure

In the PDI client, select Edit > Edit the kettle.properties file.
In the Kettle Properties window, modify the variable value.
If you want to add a variable, complete these steps:
1. Right-click on a line number, then select Insert before this row or Insert after this row.
2. Enter the variable name and value.
3. If you want to reposition the variable, right-click on the row number again, then select Move Up or Move Down.
Click the OK button.

Set Kettle variables manually

To edit Kettle variables manually, complete these steps.

Procedure

Open the kettle.properties file in a text editor. By default, the kettle.properties file is stored in your home directory or the .pentaho directory.
Edit the file.
When complete, save and close the file.
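As a rough picture of what the file contains, the following Python sketch parses a few NAME=value lines of the kind you would edit by hand. It is a simplified reader written for illustration: the real Java properties format has additional rules (escape sequences, line continuations, other separators) that are ignored here, and the entries SAMPLE_INPUT_DIR and KETTLE_SAMPLE_VAR are hypothetical.

```python
def parse_properties(text):
    """Parse simple NAME=value lines, skipping blank lines and # comments."""
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        variables[name.strip()] = value.strip()
    return variables


# Hypothetical file content, for illustration only.
sample = """\
# comment lines start with a hash sign
SAMPLE_INPUT_DIR=/data/in
KETTLE_SAMPLE_VAR=42
"""

variables = parse_properties(sample)
print(variables["SAMPLE_INPUT_DIR"])
# prints /data/in
```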

Set Kettle or Java environment variables in the Pentaho MapReduce job entry

Pentaho MapReduce jobs typically run in a distributed fashion, with the mapper, combiner, and reducer running on different nodes. If you need to set a Java or Kettle environment variable for the different nodes, such as KETTLE_MAX_JOB_TRACKER_SIZE, set it in the Pentaho MapReduce job entry window.

Note: Values for Kettle environment variables set in the Pentaho MapReduce window override the Kettle environment variable values in the kettle.properties file.

To set Kettle or Java environment variables, complete these steps:

Procedure

In the PDI client, double-click the Pentaho MapReduce job entry, then click the User Defined tab.
In the Name field, set the environment or Kettle variable you need:
- For Kettle environment variables, type the name of the variable in the Name field, like this: KETTLE_SAMPLE_VAR.
- For Java environment variables, preface the variable name with the java.system. prefix, like this: java.system.SAMPLE_PATH_VAR.
Enter the value of the variable in the Value field.
Click the OK button.
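The naming convention in the steps above can be sketched in Python as a function that sorts User Defined entries by prefix. How PDI processes these entries internally is not described here, so stripping the prefix is an assumption made for illustration; the function name split_user_defined and the sample values are hypothetical.

```python
JAVA_PREFIX = "java.system."


def split_user_defined(entries):
    """Separate Kettle variables from Java variables by the java.system. prefix."""
    kettle_vars, java_vars = {}, {}
    for name, value in entries.items():
        if name.startswith(JAVA_PREFIX):
            # Assumption: the prefix only marks the entry and is not part of the name.
            java_vars[name[len(JAVA_PREFIX):]] = value
        else:
            kettle_vars[name] = value
    return kettle_vars, java_vars


# Hypothetical Name/Value pairs as they might be typed in the User Defined tab.
entries = {
    "KETTLE_SAMPLE_VAR": "1",
    "java.system.SAMPLE_PATH_VAR": "/opt/sample",
}

kettle_vars, java_vars = split_user_defined(entries)
print(kettle_vars)  # prints {'KETTLE_SAMPLE_VAR': '1'}
print(java_vars)    # prints {'SAMPLE_PATH_VAR': '/opt/sample'}
```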

Set the LAZY_REPOSITORY variable in the PDI client

This variable restores the directory-loading behavior of the repository to be as it was before Pentaho 6.1. To set the LAZY_REPOSITORY variable in the PDI client, complete these steps.

Note: Changing this variable to false will make repository loading more expensive.

Procedure

Open the PDI client, then select Edit > Edit the kettle.properties file.
Look for KETTLE_LAZY_REPOSITORY and, if it is set to false, change the value to true.
Click OK and close the PDI client.

Internal variables

The following variables are always defined:

Variable Name	Sample Value
Internal.Kettle.Build.Date	`2010/05/22 18:01:39`
Internal.Kettle.Build.Version	`2045`
Internal.Kettle.Version	`4.3`

These variables are defined in a transformation:

Variable Name	Sample Value
Internal.Transformation.Filename.Directory	D:\Kettle\samples
Internal.Transformation.Filename.Name	Denormaliser - 2 series of key-value pairs.ktr
Internal.Transformation.Name	`Denormaliser - 2 series of key-value pairs sample`
Internal.Transformation.Repository.Directory	/

These are the internal variables that are defined in a job:

Variable Name	Sample Value
Internal.Job.Filename.Directory	file:///home/matt/jobs
Internal.Job.Filename.Name	Nested jobs.kjb
Internal.Job.Name	`Nested job test case`
Internal.Job.Repository.Directory	/

These variables are defined in a transformation running on a slave server, executed in clustered mode:

Variable Name	Sample Value
Internal.Slave.Transformation.Number	`0..<cluster size-1> (0,1,2,3 or 4)`
Internal.Cluster.Size	`<cluster size> (5)`
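One possible use of these two values is to divide work between the slave servers. The Python sketch below assumes a cluster of five slaves, matching the sample values in the table, and assigns each row to the slave whose number equals the row ID modulo the cluster size. The function rows_for_slave and the row IDs are hypothetical, and this is not presented as what PDI itself does in clustered mode.

```python
def rows_for_slave(row_ids, slave_number, cluster_size):
    """Return the row IDs this slave would handle under modulo partitioning."""
    return [row_id for row_id in row_ids if row_id % cluster_size == slave_number]


cluster_size = 5        # stands in for Internal.Cluster.Size
row_ids = range(12)     # hypothetical row identifiers

for slave_number in range(cluster_size):  # Internal.Slave.Transformation.Number
    print(slave_number, rows_for_slave(row_ids, slave_number, cluster_size))
```

With slave numbers running from 0 to the cluster size minus 1, every row is assigned to exactly one slave.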

Note: In addition to the above, there are also System parameters, including command line arguments. These can be accessed using the Get System Info step in a transformation.

Note: Additionally, you can specify values for variables in the Execute a transformation dialog box. If you include the variable names in your transformation, they will appear in this dialog box.
