Variables
PDI variables can be used in both transformation steps and job entries. You define variables with the Set Variable step and Set Session Variables step in a transformation, by hand through the kettle.properties file, or through the Set Environment Variables dialog box in the Edit menu.
The Get Variable and Get Session Variables steps can explicitly retrieve a value from a variable, or you can use it in any PDI text field which has the dollar sign icon next to it by using a metadata string in either the Unix or Windows formats:
${VARIABLE}
%%VARIABLE%%
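As a rough illustration of how the two formats behave, the following Python sketch replaces both kinds of reference with values from a dictionary. It is a simplified model, not PDI's implementation; the function name `substitute` and the variables `INPUT_DIR` and `FILE` are hypothetical, and leaving unknown references untouched is an assumption of the sketch.

```python
import re

def substitute(text: str, variables: dict[str, str]) -> str:
    """Replace ${NAME} and %%NAME%% references with values from `variables`."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        # Assumption of this sketch: unknown references are left as written.
        return variables.get(name, match.group(0))

    text = re.sub(r"%%([^%]+)%%", lookup, text)    # Windows format
    text = re.sub(r"\$\{([^}]+)\}", lookup, text)  # Unix format
    return text

# Hypothetical variables, mixing both formats in one field value.
variables = {"INPUT_DIR": "/data/in", "FILE": "sales.csv"}
resolved = substitute("${INPUT_DIR}/%%FILE%%", variables)
print(resolved)  # /data/in/sales.csv
```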
Both formats can be used and even mixed. In fact, you can create variable recursion by alternating between the Unix and Windows syntax. For example, if you wanted to resolve a variable that depends on another variable, then you could use this example:
${%%inner_var%%}
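The nested reference can be pictured as two passes: the inner Windows-style reference is resolved first, and its value then becomes the name looked up by the outer Unix-style reference. The Python sketch below models that idea only. The pass order is an assumption of the sketch, and `resolve_nested`, `TARGET_DB`, and `warehouse` are hypothetical names.

```python
import re

def resolve_nested(text: str, variables: dict[str, str]) -> str:
    """Resolve %%NAME%% first, then ${NAME}, so one variable can name another."""
    def lookup(match):
        return variables.get(match.group(1), match.group(0))

    # Pass 1: the inner, Windows-style reference yields a variable name.
    text = re.sub(r"%%([^%]+)%%", lookup, text)
    # Pass 2: the outer, Unix-style reference resolves that name to a value.
    text = re.sub(r"\$\{([^}]+)\}", lookup, text)
    return text

# inner_var holds the NAME of the variable to read, not the final value.
variables = {"inner_var": "TARGET_DB", "TARGET_DB": "warehouse"}
result = resolve_nested("${%%inner_var%%}", variables)
print(result)  # warehouse
```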
You can also use ASCII or hexadecimal character codes in place of variables, using the same format: $[hex value]. This makes it possible to escape the variable syntax in instances where you need to put variable-like text into a variable. For instance, if you wanted to use ${foobar} in your data stream, then you can escape it like this: $[24]{foobar}. PDI will replace $[24] with a $ without resolving it as a variable.
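The escape can be sketched in Python as a replacement of each hexadecimal code with the character it encodes. This is an illustrative model only: it assumes a single two-digit code per pair of brackets, and the function name `substitute_hex` is hypothetical.

```python
import re

def substitute_hex(text: str) -> str:
    """Replace each $[hh] code with the character whose code is hex hh."""
    def decode(match):
        return chr(int(match.group(1), 16))
    # Assumption of this sketch: exactly two hex digits between the brackets.
    return re.sub(r"\$\[([0-9A-Fa-f]{2})\]", decode, text)

escaped = "$[24]{foobar}"
literal = substitute_hex(escaped)
print(literal)  # ${foobar}
```

For the escape to work, this replacement has to happen after ordinary variable references have been resolved; otherwise the resulting ${foobar} would itself be treated as a variable.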
Environment variables
This is the traditional variable type in PDI. You define an environment variable through the Set Environment Variables dialog box in the Edit menu, or by hand by passing it as an option to the Java Virtual Machine (JVM) with the -D flag.
Environment variables are an easy way to specify the location of temporary files in a platform-independent way; for example, the ${java.io.tmpdir} variable points to the /tmp/ directory on Unix/Linux/OS X and to the C:\Documents and Settings\<username>\Local Settings\Temp\ directory on Windows.
The only problem with using environment variables is that they cannot be used dynamically. For example, if you run two or more transformations or jobs at the same time on the same application server, you may get conflicts. Changes to the environment variables are visible to all software running on the virtual machine.
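The conflict can be illustrated with a Python analogy, in which the process environment plays the role of the JVM-wide variables. The sketch below is not PDI code; the variable name `SAMPLE_OUTPUT_DIR` and both functions are hypothetical.

```python
import os

def start_job(output_dir: str) -> None:
    # Every job writes to the same process-wide variable.
    os.environ["SAMPLE_OUTPUT_DIR"] = output_dir

def job_output_dir() -> str:
    # Every job also reads from that same shared variable.
    return os.environ["SAMPLE_OUTPUT_DIR"]

start_job("/data/out/job_a")
start_job("/data/out/job_b")  # a second job starts before the first finishes
print(job_output_dir())       # /data/out/job_b, even when job A asks
```

Because there is only one copy of the variable, the first job silently picks up the second job's value.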
Kettle variables
Kettle variables provide a way to store small pieces of information dynamically in a narrower scope than environment variables. A Kettle variable is local to Kettle, and can be scoped down to the job or transformation in which it is set, or up to a related job. The Set Variable and Set Session Variables steps in a transformation allow you to specify the related job that you want to limit the scope to (for example, the parent job, grandparent job, or the root job).
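One way to picture this scoping is as a chain of variable stores, where setting a variable writes it locally and then to as many ancestors as the chosen scope allows. The Python sketch below is a simplified model under that assumption, not PDI's implementation; the class `VariableSpace`, its `set_variable` method, and `BATCH_ID` are hypothetical.

```python
class VariableSpace:
    """Hypothetical stand-in for the variable store of a job or transformation."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.values = {}

    def set_variable(self, key, value, levels_up=0, to_root=False):
        # Always set the variable where the step runs.
        space = self
        space.values[key] = value
        # Then copy it to ancestors: a fixed number of levels, or up to the root.
        while space.parent is not None and (to_root or levels_up > 0):
            space = space.parent
            space.values[key] = value
            levels_up -= 1

root = VariableSpace("root job")
parent = VariableSpace("parent job", parent=root)
trans = VariableSpace("transformation", parent=parent)

trans.set_variable("BATCH_ID", "42", levels_up=1)  # scope: parent job
print("BATCH_ID" in parent.values)  # True
print("BATCH_ID" in root.values)    # False
```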
Kettle variables configure various PDI-specific options such as the location of the shared object file for transformations and jobs or the log size limit. You can set Kettle variables using two methods:
- Set Kettle variables in the PDI client
- Set Kettle variables manually
If you are running a Pentaho MapReduce job, you can also set Kettle and environment variables in the Pentaho MapReduce job entry.
Set Kettle variables in the PDI client
Procedure
In the PDI client, select
.In the Kettle Properties window, modify the variable value.
If you want to add a variable, complete these steps:
Right-click on a line number, then select Insert before this row or Insert after this row.
Enter the variable name and value.
If you want to reposition the variable, right-click on the row number again, then select Move Up or Move Down.
Click the OK button.
Set Kettle variables manually
Procedure
Open the kettle.properties file in a text editor. By default, the kettle.properties file is stored in your home directory or the .pentaho directory.
Edit the file.
When complete, save and close the file.
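To check the result of a manual edit, the file can be read back as simple name and value pairs. The Python sketch below handles only plain NAME=value lines and comment lines; the full Java properties format also allows other separators, escapes, and line continuations, which the sketch ignores. The entries `SAMPLE_INPUT_DIR` and `SAMPLE_DB_HOST` are hypothetical.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple NAME=value lines, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            props[key.strip()] = value.strip()
    return props

# Hypothetical file contents, for illustration only.
sample = """\
# Sample entries
SAMPLE_INPUT_DIR=/data/in
SAMPLE_DB_HOST = db.example.com
"""
props = parse_properties(sample)
print(props["SAMPLE_DB_HOST"])  # db.example.com
```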
Set Kettle or Java environment variables in the Pentaho MapReduce job entry
To set Kettle or Java environment variables, complete these steps:
Procedure
In the PDI client, double-click the Pentaho MapReduce job entry, then click the User Defined tab.
In the Name field, set the environment or Kettle variable you need:
- For Kettle environment variables, type the name of the variable in the Name field, like this: KETTLE_SAMPLE_VAR.
- For Java environment variables, preface the variable name with the java.system. prefix, like this: java.system.SAMPLE_PATH_VAR.
Enter the value of the variable in the Value field.
Click the OK button.
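The naming convention above can be sketched as a split on the java.system. prefix. The Python below only illustrates the convention; how the job entry processes these names internally is not described here, and the function name and sample values are hypothetical.

```python
JAVA_PREFIX = "java.system."

def split_user_defined(entries: dict[str, str]):
    """Separate Kettle variables from Java variables by the name prefix."""
    kettle_vars, java_vars = {}, {}
    for name, value in entries.items():
        if name.startswith(JAVA_PREFIX):
            # Strip the prefix to recover the Java variable's own name.
            java_vars[name[len(JAVA_PREFIX):]] = value
        else:
            kettle_vars[name] = value
    return kettle_vars, java_vars

# Names follow the examples in the steps above; the values are made up.
entries = {
    "KETTLE_SAMPLE_VAR": "sample",
    "java.system.SAMPLE_PATH_VAR": "/opt/sample",
}
kettle_vars, java_vars = split_user_defined(entries)
print(kettle_vars)  # {'KETTLE_SAMPLE_VAR': 'sample'}
print(java_vars)    # {'SAMPLE_PATH_VAR': '/opt/sample'}
```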
Set the LAZY_REPOSITORY variable in the PDI client
Procedure
Open the PDI client, then select
.Look for KETTLE_LAZY_REPOSITORY and, if it is set to false, change the value to true.
Click OK and close the PDI client.
Internal variables
The following variables are always defined:
Variable Name | Sample Value |
Internal.Kettle.Build.Date | 2010/05/22 18:01:39 |
Internal.Kettle.Build.Version | 2045 |
Internal.Kettle.Version | 4.3 |
These variables are defined in a transformation:
Variable Name | Sample Value |
Internal.Transformation.Filename.Directory | D:\Kettle\samples |
Internal.Transformation.Filename.Name | Denormaliser - 2 series of key-value pairs.ktr |
Internal.Transformation.Name | Denormaliser - 2 series of key-value pairs sample |
Internal.Transformation.Repository.Directory | / |
These are the internal variables that are defined in a job:
Variable Name | Sample Value |
Internal.Job.Filename.Directory | file:///home/matt/jobs |
Internal.Job.Filename.Name | Nested jobs.kjb |
Internal.Job.Name | Nested job test case |
Internal.Job.Repository.Directory | / |
These variables are defined in a transformation running on a slave server, executed in clustered mode:
Variable Name | Sample Value |
Internal.Slave.Transformation.Number | 0..<cluster size-1> (0,1,2,3 or 4) |
Internal.Cluster.Size | <cluster size> (5) |
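As a hypothetical use of the two cluster variables, each slave could select its own share of a row set by taking the rows whose index, modulo the cluster size, equals its slave number. The Python sketch below only illustrates how the two values relate; it does not describe how PDI distributes rows, and the function name is an assumption.

```python
def rows_for_slave(rows, slave_number: int, cluster_size: int):
    """Return the rows whose position maps to this slave number."""
    if not 0 <= slave_number < cluster_size:
        raise ValueError("slave_number must be in 0..cluster_size-1")
    return [row for i, row in enumerate(rows) if i % cluster_size == slave_number]

rows = list(range(12))
# With a cluster size of 5, the slave numbers are 0, 1, 2, 3, and 4.
print(rows_for_slave(rows, 0, 5))  # [0, 5, 10]
print(rows_for_slave(rows, 4, 5))  # [4, 9]
```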