Running a Job
After creating a job to orchestrate your ETL activities (such as your transformations), you should run it in the PDI client to test how it performs in various scenarios. With the Run Options window, you can apply and adjust different run configurations, options, parameters, and variables. By defining multiple run configurations, you have a choice of running your job locally or on a server using the Pentaho engine.
When you are ready to run your job, you can perform any of the following actions to access the Run Options window:
- Click the Run icon on the toolbar.
- Select Run from the Action menu.
- Press F9.
The Run Options window appears.
In the Run Options window, you can specify a Run configuration to define whether the job runs locally, on the Pentaho Server, or on a slave (remote) server. To set up run configurations, see Run Configurations.
The default Pentaho local configuration runs the job using the Pentaho engine on your local machine. You cannot edit this default configuration.
The Run Options window also lets you specify logging and other options, or experiment by passing temporary values for defined parameters and variables during each iterative run.
Always show dialog on run is selected by default. Deselect this option if you want to use the same run options every time you execute your job. After you deselect Always show dialog on run, you can still open the Run Options window through the dropdown menu next to the Run icon in the toolbar, through the Action main menu, or by pressing F8.
Some ETL activities are lightweight, such as loading a small text file into a database or filtering a few rows to trim down your results. For these activities, you can run your job locally using the default Pentaho engine. Other ETL activities are more demanding, containing many entries and steps that call other entries and steps or a network of modules. For these activities, you can set up a separate Pentaho Server dedicated to running jobs and transformations using the Pentaho engine.
You can create or edit these configurations through the Run configurations folder in the View tab.
To create a new run configuration, right-click the Run configurations folder and select New.
To edit or delete a run configuration, right-click an existing configuration.
Pentaho local is the default run configuration. It runs jobs with the Pentaho engine on your local machine. You cannot edit this default configuration.
Selecting New or Edit opens the Run configuration dialog box that contains the following fields:
|Name||Specify the name of the run configuration.|
|Description||Optionally, specify details of your configuration.|
|Engine||Select the Pentaho engine to run your job in the default Pentaho (Kettle) environment. The Spark engine is used for running transformations only, and is not available for running jobs.|
The Settings section of the Run configuration dialog box contains the following options when Pentaho is selected as the Engine for running a job:
|Local||Select this option to use the Pentaho engine to run a job on your local machine.|
|Pentaho server||Select this option to run your job on the Pentaho Server. This option only appears if you are connected to a Pentaho Repository.|
|Slave server||Select this option to send your job to a slave or remote server.|
|Location||If you select Slave server, specify the location of your slave or remote server.|
|Send resources to the server||If you specified a Location for a server, select this option to send your job to that server before running it, so that the job runs locally on the server. Any related resources, such as other referenced files, are also included in the information sent to the server.|
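The rules in the Settings options above can be sketched as a small validation routine. This is only an illustrative model, not part of PDI: the `RunConfiguration` class, its field names, the target labels, and the `validate` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RunConfiguration:
    """Hypothetical model of the Run configuration dialog fields (not a PDI class)."""
    name: str
    description: str = ""
    engine: str = "Pentaho"
    target: str = "local"            # assumed labels: "local", "pentaho_server", "slave_server"
    location: Optional[str] = None   # only meaningful when target is "slave_server"
    send_resources: bool = False


def validate(config: RunConfiguration, connected_to_repository: bool) -> List[str]:
    """Return a list of problems based on the rules described in the tables above."""
    problems = []
    if config.engine != "Pentaho":
        # The Spark engine runs transformations only, not jobs.
        problems.append("jobs can only run on the Pentaho engine")
    if config.target == "pentaho_server" and not connected_to_repository:
        problems.append("the Pentaho server option requires a Pentaho Repository connection")
    if config.target == "slave_server" and not config.location:
        problems.append("a slave server requires a Location")
    if config.send_resources and not config.location:
        problems.append("Send resources to the server requires a Location")
    return problems


# Example: a slave server configuration that is missing its Location.
remote = RunConfiguration(name="nightly", target="slave_server")
for problem in validate(remote, connected_to_repository=False):
    print(problem)
```

The default local configuration (`RunConfiguration(name="Pentaho local")` in this model) passes every check, which matches the idea that a local run needs no server details.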
Errors, warnings, and other information generated as the job runs are stored in logs. Through the Options section of this window, you can specify how much information is logged and whether the log is cleared before each run. You can also enable safe mode and specify whether PDI should gather performance metrics. Logging and Monitoring Operations describes the logging methods available in PDI.
|Clear log before running||Indicates whether to clear all your logs before you run your job. If your log is large, you might need to clear it before the next execution to conserve space.|
|Log level||Specifies how much logging is performed and the amount of information captured. Debug and Row Level logging levels contain information you may consider too sensitive to be shown. Please consider the sensitivity of your data when selecting these logging levels. Performance Monitoring and Logging describes how best to use these logging methods.|
|Enable safe mode||Checks every row passed through your job and ensures all layouts are identical. If a row does not have the same layout as the first row, an error is generated and reported.|
|Start job at||Specifies an alternative starting entry for your job. All the current entries in your job are listed as options in the dropdown menu.|
|Gather performance metrics||Monitors the performance of your job execution through these metrics. Using Performance Graphs shows how to visually analyze these metrics.|
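The safe mode check described above, where every row is compared against the layout of the first row, can be sketched as follows. This is a minimal illustration and not how PDI implements it: the `row_layout` and `check_layouts` functions are hypothetical, and a layout is assumed here to mean the ordered field names together with their value types.

```python
def row_layout(row):
    """Describe a row as an ordered tuple of (field name, type name) pairs."""
    return tuple((name, type(value).__name__) for name, value in row.items())


def check_layouts(rows):
    """Compare every row with the first row's layout.

    Returns the number of rows checked, or raises ValueError at the
    first row whose layout differs from the first row's layout.
    """
    reference = None
    checked = 0
    for index, row in enumerate(rows):
        current = row_layout(row)
        if reference is None:
            reference = current  # the first row defines the expected layout
        elif current != reference:
            raise ValueError(
                f"row {index} has layout {current}, expected {reference}"
            )
        checked += 1
    return checked


# Example: the third row carries a string where an integer is expected.
rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
    {"id": "3", "name": "gamma"},
]
try:
    check_layouts(rows)
except ValueError as error:
    print("Layout error:", error)
```

Checking every row adds work to each run, which is one reason to treat a check like this as a testing aid that you switch on deliberately.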
Parameters and Variables
You can temporarily modify parameters and variables for each execution of your job to experimentally determine their best values. The values you enter into these tables are only used when you run the job from the Run Options window. The values you originally defined for these parameters and variables are not permanently changed by the values you specify in these tables.
|Parameters||Set parameter values related to your job during runtime. A parameter is a local variable. The parameters you define while creating your job are shown in the table under the Parameters tab.|
|Variables||Set values for user-defined and environment variables related to your job during runtime.|
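The behavior described in this section, where values entered for a run apply only to that run and leave the originally defined values unchanged, can be sketched as a simple merge. This is an illustrative model only: the `resolve_run_values` function and the parameter names `INPUT_DIR` and `BATCH_SIZE` are hypothetical and do not come from PDI.

```python
def resolve_run_values(defaults, overrides):
    """Return the values used for one run without modifying the defaults."""
    effective = dict(defaults)   # copy, so the defined values stay untouched
    effective.update(overrides)  # temporary values win for this run only
    return effective


# Values as originally defined in the job (hypothetical names).
defined = {"INPUT_DIR": "/data/in", "BATCH_SIZE": "500"}

# Values entered in the Run Options window for a single test run.
test_run = resolve_run_values(defined, {"BATCH_SIZE": "50"})

print(test_run["BATCH_SIZE"])  # value used for this run
print(defined["BATCH_SIZE"])   # originally defined value, unchanged
```

Because each run starts from a fresh copy of the defined values, you can try several candidate values in succession without having to restore anything afterward.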