Run Work Items on Pentaho Worker Nodes
After you complete the Pentaho Worker Nodes installation and setup process, your transformations and jobs run as work items and scale out across clusters when your workload increases. You can run work items by using the REST API, the PDI client, or the User Console. You can also use scheduling options to launch work items during off-peak hours or on a recurring basis. Regardless of the Pentaho application and method used to execute your work items, the processed transformation and job results reside within the Pentaho platform.
These tasks are intended for Pentaho administrators who know where the data is stored, how to connect to it, and details about the computing environment. To run transformations and jobs on Worker Nodes, you must have Pentaho Worker Nodes installed and set up on your system. For more information, see Installing the Pentaho Worker Nodes product and Setting up Pentaho Worker Nodes.
Run work items using the REST API
The Content Execution Router service acts as the entry point to Worker Nodes (WorkerNode). It provides the REST API for the following requests:
Execute the workItem asynchronously
http://<IP>:<PORT>/api/content/execute/async
Execute the workItem synchronously
http://<IP>:<PORT>/api/content/execute
Get status of execution
http://<IP>:<PORT>/api/content/execute/status?uuid={UUID}
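As a minimal sketch, the three endpoints above can be assembled from a host and port in Python. The helper names (worker_node_endpoints, status_url, fetch_status) and the example host and port are illustrative assumptions, not part of the Pentaho API. The sketch also assumes that the status request is a plain HTTP GET without authentication, which this article does not specify, so adjust it to match your deployment.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def worker_node_endpoints(host: str, port: int) -> dict:
    """Return the three Content Execution Router URLs listed above."""
    base = f"http://{host}:{port}/api/content/execute"
    return {
        "async": f"{base}/async",   # execute the work item asynchronously
        "sync": base,               # execute the work item synchronously
        "status": f"{base}/status", # get the status of an execution
    }


def status_url(host: str, port: int, uuid: str) -> str:
    """Build the status URL for one execution UUID."""
    query = urlencode({"uuid": uuid})
    return f"{worker_node_endpoints(host, port)['status']}?{query}"


def fetch_status(host: str, port: int, uuid: str, timeout: float = 10.0) -> str:
    """Fetch the raw status response body as text.

    Assumption: the endpoint answers an unauthenticated GET. The response
    format is not documented here, so the body is returned unparsed.
    """
    with urlopen(status_url(host, port, uuid), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical host, port, and UUID, used only for illustration.
print(status_url("10.0.0.5", 8080, "abc-123"))
```

Keeping URL construction separate from the network call makes the addresses easy to check before any request is sent.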
Execute Request REST API
Use the following lists of parameters and values for the execute request.
contentType
The type of content to execute. Possible values are
{ ktr | kjb }
memSize
(Optional) Specifies the container memory size as configured in the PDI Job configuration. Only works with PDI Job. If not specified, the default is used, which is 512MB unless changed by the user.
cpuSize
(Optional) Specifies the container CPU share as configured in the PDI Job configuration. Only works with PDI Job. If not specified, the default is used, which is 0.4 unless changed by the user.
repoName
Enterprise or database repository name, if you are using one.
blockRepoConns
Prevents logging in to the specified repository, even if the repository connection parameters are provided, for example when you want to execute a local KTR file instead.
repoUsername
Repository user name.
trustRepoUser
Trust the repository user name passed along, such as when no password is required. Requires a preconfigured Pentaho Repository to accept the trusted connections.
repoPassword
Repository password.
inputDir
The directory that contains the KTR/KJB, including the leading slash.
localFile
If you are calling a local KTR/KJB file, this is the file name, including the path if it is not in the local directory.
localJarFile
If you are calling a KTR/KJB within a local JAR file, this is the file name, including the path if it is not in the local directory.
inputFile
The name of the KTR/KJB within the Pentaho Repository to launch.
listRepoFiles
Lists the KTRs/KJBs in the specified repository directory.
listRepoDirs
Lists the directories in the specified repository.
exportRepo
Exports all repository objects to one XML file.
localInitialDir
If the local KTR/KJB file name starts with a scheme such as zip:, then you can pass along the initial directory to it.
listRepos
Lists the available repositories.
safeMode
Runs in safe mode, which enables extra checking.
metrics
Enables Kettle metric gathering.
listFileParams
Lists information about the defined named parameters in the specified KTR/KJB.
logLevel
The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing).
maxLogLines
The maximum number of log lines that are kept internally by PDI. Set to 0 to keep all rows, which is the default setting.
maxLogTimeout
The maximum age (in minutes) of a log line while being kept internally by PDI. Set to 0 to keep all rows indefinitely, which is the default setting.
oldLogFile
A local file name to which to write the log output.
logFile
If the old logging name is filled in, and the new one is not, overwrite the new logging name with the existing one.
version
Shows the version, revision, and build date.
zip
A Base64-encoded string of a ZIP file that contains KTRs/KJBs. The localFile parameter is required if this parameter is used. This option is disabled by default.
params
JSON map representation of the parameters to be passed as strings into the executing KTR/KJB. Example:
{ "my-param": "foo", "some-other-param": "bar" }
pluginFolders
(Optional) Comma-separated list of Docker volume names. These folders are used for loading plugins in addition to .kettle/plugins. Each name must be defined as a volume collection in the PDI Job configuration in the Administration Application.
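The parameters above can be collected into a single request structure before it is sent to one of the execute endpoints. The following Python sketch is illustrative only: the function names, the repository path /public/etl, and the file name load_sales.ktr are hypothetical, and this article does not state whether the parameters are sent as a JSON body, form fields, or a query string, so the sketch stops at building and printing the structure.

```python
import base64
import json

VALID_CONTENT_TYPES = ("ktr", "kjb")


def build_execute_request(content_type, input_dir, input_file, params=None, **optional):
    """Collect execute-request parameters into a dictionary.

    Assumption: parameter names are used exactly as listed above; how the
    service expects them to be transmitted is not covered in this article.
    """
    if content_type not in VALID_CONTENT_TYPES:
        raise ValueError(f"contentType must be one of {VALID_CONTENT_TYPES}")
    request = {
        "contentType": content_type,
        "inputDir": input_dir,    # includes the leading slash
        "inputFile": input_file,
    }
    if params:
        # Named parameters are passed as strings into the KTR/KJB.
        request["params"] = {str(k): str(v) for k, v in params.items()}
    request.update(optional)      # for example logLevel or pluginFolders
    return request


def encode_zip(zip_bytes: bytes) -> str:
    """Base64-encode ZIP file content for the zip parameter."""
    return base64.b64encode(zip_bytes).decode("ascii")


# Hypothetical repository path and file name, used only for illustration.
request = build_execute_request(
    "ktr",
    "/public/etl",
    "load_sales.ktr",
    params={"my-param": "foo", "some-other-param": "bar"},
    logLevel="Basic",
)
print(json.dumps(request, indent=2))
```

Validating contentType up front catches a mistyped value before any call reaches the Content Execution Router.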
Run work items from the PDI client
Perform the following actions to run work items on worker nodes using the PDI client:
Procedure
Make sure that you are connected to the Pentaho Repository.
Create or edit the run configuration for the transformation or job through the Run configurations folder in the View tab:
- To create a new run configuration, right-click the Run configurations folder and select New.
- To edit an existing run configuration, open the file. Right-click the run configuration that you want to change and then select Edit.
In the Run configuration dialog box, enter or select the options listed below.
The Run configuration dialog box contains the following options when Pentaho is selected as the Engine for running a transformation or job:
- Name: Specify the name of the run configuration.
- Description: Optionally, specify details of your configuration.
- Engine: Pentaho
- Settings: Pentaho server
Click OK. The PDI client is ready to run the work item using worker nodes.
Run work items from the User Console
Perform these initial steps to run work items on worker nodes using the User Console:
Procedure
Connect to the Pentaho Repository from the PDI client, then save your transformation or job file to a folder in the Pentaho Repository.
Log on to the User Console, and then click the Browse Files button.
In the Folders pane, click the folder containing the file that you want to run.
In the File pane, click on the file that you want to run.
Next, proceed according to how and when you want to run the file:
- To run the file immediately, see Run work items in the background.
- To run the file later or on a recurring basis, see Schedule work items to run.
Run work items in the background
Procedure
In the File Actions pane, select Run in background. The Run In Background dialog box appears.
Enter your selections for the following options.
- Schedule Name: Specify a name for the schedule, which will also be the name of the generated content.
- Generated Content Location: Specify a location for the generated content.
Click OK. The work item now runs on worker nodes, and the generated content is delivered to your specified location.
Schedule work items to run
Procedure
In the File Actions pane, select Schedule. The New Schedule dialog box appears.
Enter your selections for the following options.
- Schedule Name: Specify a name for the schedule, which will also be the name of the generated content.
- Generated Content Location: Specify a location for the generated content.
Click Next.
Customization options for your schedule display. Enter your selections.
- Recurrence: Specifies a recurring period in which the file is run. Options include:
  - Run Once: Runs the file one time.
  - Seconds: Runs the file repeatedly. Specify the recurrence pattern (in seconds) and the range of recurrence (Start and End date).
  - Minutes: Runs the file repeatedly. Specify the recurrence pattern (in minutes) and the range of recurrence (Start and End date).
  - Hours: Runs the file repeatedly. Specify the recurrence pattern (in hours) and the range of recurrence (Start and End date).
  - Daily: Runs the file repeatedly. Specify the recurrence pattern (in days) and the range of recurrence (Start and End date).
  - Weekly: Runs the file repeatedly. Specify the recurrence pattern (on the day of every week) and the range of recurrence (Start and End date).
  - Monthly: Runs the file repeatedly. Specify the recurrence pattern (on the day of every month) and the range of recurrence (Start and End date).
  - Yearly: Runs the file repeatedly. Specify the recurrence pattern (on the month of the year) and the range of recurrence (Start and End date).
  - Cron: Runs the file according to Quartz cron attributes. Specify the Cron attributes and the range of recurrence (Start and End date).
- Start Time: Specify a start time to run the file.
- Start Date: Specify a start date to run the file.
Click Finish. The work item runs at the specified recurrence, and the content is generated to the specified location.
To manage scheduled work items in the User Console, see Manage Schedules for details.
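For the Cron recurrence option, it can help to see how a Quartz cron expression is laid out. Unlike classic Unix cron, a Quartz expression has six required fields and an optional seventh: seconds, minutes, hours, day of month, month, day of week, and year. The Python sketch below only labels those fields; the function name describe_quartz_cron is an illustrative assumption, and the sketch does not validate field values or reproduce how the User Console parses them.

```python
QUARTZ_FIELDS = (
    "seconds",
    "minutes",
    "hours",
    "day_of_month",
    "month",
    "day_of_week",
    "year",  # optional seventh field
)


def describe_quartz_cron(expression: str) -> dict:
    """Label each whitespace-separated field of a Quartz cron expression."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("A Quartz cron expression has 6 or 7 fields")
    # zip stops at the shorter sequence, so 'year' is omitted when absent.
    return dict(zip(QUARTZ_FIELDS, parts))


# "0 0 2 ? * MON-FRI" fires at 2:00 AM, Monday through Friday.
print(describe_quartz_cron("0 0 2 ? * MON-FRI"))
```

In Quartz, the ? character means "no specific value" and is used in either the day-of-month or the day-of-week field when the other one is specified.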
Administer the Pentaho Worker Nodes product
After running work items, use the following article to learn how to monitor work items: