Hitachi Vantara Lumada and Pentaho Documentation

Run Work Items on Pentaho Worker Nodes

After you complete the Pentaho Worker Nodes installation and setup process, your transformations and jobs run as work items and scale out across clusters when your workload increases. You can run work items by using the REST API, the PDI client, or the User Console. You can also use scheduling options to launch work items during off-peak hours or on a recurring basis. Regardless of the Pentaho application and method used to execute your work items, the processed transformation and job results reside within the Pentaho platform.

These tasks are intended for Pentaho administrators who know where the data is stored, how to connect to it, and details about the computing environment. To run transformations and jobs on Worker Nodes, you must have Pentaho Worker Nodes installed and set up on your system. For more information, see Installing the Pentaho Worker Nodes product and Setting up Pentaho Worker Nodes.

Running work items using REST API

The Content Execution Router service acts as the entry point to Worker Nodes (WorkerNode). It provides the REST API for the following requests:

  • Execute the workItem asynchronously

    http://<IP>:<PORT>/api/content/execute/async

  • Execute the workItem synchronously

    http://<IP>:<PORT>/api/content/execute

  • Get status of execution

    http://<IP>:<PORT>/api/content/execute/status?uuid={UUID}
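
As a minimal sketch, the three endpoints above could be assembled in Python as follows. The helper names are hypothetical and not part of the product, and the host, port, and UUID values are placeholders for illustration only.

```python
from urllib.parse import urlencode


def router_base_url(host: str, port: int) -> str:
    """Return the base URL of the Content Execution Router (plain HTTP, as in the endpoints above)."""
    return f"http://{host}:{port}"


def execute_async_url(base: str) -> str:
    """URL for asynchronous execution of a work item."""
    return f"{base}/api/content/execute/async"


def execute_sync_url(base: str) -> str:
    """URL for synchronous execution of a work item."""
    return f"{base}/api/content/execute"


def status_url(base: str, uuid: str) -> str:
    """URL for querying the status of an execution by its UUID."""
    # urlencode escapes the UUID so it is safe to place in the query string.
    return f"{base}/api/content/execute/status?" + urlencode({"uuid": uuid})


# Placeholder host, port, and UUID (assumptions for illustration).
base = router_base_url("192.0.2.10", 8080)
print(execute_async_url(base))
print(execute_sync_url(base))
print(status_url(base, "123e4567-e89b-12d3-a456-426614174000"))
```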

Execute Request REST API

Use the following lists of parameters and values for the execute request.

Form Parameters (in HTTP body) using x-www-form-urlencoded option

  • contentType

    The type of content to execute. Possible values are { ktr | kjb }

  • memSize

    (Optional) Specifies the container memory size as configured in the PDI Job configuration. Only works with PDI Job. If omitted, the configured default is used (512 MB unless the user has changed it).

  • cpuSize

    (Optional) Specifies the container CPU share as configured in the PDI Job configuration. Only works with PDI Job. If omitted, the configured default is used (0.4 unless the user has changed it).
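
The form parameters above might be encoded as an x-www-form-urlencoded body along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, and because the exact value format for memSize and cpuSize is not specified here, the sample values are assumptions that are passed through as opaque strings.

```python
from typing import Optional
from urllib.parse import urlencode

# The two content types listed for the contentType parameter.
ALLOWED_CONTENT_TYPES = {"ktr", "kjb"}


def encode_execute_form(content_type: str,
                        mem_size: Optional[str] = None,
                        cpu_size: Optional[str] = None) -> bytes:
    """Encode the execute-request form parameters as an x-www-form-urlencoded body."""
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"contentType must be one of {sorted(ALLOWED_CONTENT_TYPES)}")
    fields = {"contentType": content_type}
    # Optional parameters are left out when unset so that the configured defaults apply.
    if mem_size is not None:
        fields["memSize"] = mem_size
    if cpu_size is not None:
        fields["cpuSize"] = cpu_size
    return urlencode(fields).encode("ascii")


print(encode_execute_form("ktr"))
# The value formats below are assumptions, not documented formats.
print(encode_execute_form("ktr", mem_size="1024", cpu_size="0.5"))
```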

ETL execution specific parameters

  • repoName

    Enterprise or database repository name, if you are using one.

  • blockRepoConns

    Prevents logging in to the specified repository, even if the repository connection parameters are provided. Use this option when you want to execute a local KTR file instead.

  • repoUsername

    Repository user name

  • trustRepoUser

    Trust the repository user name passed along, such as when no password is required. Requires a preconfigured Pentaho Repository to accept the trusted connections.

  • repoPassword

    Repository password.

  • inputDir

    The directory that contains the KTR/KJB, including the leading slash.

  • localFile

    If you are calling a local KTR/KJB file, this is the file name, including the path if it is not in the local directory.

  • localJarFile

    If you are calling a KTR/KJB within a local JAR file, this is the file name, including the path if it is not in the local directory.

  • inputFile

    The name of the KTR/KJB within the Pentaho Repository to launch.

  • listRepoFiles

    Lists the KTRs/KJBs in the specified repository directory.

  • listRepoDirs

    Lists the directories in the specified repository.

  • exportRepo

    Exports all repository objects to one XML file.

  • localInitialDir

    If the local KTR/KJB file name starts with a scheme such as zip:, then you can pass along the initial directory to it.

  • listRepos

    Lists the available repositories.

  • safeMode

    Runs in safe mode, which enables extra checking.

  • metrics

    Enables Kettle metric gathering.

  • listFileParams

    Lists information about the defined named parameters in the specified KTR/KJB.

  • logLevel

    The logging level (Basic, Detailed, Debug, Rowlevel, Error, Nothing).

  • maxLogLines

    The maximum number of log lines that are kept internally by PDI. Set to 0 to keep all rows, which is the default setting.

  • maxLogTimeout

    The maximum age (in minutes) of a log line while being kept internally by PDI. Set to 0 to keep all rows indefinitely, which is the default setting.

  • oldLogFile

    A local file name to which to write the log output.

  • logFile

    The new log file name. If oldLogFile is set and logFile is not, logFile takes the value of oldLogFile.

  • version

    Shows the version, revision, and build date.

  • zip

    A Base64-encoded string of a ZIP file that contains KTR/KJB files. The localFile parameter is required when this parameter is used. This parameter is disabled by default.

  • params

    JSON map representation of the parameters to be passed as strings into the executing KTR/KJB. Example: { "my-param": "foo", "some-other-param": "bar" }

  • pluginFolders

    (Optional) Comma-separated list of Docker volume names. These folders are used for loading plugins in addition to .kettle/plugins. Each name must be defined as a volume collection in the PDI Job configuration in the Administration Application.
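
The following sketch shows one way a client might assemble an execute request from the parameters above, including the params JSON map and a Base64-encoded ZIP for the zip parameter. The function names, host, port, repository path, and file names are hypothetical placeholders. Standard (not URL-safe) Base64 is assumed for the zip value, and authentication is not covered here. The request is built but not sent.

```python
import base64
import io
import json
import zipfile
from urllib.parse import urlencode
from urllib.request import Request


def zip_to_base64(files: dict) -> str:
    """Pack {archive_name: bytes} into an in-memory ZIP and return it as Base64 text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def build_execute_request(url: str, fields: dict, params: dict) -> Request:
    """Build (but do not send) a POST request that carries the form parameters."""
    form = dict(fields)
    # params is sent as a JSON map whose values are strings.
    form["params"] = json.dumps({key: str(value) for key, value in params.items()})
    return Request(
        url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder endpoint, repository path, and file name (assumptions).
request = build_execute_request(
    "http://192.0.2.10:8080/api/content/execute/async",
    {"contentType": "ktr", "inputDir": "/public/etl",
     "inputFile": "load_sales", "logLevel": "Basic"},
    {"my-param": "foo", "some-other-param": "bar"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))

# A value that could be supplied as the zip parameter, together with localFile.
zip_value = zip_to_base64({"load_sales.ktr": b"<transformation/>"})
print(len(zip_value) > 0)
```

To send the request, it could be passed to urllib.request.urlopen once the connection details for your environment are filled in.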

Run work items from the PDI client

When using the PDI client, you must set up the run configuration of a transformation or job to run it as a work item. Afterward, you can run those saved work items immediately or schedule the items to run at regular intervals, on certain dates and times, or with different parameters.

Perform the following actions to run work items on worker nodes using the PDI client:

Procedure

  1. Make sure that you are connected to the Pentaho Repository.

  2. Create or edit the run configuration for the transformation or job through the Run configurations folder in the View tab:

    • To create a new run configuration, right-click the Run configurations folder and select New.

    • To edit an existing run configuration, open the file. Right-click the run configuration that you want to change, and then select Edit.

  3. In the Run configuration dialog box, enter or select the following options, which are available when Pentaho is selected as the Engine for running a transformation or job:

    • Name: Specify the name of the run configuration.
    • Description: Optionally, specify details of your configuration.
    • Engine: Pentaho
    • Settings: Pentaho server
  4. Click OK. The PDI client is ready to run the work item using worker nodes.

Next steps

Note: To run work items at specific times, on a recurring basis, with different parameters, or to manage scheduled items, see Schedule perspective in the PDI client for details.

Run work items from the User Console

When using the User Console, you can launch saved work items to run on worker nodes immediately, at a scheduled time, or at a regular interval.

Perform these initial steps to run work items on worker nodes using the User Console:

Procedure

  1. Connect to the Pentaho Repository from the PDI client, then save your transformation or job file to a folder in the Pentaho Repository.

  2. Log on to the User Console, and then click the Browse Files button.

  3. In the Folders pane, click the folder containing the file that you want to run.

  4. In the File pane, click on the file that you want to run.

  5. Proceed according to how and when you want to run the file, as described in the following sections.

Run work items in the background

Perform the following steps to run a transformation or job immediately:

Procedure

  1. In the File Actions pane, select Run in background. The Run In Background dialog box appears.
  2. Enter your selections for the following options.

    • Schedule Name: Specify a name for the schedule, which will also be the name of the generated content.
    • Generated Content Location: Specify a location for the generated content.
  3. Click OK. The work item now runs on worker nodes, and the generated content is delivered to the location you specified.

Schedule work items to run

Perform the following steps to run a transformation or job on a specific date and time or at a recurring interval:

Procedure

  1. In the File Actions pane, select Schedule. The New Schedule dialog box appears.
  2. Enter your selections for the following options.

    • Schedule Name: Specify a name for the schedule, which will also be the name of the generated content.
    • Generated Content Location: Specify a location for the generated content.
  3. Click Next.

  4. Customization options for your schedule display. Enter your selections.

    • Recurrence: Specifies a recurring period in which the file is run. Options include:
      • Run Once: Runs the file one time.
      • Seconds: Runs the file repeatedly. Specify the recurrence pattern (in seconds) and the range of recurrence (start and end date).
      • Minutes: Runs the file repeatedly. Specify the recurrence pattern (in minutes) and the range of recurrence (start and end date).
      • Hours: Runs the file repeatedly. Specify the recurrence pattern (in hours) and the range of recurrence (start and end date).
      • Daily: Runs the file repeatedly. Specify the recurrence pattern (in days) and the range of recurrence (start and end date).
      • Weekly: Runs the file repeatedly. Specify the recurrence pattern (the day of the week) and the range of recurrence (start and end date).
      • Monthly: Runs the file repeatedly. Specify the recurrence pattern (the day of the month) and the range of recurrence (start and end date).
      • Yearly: Runs the file repeatedly. Specify the recurrence pattern (the month of the year) and the range of recurrence (start and end date).
      • Cron: Runs the file according to Quartz cron attributes. Specify the cron attributes and the range of recurrence (start and end date).
    • Start Time: Specify a start time to run the file.
    • Start Date: Specify a start date to run the file.
  5. Click Finish. The work item runs at the scheduled time and recurrence, and the generated content is saved to the specified location. To manage scheduled work items in the User Console, see Manage Schedules for details.
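
For the Cron recurrence option, it may help to recall the general shape of a Quartz cron expression: six space-separated fields (seconds, minutes, hours, day of month, month, day of week) plus an optional seventh field for the year. The sketch below labels the fields of two sample expressions. The helper name is hypothetical, it only checks the field count and does not validate field values, and how the User Console accepts the expression should be confirmed in the dialog box itself.

```python
# Field order of a Quartz cron expression; the year field is optional.
QUARTZ_FIELDS = ("seconds", "minutes", "hours",
                 "day_of_month", "month", "day_of_week", "year")


def describe_quartz_cron(expression: str) -> dict:
    """Map each field of a Quartz cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("a Quartz cron expression has 6 fields, or 7 with the optional year")
    # zip stops at the shorter sequence, so the year is omitted when absent.
    return dict(zip(QUARTZ_FIELDS, parts))


# Every day at 02:00:00, a typical off-peak slot.
print(describe_quartz_cron("0 0 2 * * ?"))
# Monday through Friday at 01:30:00.
print(describe_quartz_cron("0 30 1 ? * MON-FRI"))
```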

Administer the Pentaho Worker Nodes product

After running work items, use the following article to learn how to monitor work items: