Hitachi Vantara Lumada and Pentaho Documentation

Managing job templates

As an owner of a data node or resource, whether you are an administrator, data steward, or analyst, you know your data well and are best suited to make delegation decisions regarding your nodes and resources. Lumada Data Catalog delegates job management based on user roles and guidelines set by your organization.

Administrators can grant specified roles the permission to run jobs that process the data resources assigned to those roles. An administrator can create custom job templates for users to run on their data assets. Alternatively, users with job execution permissions can select from the job sequences provided by Data Catalog.

Note: Roles with guest access level are not permitted to run any jobs.

Job templates are pre-defined templates that administrators create to run specific job sequences on specific clusters. A job template is a way to pass system or Spark-specific parameters to a job sequence as command line arguments. Such parameters include driver memory, executor memory, and the number of threads required for a given cluster size, as well as overrides of the default Data Catalog script parameters, such as profiling Collections as a single resource. These templates can then be promoted to the users who handle such clusters for efficient job execution.
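
As a rough illustration of how a template might bundle such parameters, the following Python sketch flattens them into a list of command line arguments. The class and field names are hypothetical. The `--driver-memory`, `--executor-memory`, and `--executor-cores` flags are standard spark-submit options, used here on the assumption that your Data Catalog version accepts them in this form; confirm the accepted arguments for your installation before relying on them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobTemplateParams:
    """Hypothetical container for the parameters a job template might carry."""

    driver_memory: str = "2g"      # assumed default, for illustration only
    executor_memory: str = "4g"    # assumed default, for illustration only
    executor_cores: int = 2        # assumed default, for illustration only
    # Data Catalog script flags, passed through unchanged.
    script_flags: List[str] = field(default_factory=list)

    def to_command_line(self) -> List[str]:
        """Flatten the parameters into a list of command line arguments."""
        args = [
            "--driver-memory", self.driver_memory,
            "--executor-memory", self.executor_memory,
            "--executor-cores", str(self.executor_cores),
        ]
        return args + list(self.script_flags)


# A template sized for a small cluster that also requests incremental runs.
small_cluster = JobTemplateParams(
    driver_memory="1g",
    executor_memory="2g",
    executor_cores=1,
    script_flags=["-incremental"],
)
print(" ".join(small_cluster.to_command_line()))
```

Keeping the sizing values in one place per cluster is the point of the sketch: users who run the template do not need to know or retype the values.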

Job Templates tile

Create a new job template

Follow the steps below to create a new job template.

Procedure

  1. Navigate to Manage and click Job Templates.

  2. Click New.

    Job templates window

    The Job Templates window appears.
  3. Select your asset type, which determines the workflow for creating a job template.

    • Virtual Folder
    • Dataset
    Virtual Folder selection
  4. Enter values in the following fields.

    • Template Name: Enter a name that identifies the job template when shown to those users who can run jobs. Ensure that the name strings do not match any Data Catalog reserved names. This field is required.
    • Template Description: (Optional) Enter a brief description of the intended use of this template.
    • Asset Type: You can select which data asset your template will run against. Data Catalog provides two types of data assets: Virtual Folder and Dataset.
    • Asset Name: Enter the name of the Virtual Folder or Dataset here. Start typing the name in the field and select the best match from the list that displays.
    • Asset Path: If you selected Virtual Folder, then this field is auto-filled with the absolute path of the Asset Name entered above.
  5. Click Next.

  6. In the Sequence pane, select the job sequence that you want to define for the template.

    You can choose from the listed Data Catalog job sequences. The available jobs depend on the asset for which you are creating the template: virtual folder or dataset.
    • If you selected Virtual Folder as the asset, then select one from the list that appears in the dialog box and enter any optional command line parameters, such as [-incremental], [-path ] or [-regex], that may apply to the template you are creating. When finished, click Next.

      Virtual Folder sequence

    • If you selected Dataset as the asset, then select one from the list that appears in the dialog box and enter any optional command line parameters, such as [-incremental], that may apply to the template you are creating. When finished, click Next.

      Dataset sequence

    The Review pane shows the template parameters as entered.

    Review pane

  7. Verify your entries and click Create.

Results

Your template is created and can be selected by users to trigger jobs from their domains.
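
The field rules from step 4 can be sketched as a small validation routine. This is only an illustration: the `JobTemplate` class, the `validate_template` function, and the contents of `RESERVED_NAMES` are hypothetical, because the actual Data Catalog reserved names are not listed in this article.

```python
from dataclasses import dataclass
from typing import List, Optional

# The two asset types that Data Catalog provides for job templates.
ASSET_TYPES = ("Virtual Folder", "Dataset")

# Placeholder only: the real reserved names are not listed in this article.
RESERVED_NAMES = {"example-reserved-name"}


@dataclass
class JobTemplate:
    """Hypothetical in-memory form of the fields entered in the wizard."""

    template_name: str
    asset_type: str
    asset_name: str
    template_description: str = ""    # optional field
    asset_path: Optional[str] = None  # auto-filled for a Virtual Folder


def validate_template(template: JobTemplate) -> List[str]:
    """Return a list of problems; an empty list means the entries look valid."""
    errors = []
    if not template.template_name.strip():
        errors.append("Template Name is required.")
    elif template.template_name in RESERVED_NAMES:
        errors.append("Template Name matches a reserved name.")
    if template.asset_type not in ASSET_TYPES:
        errors.append("Asset Type must be Virtual Folder or Dataset.")
    return errors


print(validate_template(JobTemplate("nightly-profile", "Dataset", "sales")))
```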

Edit a job template

If needed, you can edit job templates.
Note: When profiling or tag jobs are triggered for data objects, a corresponding template is automatically created and appears in the list of job templates. You can modify and reuse these templates. If you reuse such a template for other data objects or virtual folders, be sure to change the command line options.

Perform the following steps to edit a job template.

Procedure

  1. Navigate to Manage, and then click Job Templates.

  2. Click the check box of the job template that you want to edit.

  3. Select the job template, then click the More actions icon in the row. Select Edit from the menu that displays.

    Optionally, you can edit multiple templates. In the menu bar, click the <count> selected link, and then click Edit.

    Edit a job template

    The Job Templates window opens.

    Job Templates window

  4. Edit the template.

    You can edit only the Template Description, the Asset Name, and the Data Catalog profiling parameters. You cannot change the Template Name, Asset Type, or Sequences.
  5. Verify your edits to the template, then click Save.
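
The editing restriction in step 4 can be expressed as a short sketch. The dictionary keys and the `apply_edit` function are hypothetical names chosen for illustration; only the split between editable and locked fields comes from the procedure above.

```python
# Fields the procedure allows you to change on an existing template.
EDITABLE_FIELDS = {"template_description", "asset_name", "profiling_parameters"}

# Fields that are fixed once the template has been created.
LOCKED_FIELDS = {"template_name", "asset_type", "sequence"}


def apply_edit(template: dict, changes: dict) -> dict:
    """Return an updated copy of the template, rejecting disallowed changes."""
    locked = sorted(LOCKED_FIELDS & set(changes))
    if locked:
        raise ValueError("Cannot change: " + ", ".join(locked))
    unknown = sorted(set(changes) - EDITABLE_FIELDS)
    if unknown:
        raise ValueError("Unknown fields: " + ", ".join(unknown))
    updated = dict(template)  # copy so the original stays untouched
    updated.update(changes)
    return updated


template = {
    "template_name": "nightly-profile",
    "asset_type": "Dataset",
    "asset_name": "sales",
    "template_description": "",
}
edited = apply_edit(template, {"asset_name": "orders"})
print(edited["asset_name"])
```

Returning a copy rather than mutating the input mirrors the Save step: nothing changes until the edit has been checked.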

Delete a job template

You can delete a job template. However, if you delete the virtual folder or dataset associated with a job template, further execution of the template is prevented. You can edit the template to associate a new virtual folder or dataset and reuse the template.

Follow the steps below to delete a job template.

Procedure

  1. Navigate to Manage, then click Job Templates.

    Delete job template

  2. Click the check box of one or multiple job templates that you want to delete.

  3. Select the job template, then click the More actions icon in the row. Select Delete from the menu that displays.

    Optionally, you can delete multiple templates. In the menu bar, click the <count> selected link, and then click Delete.

Results

The job template is deleted.

Submit a job template for execution

Follow the steps below to submit a job template for execution.

Procedure

  1. Navigate to Manage, then click Job Templates.

  2. Click the check box of the job template that you want to submit for execution.

  3. Select the job template, then click the More actions icon in the row. Select Submit from the menu that displays.

    Optionally, you can submit multiple templates. In the menu bar, click the <count> selected link, and then click Submit.

    Submit job template

Results

The job sequences defined in the template are submitted to the execution queue for the resources defined in that template. For more information, see Run a job template on a resource.

View job template activity

Follow the steps below to view job template activity.

Procedure

  1. Navigate to Manage, and then click Job Templates.

  2. Click the check box of the job template for which you want to view job activity.

  3. Select the job template, then click the More actions icon in the row. Select See all instances from the menu that displays.

    Optionally, you can view the activity for multiple templates. In the menu bar, click the <count> selected link, and then click See all instances.

    View job template activity

Results

The Job Activity page opens, displaying a filtered list of all the instances that were initiated using the selected template.

Job sequences

Job sequences are sequences of jobs in Lumada Data Catalog that can be executed by users who have job execution privileges.

Note: These sequences execute with the Data Catalog default parameters, which the user cannot override when using the Sequence option.

While a job template lets users set system parameters and exercise finer control over the Data Catalog script flags, Sequence execution provides access to the individual job commands with control over only the incremental and mode flags.
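
The difference can be sketched as a filter over requested options. The function name, the option keys, and the sample values are hypothetical; the only detail taken from the text above is that Sequence execution exposes just the incremental and mode flags.

```python
from typing import Dict, List, Tuple

# The only flags that Sequence execution lets the user control.
SEQUENCE_FLAGS = {"incremental", "mode"}


def split_sequence_options(requested: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Separate options a sequence run could honor from those it could not."""
    accepted = {key: value for key, value in requested.items() if key in SEQUENCE_FLAGS}
    rejected = sorted(key for key in requested if key not in SEQUENCE_FLAGS)
    return accepted, rejected


accepted, rejected = split_sequence_options(
    {"incremental": True, "mode": "fast", "driver_memory": "4g"}
)
print(accepted)
print(rejected)
```

Anything that lands in the rejected list, such as memory sizing, would need a job template instead.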

Trigger a sequence job

Follow the steps below to run a sequence job for a specific resource.

Procedure

  1. Navigate from Browse to one of the following locations.

    • Resource List
    • Single Resource View
    • Dataset
  2. In the upper-right corner, click the More actions icon and select Run job now from the menu that displays.

    Optionally, you can select a resource, then click the More actions icon to display the actions menu. Click Run job now.

    Job sequence option path

    The Run Job Now dialog box opens.
  3. From the Run Job Now dialog box, select Sequence.

    Job sequence menu
  4. Based on the resource, follow the sequence in the dialog box workflow for Virtual Folder or Dataset.

    Virtual folder sequence

    Dataset sequence

    Note: When you select Fast profiling mode in the Sequence flow, the default values for sample-splits and sample-rows are used as defined in the Agent component's configuration.json file.
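
To check which defaults a Fast profiling run would pick up, you could read the two settings from the Agent configuration. The sketch below assumes that `sample-splits` and `sample-rows` appear as top-level keys in configuration.json, which may not match the layout of your file, and the numbers in the sample text are placeholders rather than real defaults.

```python
import json
from typing import Dict, Optional

FAST_PROFILING_KEYS = ("sample-splits", "sample-rows")


def fast_profiling_defaults(config_text: str) -> Dict[str, Optional[int]]:
    """Extract the Fast profiling defaults from configuration JSON text.

    Assumes the keys sit at the top level; a missing key maps to None.
    """
    config = json.loads(config_text)
    return {key: config.get(key) for key in FAST_PROFILING_KEYS}


# Placeholder content standing in for the Agent's configuration.json.
example_config = '{"sample-splits": 10, "sample-rows": 1000, "other-setting": true}'
print(fast_profiling_defaults(example_config))
```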