Hitachi Vantara Lumada and Pentaho Documentation

Managing job templates

As the owner of a data node or resource, whether you are an administrator, data steward, or analyst, you know your data well and are best suited to make delegation decisions about your nodes and resources. Lumada Data Catalog delegates job management based on user roles and guidelines set by your organization.

Administrators can grant specified roles the permission to run jobs that process the data resources assigned to those roles. An administrator can create custom job templates for users to run on their data assets. Alternatively, users with job execution permissions can select from the job sequences provided by Data Catalog.

Note: Roles with guest access level are not permitted to run any jobs.

Job templates are predefined templates that administrators create to run specific job sequences on specific clusters. A job template lets you pass system or Spark-specific parameters to a job sequence as command line arguments. Such parameters include settings like driver memory, executor memory, and the number of threads required for a given cluster size, as well as overrides of the default Data Catalog script parameters, such as profiling collections as a single resource. These templates can then be promoted to the users who handle those clusters so that jobs run efficiently.
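As an illustration of how an administrator might prepare different parameter strings for clusters of different sizes, the following is a minimal Python sketch. The helper `build_parameters` is hypothetical, and the option names (`driver-memory`, `executor-memory`, `num-executors`) are borrowed from Spark's `spark-submit` conventions as an assumption. The text above does not state the exact syntax that the Enter Parameters field accepts, so verify the option names against your Data Catalog version before using them.

```python
import shlex


def build_parameters(options):
    """Join option/value pairs into a single command line string.

    Hypothetical helper: the "--name value" layout follows spark-submit
    conventions and is an assumption, not confirmed Data Catalog syntax.
    """
    parts = []
    for name, value in options.items():
        parts.append(f"--{name}")
        # Quote each value so spaces or special characters stay intact.
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)


# Illustrative settings only; size them for your own clusters.
small_cluster = {"driver-memory": "2g", "executor-memory": "4g", "num-executors": 4}
large_cluster = {"driver-memory": "8g", "executor-memory": "16g", "num-executors": 20}

print(build_parameters(small_cluster))
print(build_parameters(large_cluster))
```

Keeping one settings dictionary per cluster makes it easier to create one template per cluster and to review the values before pasting the resulting string into a template.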

Create a new job template

Follow the steps below to create a new job template.

Procedure

  1. Click Management in the left navigation menu.

    The Manage Your Environment page opens.
  2. Click Job Templates on the Job Management card.

    The Job Templates page opens.
  3. Click Add Template.

    Virtual Folder is preselected.
  4. Enter values in the following fields.

    Field: Description
    Name: Enter a name that identifies the job template when shown to those users who can run jobs. Ensure that the name strings do not match any Data Catalog reserved names. This field is required.
    Description: (Optional) Enter a brief description of the intended use of this template.
    Asset Type: Use a Virtual Folder for your template to run against.
    Virtual Folder Name: Enter the name of the Virtual Folder. Start typing the name in the field and select the best match from the list that appears.
    Asset Path: This field is auto-filled with the absolute path of the value entered in the Virtual Folder Name field.
    Process Type: Select the sequence processing that you want to use.
  5. Click Incremental Profiling if you want to use incremental processing for the job template.

  6. In the Enter Parameters field, enter any command line parameters for the job template.

  7. Click Create Template.

Results

Your template is created and can be selected by users to trigger jobs from their domains.

Edit a job template

If needed, you can edit a job template. For example, you can edit the template to associate a new virtual folder and reuse the template. However, if you delete the virtual folder associated with a job template, then further execution of the template is prevented.

Perform the following steps to edit a job template.

Procedure

  1. Click Management in the left navigation menu.

    The Manage Your Environment page opens.
  2. Click Job Templates on the Job Management card.

    The Job Templates page opens.
  3. Locate the Template Name that you want to edit, and then click the View Details icon in the row.

    The Job Template Details page opens.
  4. Edit the template.

    You can only edit the Description, Virtual Folder Name, Asset Path, Process Type, and the Data Catalog profiling and command line parameters. You cannot change the Template Name or Asset Type.
  5. Verify your edits to the template, and then click Save Template.
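The edit rules in this procedure can be sketched as a small data model. This is a hypothetical Python illustration, not Data Catalog code: the `JobTemplate` class, its field names, and the `edit_template` helper are assumptions made for the example, and the profiling and command line parameters are simplified into one `parameters` string.

```python
from dataclasses import dataclass, replace

# Fields the procedure lists as editable. Template Name and Asset Type
# are deliberately absent because they cannot be changed.
EDITABLE_FIELDS = {
    "description",
    "virtual_folder_name",
    "asset_path",
    "process_type",
    "parameters",
}


@dataclass(frozen=True)
class JobTemplate:
    """Hypothetical model of a job template's fields."""
    name: str
    asset_type: str
    virtual_folder_name: str
    asset_path: str
    process_type: str
    description: str = ""
    parameters: str = ""


def edit_template(template, **changes):
    """Return an edited copy, rejecting changes to non-editable fields."""
    locked = set(changes) - EDITABLE_FIELDS
    if locked:
        raise ValueError(f"cannot change: {', '.join(sorted(locked))}")
    return replace(template, **changes)


# Example values are placeholders.
template = JobTemplate(
    name="nightly-profile",
    asset_type="Virtual Folder",
    virtual_folder_name="sales",
    asset_path="/data/sales",
    process_type="Data Profiling",
)
# Reuse the template by pointing it at a different virtual folder.
updated = edit_template(template, virtual_folder_name="finance", asset_path="/data/finance")
print(updated.virtual_folder_name)
```

Calling `edit_template(template, name="other")` raises `ValueError` in this sketch, which mirrors the rule that the Template Name cannot be changed after creation.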

Delete a job template

Follow the steps below to delete one or more job templates.

Procedure

  1. Click Management in the left navigation menu.

    The Manage Your Environment page opens.
  2. Click Job Templates on the Job Management card.

    The Job Templates page opens.
  3. Select the check box of each job template that you want to delete.

    The Delete Template confirmation dialog box opens.
  4. Enter yes in the Please Confirm field to proceed.

  5. Click Confirm.

Results

The job template is deleted.

Submit a job template for execution

Follow the steps below to submit a job template for execution.

Procedure

  1. Click Management in the left navigation menu.

    The Manage Your Environment page opens.
  2. Click Job Templates on the Job Management card.

    The Job Templates page opens.
  3. Locate the Template Name that you want to submit for execution and click the More actions icon in the row.

  4. Select Start Now from the menu that displays.

Results

The job sequences defined in the template are submitted to the execution queue for the resources defined in that template. For more information, see Run a job template on a resource.

View job template activity

Follow the steps below to view job template activity.

Procedure

  1. Click Management in the left navigation menu.

    The Manage Your Environment page opens.
  2. Click Job Templates on the Job Management card.

    The Job Templates page opens.
  3. Locate the Template Name whose activity you want to view and click the More actions icon in the row.

  4. Select View Instances from the menu that displays.

Results

The Job Activity page opens, displaying a filtered list of all the instances that were initiated using the selected template.

Job sequences

Job sequences are sequences of jobs in Lumada Data Catalog that can be executed by users who have job execution privileges.

Note: These sequences run with parameters that are predefined in Data Catalog, and users cannot override those parameters with the Sequence option.

Trigger a sequence job

Follow the steps below to run a sequence job for a specific resource.

Procedure

  1. Click Data Canvas in the left navigation menu.

    The Explore Your Data page opens.
  2. Use the Navigation pane to drill down to the resource.

  3. Click More actions and then select Process from the menu that displays.

    The Process Selected Items page opens.
  4. Click the sequence that you want to use.

    Sequence: Description
    Select Template: A template is a custom definition for a given process with a custom set of parameters.
    Format Discovery: Identifies the format of data resources, marking the resources that can be further processed.
    Schema Discovery: Applies format-specific algorithms to determine the structure of the data in each resource, producing a list of columns or fields for each resource's catalog entry.
    Collection Discovery: Discovers collections of data elements with the same schema.
    Data Profiling: Applies data-specific logic to compute field-level statistics and patterns for each resource as unique fingerprints of the data.
    Data Profiling Combo: Starts a combined sequence of processes to profile your data. Runs the format discovery, schema discovery, and data profiling processes.
    Business Term Discovery: Compares and analyzes the computed fingerprints with any defined or seeded label signatures to discover possible matches. Note: Users must have Run Term Discovery permissions to run this job.
    Lineage Discovery: Shows relationships among resources in the form of a lineage graph. Data lineage identifies copies of the same data, merges between resources, and the horizontal and vertical subsets of these resources.
    Data Rationalization: Finds redundant data copies and overlaps.

    The sequence page opens.
  5. Based on the resource, follow the workflow for the sequence.

  6. Click Incremental Profiling if you want to use incremental processing.

    Note: When you select Fast profiling mode in the Sequence flow, the default values for sample-splits and sample-rows are used as defined in the Agent component's configuration.
  7. In the Enter Parameters field, enter any command line parameters for the sequence.

  8. Click Start Now.

Results

The job is submitted to the Data Catalog processing engine.