
Managing jobs

In Lumada Data Catalog, administrators can delegate data processing jobs to user roles. Depending on your role, you can perform administrative data cataloging and processing functions like profiling and tag propagation on data nodes.

For sources you can access based on your role and the permissions set by your system administrator, you can run a job either with a job template or with a job sequence. You can select sequences that Data Catalog provides, or if you have a privileged role, you can use custom job templates with your data assets.

  • Templates

    You can use administrator-created job templates to run job sequences that apply to specific clusters. Job templates pass system or Spark-specific parameters to the job sequences as command-line arguments, such as the driver memory, executor memory, or number of threads appropriate for the cluster size. You can also override the default Data Catalog parameters. For example, you can set the incremental profile to false, profile a Collection as a single resource, or force a full profile instead of the default sampling option. A sketch of what a template can carry follows this list.

    Contact your system administrator to determine the template that is best suited for your data cluster.

  • Sequences

    You can use Data Catalog's job sequences to execute jobs. These jobs run with default parameters, and you cannot use the Sequence option to override them.

Guest-level users cannot run jobs. Stewards and analysts can run jobs only if the administrator has enabled job execution for their roles.
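
As a rough illustration, the following Python sketch shows the kind of information a job template bundles: a job sequence plus Spark-specific command-line arguments and Data Catalog parameter overrides. The structure and field names are assumptions for illustration only, not Data Catalog's actual template format; --driver-memory and --executor-memory are standard spark-submit flags used as examples.

    # Hypothetical sketch of a job template (illustrative only; not the real format).
    job_template = {
        "name": "profile-large-cluster",        # hypothetical template name
        "sequence": "Profile Combo",            # a sequence shown later in this article
        "spark_args": [                         # Spark-specific command-line arguments
            "--driver-memory", "8g",            # real spark-submit flags, example values
            "--executor-memory", "16g",
        ],
        "catalog_overrides": {                  # Data Catalog overrides (assumed names)
            "incremental_profile": False,       # force a full profile, not incremental
            "profile_collection_as_single_resource": True,
            "sampling": False,                  # full profile instead of default sampling
        },
    }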

Run a job template on a resource

You can use a template when you run a Lumada Data Catalog job. A template is a custom definition of a given sequence and may carry a custom set of parameters.

For example, suppose a template is defined for the asset path /DS1/virtualFolder/VFA with the custom parameter set [-X -Y -Z]. If you use that template to run the same job against a resource in /DS2/virtualFolder/VFB, only the asset path in the applied template is updated internally to reflect that of VFB.
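
This substitution rule can be pictured with a short Python sketch. The function and field names below are hypothetical, used only to mirror the behavior described above, not Data Catalog internals.

    # Illustrative sketch: reusing a template against a different resource updates
    # only the asset reference; the custom parameter set carries over unchanged.
    def apply_template(template: dict, target_asset: str) -> dict:
        job = dict(template)              # copy the template definition
        job["asset_path"] = target_asset  # only the asset path is updated internally
        return job

    template = {"asset_path": "/DS1/virtualFolder/VFA", "params": ["-X", "-Y", "-Z"]}
    job = apply_template(template, "/DS2/virtualFolder/VFB")
    assert job["params"] == ["-X", "-Y", "-Z"]  # parameters are carried over unchanged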

The same applies to a dataset template. If you use a template defined for an asset DSet1 with the custom parameter set [-X -Y -Z] to run the same job against a resource DSet2, only the asset name in the applied template is updated internally to reflect that of DSet2. To run a job template against a resource such as a virtual folder, a dataset, or a single resource, follow the steps below.

Procedure

  1. From the main Data Catalog dashboard, click Browse Folders and drill down to the resource you want to run the job against.

  2. Click More actions and then select Run job now from the menu that displays.

    The Run Job Now dialog box opens.
  3. On the Job page, select Template and then click Next.

    The job templates display. Note that the selected resource determines the available job templates.

  4. On the Sequence/Template page, select the check box next to the template that you want to run and click Next.

    The Review page appears.
  5. Review the job template you selected, and click Submit Job.

Results

The job is submitted to the Data Catalog processing engine.

Run a job sequence on a resource

You can use a sequence when you run a Data Catalog job.

Note: Job sequences run with default parameters, so be mindful when running sequences on large data, which may require additional system or functional parameters for the job to run successfully. Contact your system administrator for the appropriate Spark parameters.
To run a job sequence against a resource such as a virtual folder, dataset, or a single resource, follow the steps below:

Procedure

  1. From the main Data Catalog dashboard, click Browse Folders and drill down to the resource you want to run the job against.

  2. Click More actions and then select Run job now from the menu that displays.

    The Run Job Now dialog box opens.
  3. On the Job page, select Sequence and click Next.

    The job sequences display. Note that the selected resource determines the available job sequences.

  4. On the Sequence/Template page, select the type of job sequence to run, and click Next.

    The Review page appears.
  5. Review the job sequence you just selected, and click Submit Job.

Results

The job is submitted to the Data Catalog processing engine.

Monitoring job status

You can see the status of the jobs you have executed on the Jobs tab of the User Profile page.

The Jobs tab lists job submission details. You can sort the job list by clicking any table column header except Asset Name. Note that only template jobs include entries for Template Name.

For each job, one of the following job status icons displays in the Status column:

  • Submitted/Initialized: Not yet started
  • In-progress: Currently processing
  • Success: 100% complete with no errors
  • Success with warnings: Completed, but with possible errors
  • Cancelling/Cancelled: Cancelled by the user
  • Skipped: The job may not match the resources
  • Failed: Not processed due to an error
  • Incomplete: Includes errors

Numbers shown with the status indicate (total skipped files + incomplete files) / (total files).
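
As a minimal sketch of that counter (an assumed interpretation of the description above, not Data Catalog source code):

    # Compose the Status-column counter from the per-job file counts.
    def status_counter(skipped: int, incomplete: int, total: int) -> str:
        return f"{skipped + incomplete}/{total}"

    print(status_counter(skipped=2, incomplete=1, total=50))  # displays "3/50"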

You can display the latest status of any job by clicking the Refresh all button to reload the Jobs pane without having to refresh the browser.

Monitor job status

To monitor job status, use the following steps:

Procedure

  1. On the main Lumada Data Catalog menu bar, click the User icon and select Profile settings.

  2. On the User Profile page, click the Jobs tab.

    The Job activity grid appears.
  3. (Optional) Refresh the status display by clicking Refresh all.

  4. (Optional) Resubmit a job, if needed:

    1. Select the check box next to the job.

      A selected link appears.
    2. Click the selected link, and then click Resubmit job.

    The job is resubmitted.

Terminate a job

You can terminate a submitted job at any time. The job is cancelled according to Spark's job scheduling.

Follow the steps below to terminate a job.

Procedure

  1. On the main Lumada Data Catalog menu bar, click the User icon and select Profile settings.

  2. On the User Profile page, click the Jobs tab.

  3. Select the check box for the job you want to terminate.

    A selected link displays next to the Refresh all button.
  4. Click the link and select Terminate Instance.

  5. Click Refresh all to update the job status.

Results

The Status icon for the job instance starts in Submitted status, then changes to Cancelling, and finally to Cancelled.

View job information

When you click the row of a particular job, a Job Info pane appears on the right side detailing the execution information.

For example, for the sequence Profile Combo, the Job Info pane lists three instance steps in the order they execute: format discovery, schema discovery, and profile.

To view the details of an individual step, click the down arrows in the Job Info pane.

The Job Info pane provides the execution details of the sequence, as described below:

  • Status: The status of the sequence at run time:
      • INITIAL/SUBMITTED: The job is waiting to start. Any new job begins in this status.
      • IN PROGRESS: The job is executing.
      • SUCCESS: The job finished without issues.
      • FAILED: The job ran into errors or issues.
  • Command: The command executed, including any optional parameters used.
  • Total Size: The size of the data asset that was processed.
  • Success: The number of resources within the data asset that were processed successfully. A negative value indicates INITIAL/IN PROGRESS status; this value is only updated after job execution.
  • Skipped: The number of resources within the data asset that Data Catalog skipped, either because of a corrupt resource or an unsupported format.
  • Incomplete: The number of resources within the data asset that could not finish discovery due to issues.
  • Start: The recorded start time.
  • End: The recorded end time.

If the Skipped or Incomplete count is 1 or more, you can click it for details about the affected resources. These lists are paginated to improve response time when a large number of resources are skipped or incomplete.

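To summarize these fields, the following Python dataclass sketches the execution record that one sequence step carries. The class and attribute names are illustrative assumptions, not Data Catalog's internal model.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class JobInfoRecord:
        """Hypothetical model of one step's execution details."""
        status: str          # INITIAL/SUBMITTED, IN PROGRESS, SUCCESS, or FAILED
        command: str         # command executed, including optional parameters
        total_size: int      # size of the processed data asset (unit assumed: bytes)
        success: int         # resources processed successfully; negative until the job runs
        skipped: int         # resources skipped (corrupt resource or unsupported format)
        incomplete: int      # resources that could not finish discovery
        start: datetime      # recorded start time
        end: datetime        # recorded end time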

Read job info

Follow the steps below to read information about your job execution.

Procedure

  1. On the main Lumada Data Catalog menu bar, click the User icon and select Profile settings.

  2. On the User Profile page, click the Jobs tab.

  3. To select a job, click the job row.

    Note: If you click the check box on a job row, the Job Info pane does not open.
    The Job Info pane opens on the right side, listing the details for that job instance.
  4. To view more details about the individual instance, click the down arrow to expand the section.