Managing job templates
As the owner of a data node or resource, whether you are an administrator, data steward, or analyst, you know your data well and are best suited to make delegation decisions regarding your nodes and resources. Lumada Data Catalog delegates job management based on user roles and guidelines set by your organization.
Administrators can grant specified roles the permission to run jobs that process the data resources assigned to those roles. An administrator can create custom job templates for users to run on their data assets. Alternatively, users with job execution permissions can select from the job sequences provided by Data Catalog.
Job templates are predefined templates that administrators create to run specific job sequences on specific clusters. Creating a job template is a way to pass system or Spark-specific parameters as command line arguments to job sequences. These parameters include settings such as driver memory, executor memory, and the number of threads required for a given cluster size, as well as overrides of the default Data Catalog script parameters, such as profiling Collections as a single resource. These templates can then be promoted to the users who handle such clusters for efficient job execution.
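As a rough illustration of the kind of parameter string an administrator might place in a template, the sketch below assembles Spark resource settings into a single command line fragment. The flag names are those of spark-submit (--driver-memory, --executor-memory, --num-executors). Whether Data Catalog accepts them in exactly this form is an assumption, and build_parameters is a hypothetical helper, not part of the product.

```python
import shlex


def build_parameters(settings):
    """Join flag/value pairs into one command line fragment.

    `settings` maps a flag name (without leading dashes) to its value.
    Flag names follow spark-submit conventions; their acceptance by
    Data Catalog is assumed, not confirmed.
    """
    parts = []
    for flag, value in settings.items():
        parts.append(f"--{flag}")
        # Quote values so spaces or shell metacharacters stay in one argument.
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)


# Illustrative sizing for a small cluster (values are examples only).
small_cluster = {
    "driver-memory": "2g",
    "executor-memory": "4g",
    "num-executors": 4,
}

print(build_parameters(small_cluster))
```

Keeping the settings in a mapping per cluster size makes it easy to maintain one template per cluster profile and to review the values before they are pasted into a template.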
Create a new job template
Procedure
On the Home page, click Management in the left-side menu bar, or click Go to Management in the Management card.
The Manage Your Environment page opens.
Click Job Templates on the Job Management card.
The Job Templates page opens.
Click Add Template.
Virtual Folder is preselected.
Enter values in the following fields.
Field: Description
Name: Enter a name that identifies the job template when shown to those users who can run jobs. Ensure that the name does not match any Data Catalog reserved names. This field is required.
Description: (Optional) Enter a brief description of the intended use of this template.
Asset Type: Use a Virtual Folder for your template to run against.
Virtual Folder Name: Enter the name of the Virtual Folder. Start typing the name in the field and select the best match from the list that appears.
Asset Path: This field is auto-filled with the absolute path of the value entered in the Virtual Folder Name field.
Process Type: Select the sequence processing that you want to use. Click Incremental Profiling if you want to use incremental processing for the job template.
In the Enter Parameters field, enter any command line parameters for the job template.
Click Create Template.
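The fields above can be pictured as a small record with a few checks. The following is a minimal sketch for illustration only: JobTemplate, its attribute names, and RESERVED_NAMES are hypothetical and do not reflect Data Catalog's internal model or API. The only rules encoded are the ones stated in the field descriptions (Name is required and must not match a reserved name, and the Asset Type is Virtual Folder).

```python
from dataclasses import dataclass

# Hypothetical placeholder: the real reserved names are defined by Data Catalog.
RESERVED_NAMES = frozenset({"example-reserved-name"})


@dataclass
class JobTemplate:
    name: str
    virtual_folder_name: str
    process_type: str
    description: str = ""               # optional free text
    asset_type: str = "Virtual Folder"  # preselected in the UI
    incremental_profiling: bool = False
    parameters: str = ""                # command line parameters

    def validate(self):
        """Return a list of problems; an empty list means no issue was found."""
        problems = []
        if not self.name.strip():
            problems.append("Name is required.")
        elif self.name in RESERVED_NAMES:
            problems.append("Name matches a reserved name.")
        if self.asset_type != "Virtual Folder":
            problems.append("Asset Type must be Virtual Folder.")
        return problems


# Example values are invented for illustration.
template = JobTemplate(
    name="nightly-profile",
    virtual_folder_name="sales",
    process_type="Data Profiling",
    parameters="--driver-memory 2g",
)
print(template.validate())
```

Drafting a template this way before entering it in the UI can help catch a missing or reserved name early.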
Edit a job template
Perform the following steps to edit a job template.
Procedure
On the Home page, click Management in the left-side menu bar, or click Go to Management in the Management card.
The Manage Your Environment page opens.
Click Job Templates on the Job Management card.
The Job Templates page opens.
Locate the Template Name that you want to edit, then click the View Details icon in the row.
The Job Template Details page opens.
Edit the template.
You can edit only the Description, Virtual Folder Name, Asset Path, Process Type, and the Data Catalog profiling and command line parameters. You cannot change the Template Name or Asset Type.
Verify your edits to the template, then click Save Template.
Delete a job template
Follow the steps below to delete a job template.
Procedure
On the Home page, click Management in the left-side menu bar, or click Go to Management in the Management card.
The Manage Your Environment page opens.
Click Job Templates on the Job Management card.
The Job Templates page opens.
Select the check box for each job template that you want to delete.
The Delete Template confirmation dialog box opens.
Enter yes in the Please Confirm field to proceed.
Click Confirm.
Submit a job template for execution
Procedure
On the Home page, click Management in the left-side menu bar, or click Go to Management in the Management card.
The Manage Your Environment page opens.
Click Job Templates on the Job Management card.
The Job Templates page opens.
Locate the Template Name that you want to submit for execution, then click the More actions icon in the row.
Select Start Now from the menu that displays.
View job template activity
Procedure
On the Home page, click Management in the left-side menu bar, or click Go to Management in the Management card.
The Manage Your Environment page opens.
Click Job Templates on the Job Management card.
The Job Templates page opens.
Locate the Template Name whose activity you want to view, then click the More actions icon in the row.
Select View Instances from the menu that displays.
Job sequences
Job sequences are sequences of jobs in Lumada Data Catalog that can be executed by users who have job execution privileges.
Trigger a sequence job
Procedure
On the Home page from the left-side menu bar, click Data Canvas.
Use the Navigation pane to drill down to the resource.
Click More actions and then select Process from the menu that displays.
The Process Selected Items page opens.
Click the sequence that you want to use.
The sequence page opens.
Sequence: Description
Select Template: A template is a custom definition for a given process with a custom set of parameters.
Format Discovery: Identifies the format of data resources, marking the resources that can be further processed.
Schema Discovery: Applies format-specific algorithms to determine the structure of the data in each resource, producing a list of columns or fields for each resource’s catalog entry.
Collection Discovery: Discovers collections of data elements with the same schema.
Data Profiling: Applies data-specific logic to compute field-level statistics and patterns for each resource as unique fingerprints of the data.
Data Profiling Combo: Starts a combined sequence of processes to profile your data. Executes the format discovery, schema discovery, and data profiling processes.
Business Term Discovery: Compares and analyzes the computed fingerprints with any defined or seeded label signatures to discover possible matches. Note that users must have Run Term Discovery permissions to run this job.
Lineage Discovery: Shows relationships among resources in the form of a lineage graph. Data lineage identifies copies of the same data, merges between resources, and the horizontal and vertical subsets of these resources.
Data Rationalization: Finds redundant data copies and overlaps.
Based on the resource, follow the workflow for the sequence.
Click Incremental Profiling if you want to use incremental processing.
Note: When you select Fast profiling mode in the Sequence flow, the default values for sample-splits and sample-rows are used as defined in the Agent component's configuration.
In the Enter Parameters field, enter any command line parameters for the sequence.
Click Start Now.
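To make the relationship between the combined sequence and the individual sequences concrete, the sketch below expands Data Profiling Combo into the processes the sequence descriptions say it executes. COMBO_STEPS and expand_sequence are hypothetical names used only for illustration, and the execution order is assumed to follow the order in which the description lists the processes.

```python
# Composition of the combined sequence, taken from the sequence descriptions.
# The order is an assumption based on how the description lists the processes.
COMBO_STEPS = {
    "Data Profiling Combo": [
        "Format Discovery",
        "Schema Discovery",
        "Data Profiling",
    ],
}


def expand_sequence(name):
    """Return the individual processes that a sequence runs, in order."""
    # A sequence that is not a combination runs as a single process.
    return COMBO_STEPS.get(name, [name])


print(expand_sequence("Data Profiling Combo"))
print(expand_sequence("Lineage Discovery"))
```

Seen this way, choosing Data Profiling Combo covers the three profiling-related processes in one submission, while the other sequences are started individually.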