Managing job templates
As the owner of a data node or resource, whether you are an administrator, data steward, or analyst, you know your data well and are best suited to make delegation decisions regarding your nodes and resources. Lumada Data Catalog delegates job management based on user roles and guidelines set by your organization.
Administrators can grant specified roles the permission to run jobs that process the data resources assigned to those roles. An administrator can create custom job templates for users to run on their data assets. Alternatively, users with job execution permissions can select from the job sequences provided by Data Catalog.
Job templates are predefined templates that administrators create to run specific job sequences on specific clusters. Creating a job template is a way to pass system or Spark-specific parameters as command-line arguments to a job sequence. Such parameters include driver memory, executor memory, and the number of threads required for a given cluster size, as well as overrides of the default Data Catalog script parameters, such as profiling Collections as a single resource. These templates can then be promoted to the users who handle such clusters, for efficient job execution.
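As a rough illustration of the kind of parameters a template might carry, the following Python sketch assembles a command-line argument string. The option names --driver-memory, --executor-memory, and --executor-cores are standard spark-submit options; whether and in what form a particular Data Catalog template accepts them is an assumption here, and the function name build_spark_args and the sizing values are hypothetical.

```python
import shlex


def build_spark_args(driver_memory: str, executor_memory: str, executor_cores: int) -> str:
    """Return a shell-safe string of Spark options for a template's parameter field."""
    if executor_cores < 1:
        raise ValueError("executor_cores must be at least 1")
    args = [
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
    ]
    # shlex.join quotes any value that contains shell-special characters.
    return shlex.join(args)


# Hypothetical sizing for a small cluster; adjust to your environment.
small_cluster_args = build_spark_args("2g", "4g", 2)
print(small_cluster_args)
```

Keeping the sizing in one function makes it easier to maintain one template per cluster size instead of retyping the options each time.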
Create a new job template
Procedure
Navigate to Manage and click Job Templates.
Click New.
Select your asset type, which determines the workflow for creating the job template:
- Virtual Folder
- Dataset
Enter values in the following fields.
Field: Description
- Template Name: Enter a name that identifies the job template when shown to those users who can run jobs. Ensure that the name strings do not match any Data Catalog reserved names. This field is required.
- Template Description: (Optional) Enter a brief description of the intended use of this template.
- Asset Type: You can select which data asset your template will run against. Data Catalog provides two types of data assets: Virtual Folder and Dataset.
- Asset Name: Enter the name of the Virtual Folder or Dataset here. Start typing the name in the field and select the best match from the list that displays.
- Asset Path: If you selected Virtual Folder, then this field is auto-filled with the absolute path of the Asset Name entered above.
Click Next.
In the Sequence pane, select the job sequence that you want the template to run.
You can choose from the Data Catalog job sequences that are listed. The available jobs differ based on the asset for which you are creating the template: virtual folder or dataset.
- If you selected Virtual Folder as the asset, then select a sequence from the list that appears in the dialog box and enter any optional command line parameters, such as [-incremental], [-path ] or [-regex], that may apply to the template you are creating. When finished, click Next.
- If you selected Dataset as the asset, then select a sequence from the list that appears in the dialog box and enter any optional command line parameters, such as [-incremental], that may apply to the template you are creating. When finished, click Next.
Verify your entries and click Create.
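The fields and checks in this procedure can be modeled as a small data structure. The Python sketch below is illustrative only: the class JobTemplateDraft, its attribute names, and the sequence name are hypothetical, and the parameter sets contain just the examples named in the steps above, so the lists offered by Data Catalog may well be longer.

```python
from dataclasses import dataclass, field

# Optional parameters that the procedure names as examples for each asset type.
# Treat these sets as illustrative, not exhaustive.
EXAMPLE_PARAMS = {
    "Virtual Folder": {"-incremental", "-path", "-regex"},
    "Dataset": {"-incremental"},
}


@dataclass
class JobTemplateDraft:
    template_name: str
    asset_type: str
    asset_name: str
    sequence: str
    template_description: str = ""
    parameters: list = field(default_factory=list)

    def problems(self) -> list:
        """Return human-readable problems; an empty list means the draft looks complete."""
        found = []
        if not self.template_name.strip():
            found.append("Template Name is required")
        if self.asset_type not in EXAMPLE_PARAMS:
            found.append("Asset Type must be Virtual Folder or Dataset")
        else:
            allowed = EXAMPLE_PARAMS[self.asset_type]
            for token in self.parameters:
                # Only tokens that look like flags are checked; values are skipped.
                if token.startswith("-") and token not in allowed:
                    found.append(
                        f"{token} is not among the example parameters for {self.asset_type}"
                    )
        return found


draft = JobTemplateDraft(
    template_name="nightly-profile",   # hypothetical name
    asset_type="Dataset",
    asset_name="sales",                # hypothetical dataset
    sequence="example-sequence",       # hypothetical sequence name
    parameters=["-regex", ".*csv"],
)
print(draft.problems())
```

Running a check like this before opening the dialog box can catch a missing name or a parameter that does not fit the chosen asset type.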
Edit a job template
Perform the following steps to edit a job template.
Procedure
Navigate to Manage, and then click Job Templates.
Click the check box of the job template that you want to edit.
Select the job template, then click the More actions icon in the row. Select Edit from the menu that displays.
Optionally, you can edit multiple templates. In the menu bar, click the <count> selected link, and then click Edit.
Edit the template.
You can only edit the Template Description, Asset Name, and the Data Catalog profiling parameters. You cannot change the Template Name, Asset Type, or Sequences.
Verify your edits to the template, then click Save.
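The split between editable and fixed fields can be expressed as a simple guard. This Python sketch is an assumption-laden illustration, not how Data Catalog stores templates: the dictionary keys and the function apply_edits are hypothetical names that mirror the fields listed in this procedure.

```python
# Fields that this procedure says can be changed after creation.
EDITABLE_FIELDS = {"template_description", "asset_name", "parameters"}
# Fields that this procedure says are fixed once the template exists.
LOCKED_FIELDS = {"template_name", "asset_type", "sequence"}


def apply_edits(template: dict, edits: dict) -> dict:
    """Return a copy of template with edits applied; reject locked or unknown fields."""
    locked = sorted(key for key in edits if key in LOCKED_FIELDS)
    if locked:
        raise ValueError("cannot change: " + ", ".join(locked))
    unknown = sorted(key for key in edits if key not in EDITABLE_FIELDS)
    if unknown:
        raise KeyError("unknown field(s): " + ", ".join(unknown))
    updated = dict(template)  # leave the original untouched
    updated.update(edits)
    return updated


existing = {
    "template_name": "nightly-profile",      # hypothetical values throughout
    "asset_type": "Dataset",
    "sequence": "example-sequence",
    "asset_name": "sales",
    "template_description": "old description",
    "parameters": ["-incremental"],
}
edited = apply_edits(existing, {"template_description": "Profiles sales nightly"})
print(edited["template_description"])
```

If a locked field needs to change, the procedures on this page suggest deleting the template and creating a new one.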
Delete a job template
Follow the steps below to delete a job template.
Procedure
Navigate to Manage, then click Job Templates.
Click the check box of one or more job templates that you want to delete.
Select the job template, then click the More actions icon in the row. Select Delete from the menu that displays.
Optionally, you can delete multiple templates. In the menu bar, click the <count> selected link, and then click Delete.
Submit a job template for execution
Procedure
Navigate to Manage, then click Job Templates.
Click the check box of the job template that you want to submit for execution.
Select the job template, then click the More actions icon in the row. Select Submit from the menu that displays.
Optionally, you can submit multiple templates. In the menu bar, click the <count> selected link, and then click Submit.
View job template activity
Procedure
Navigate to Manage, and then click Job Templates.
Click the check box of the job template for which you want to view job activity.
Select the job template, then click the More actions icon in the row. Select See all instances from the menu that displays.
Optionally, you can view the activity for multiple templates. In the menu bar, click the <count> selected link, and then click See all instances.
Job sequences
Job sequences are sequences of jobs in Lumada Data Catalog that can be executed by users who have job execution privileges.
While a job template lets users set system parameters and exercise finer control over the Data Catalog script flags, sequence execution provides access to the individual job commands with control over only the incremental and mode flags.
Trigger a sequence job
Procedure
Navigate from Browse to one of the following locations.
- Resource List
- Single Resource View
- Dataset
In the upper-right corner, click the More actions icon and select Run job now from the menu that displays.
Optionally, you can select a resource, then click the More actions icon to display the actions menu. Click Run job now.
The Run Job Now dialog box opens.
From the Run Job Now dialog box, select Sequence.
Depending on the resource, follow the dialog box workflow for Virtual Folder or Dataset.
Note: When you select Fast profiling mode in the Sequence flow, the default values for sample-splits and sample-rows are used as defined in the Agent component's configuration.json file.