Managing rules
With Lumada Data Catalog's rules framework you can define, execute, and manage business rules. These rules can evaluate data and metadata properties to add terms, remove terms, and modify custom properties on data assets.
To manage rules, click Management on the menu bar to open the Manage Your Environment page, and then click Business Rules. If you want to create a business rule quickly, click Add Business Rule.
On the Business Rules page, you can create, run, track, and manage all rules in the Data Catalog. You must enter the rule scope and action using the rule building block and the rule criteria using the Data Catalog rules language. After creating a new rule or updating an existing rule, the rule is available by default.
When you create rules, they are translated automatically into concrete rules that are bound and executed on individual data resources. This translation occurs regardless of which format and platform the resource is in, such as JDBC tables, Hive tables, CSV, Avro, or JSON, as long as it is a format Data Catalog supports.
You can view the following sample applications of the rules framework:
Using a rule for sensitive resource tagging
Many users create business rules to govern sensitive data. The following example identifies and tags all resources containing sensitive data or personal identifiers such as names, addresses, social security numbers, and account information.
Using the Lumada Data Catalog term discovery features, you can identify field metadata and tag data fields such as first name, last name, and address. The Data Catalog built-in terms can identify these fields. Then, you can use a rule to check for any resources that contain tagged sensitive data fields and tag the resources as "PII/Restricted Access".
Although not shown in exact syntax, the following rule example is the only rule you need to create. The rule is term-based and does not depend on an actual field name, resource name, or resource type.
When you run the rule, it is applied to all qualifying resources and attaches the term you specify when a resource contains the sensitive fields. If you have 100 CSV files, 200 JDBC tables, and 30 Avro files that are all sensitive, they are all labeled correctly after executing this rule.
Rule component | Definition |
Rule Scope | You can define the following rule scope by clicking and selecting parameters using the scope building block. Select the Virtual folder and the Column Term. From the Column Term list, select the sensitive data or personal identifiers such as names, addresses, social security numbers, and account information. { "virtualFolders": [ "DQM" ], "fieldTerms": [ "Built-in_Terms/US_Address", "Built-in_Terms/First_Name", "Built-in_Terms/Last_Name" ], "resourceTerms": [], "sourcePropertyFilters": {}, "termState": [] } |
Rule Criteria (rule body) | hasFieldTerm(Built-in_Terms/First_Name) AND hasFieldTerm(Built-in_Terms/Last_Name) AND hasFieldTerm(Built-in_Terms/US_Address) |
Rule Action | You can tag or remove associations, set or reset properties, and set data quality by selecting it using the action building block. In this example, we will do the following: Tag SSN field as " |
Resource tagging based on data properties
You can use Data Catalog rules to create a simple data rule that defines a condition, identifies the resources to which the condition applies, and then performs a specified action. In the following example rule, the rule attaches a resource term to all resources where the data for a given field falls within a specified condition and threshold range.
Rule component | Definition |
Rule Scope | You can define the following rule scope by clicking and selecting parameters using the scope building block. Select virtual folders on which the rule should be executed. When a single virtual folder is selected, rule execution is run against all the resources in the selected virtual folder. { "virtualFolders": [ "DQM" ], "fieldTerms": [], "resourceTerms": [], "sourcePropertyFilters": {}, "termState": [] } |
Rule Criteria (rule body) | (Category > 100 AND Category < 199) AND TAX_state = 6A |
Rule Action | You can tag or remove associations, set or reset properties, and set data quality by selecting it using the action building block. In this example, we will do the following: Tag resource as "DQM/CA_Employee" |
Rule components
A metadata and data rule executes based on the defined rule criteria and scope on qualified data entities.
You must use the Data Catalog rules components, which provide constructions for expressing scope, conditions, and actions. You can define all these constructions based on actual business terms, field names, custom properties, and business term association states.
Given a rule with Scope S, Criteria B, and Action A, the semantics of the rule can be summarized as "For any resource R that is within S, if B evaluates to true, then perform all actions listed in A on R."
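The semantics stated above can be illustrated with a minimal Python sketch. It is not Data Catalog code: the `Rule` class, the `apply_rule` function, and the dictionary layout of each resource are hypothetical stand-ins chosen only to show how scope, criteria, and actions interact.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Resource = dict  # hypothetical stand-in for a catalog resource


@dataclass
class Rule:
    in_scope: Callable[[Resource], bool]        # Scope S
    criteria: Callable[[Resource], bool]        # Criteria B
    actions: list[Callable[[Resource], None]]   # Actions A


def apply_rule(rule: Rule, resources: Iterable[Resource]) -> list[Resource]:
    """Perform every action in A on each resource R within S for which B is true."""
    changed = []
    for resource in resources:
        if rule.in_scope(resource) and rule.criteria(resource):
            for action in rule.actions:
                action(resource)
            changed.append(resource)
    return changed


resources = [
    {"name": "employees.csv", "folder": "DQM",
     "field_terms": {"Built-in_Terms/Last_Name"}, "terms": set()},
    {"name": "orders.csv", "folder": "DQM", "field_terms": set(), "terms": set()},
    {"name": "staff.avro", "folder": "HR",
     "field_terms": {"Built-in_Terms/Last_Name"}, "terms": set()},
]

rule = Rule(
    in_scope=lambda r: r["folder"] == "DQM",
    criteria=lambda r: "Built-in_Terms/Last_Name" in r["field_terms"],
    actions=[lambda r: r["terms"].add("PII/Sensitive")],
)

tagged = apply_rule(rule, resources)
print([r["name"] for r in tagged])  # ['employees.csv']
```

Only `employees.csv` is tagged: `orders.csv` is in scope but fails the criteria, and `staff.avro` satisfies the criteria but lies outside the scope.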
The rule components are:
Rule scope
Sets the scope of resources on which the rule is evaluated and applied.
Rule criteria
Defines the condition.
Rule action
Defines the action to take on resources that conform to the rule's evaluation, such as resource tagging, term removal, setting custom property values, and setting the quality dimension.
Note: The rule syntax requires that you replace the dot in <termDomain>.<Term> with a forward slash. For example, enter Built-in_Terms/Last_Name instead of Built-in_Terms.Last_Name. This replacement applies to all sections of the rule.
Rule scope
The rule scope defines the resources for rule execution. You can define the rule scope by specifying scope types in the Set Rule Scope window.
Virtual folders
You must include at least one virtual folder for the rule to compile. When a single virtual folder is listed, rule execution is run against all the resources in the listed virtual folder. You can enter additional virtual folders as a comma-separated list. If a folder no longer exists when the rule is executed, it is ignored.
Source property filters
A comma-separated list of source property key-value pairs used to filter the virtual folder resources.
Field terms
A comma-separated list of field-level business terms to further filter the virtual folder resources.
Resource terms
A comma-separated list of resource-level business terms to further filter the virtual folder resources.
Term association states
You can further filter the resources based on the term association state. The possible values are ACCEPTED, REJECTED, and SUGGESTED. If you do not specify a term association state, all states are included.
You can view the different scope types in the following rule scope example.
"ruleScope": { "virtualFolders": [ "DQM" ], "sourcePropertyFilters": { "domain": "Finance, Banking" }, "fieldTerms": [ "Built-in_Terms/Last_Name", "Built-in_Terms/US_Address" ], "resourceTerms": [ "CA/Employee" ], "termStates": [ "ACCEPTED", "SUGGESTED" ] }
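If you assemble scope definitions programmatically, a small helper can enforce the constraints described above before the JSON is produced. The following Python sketch is an assumption-laden illustration: `build_rule_scope` is a hypothetical function, not part of Data Catalog, and the key names simply follow the example above.

```python
import json

VALID_TERM_STATES = {"ACCEPTED", "REJECTED", "SUGGESTED"}


def build_rule_scope(virtual_folders, source_property_filters=None,
                     field_terms=None, resource_terms=None, term_states=None):
    """Return a ruleScope dictionary shaped like the example above (hypothetical helper)."""
    if not virtual_folders:
        # The rule does not compile without at least one virtual folder.
        raise ValueError("at least one virtual folder is required")
    term_states = list(term_states or [])
    unknown = set(term_states) - VALID_TERM_STATES
    if unknown:
        raise ValueError(f"unknown term association states: {sorted(unknown)}")
    return {
        "virtualFolders": list(virtual_folders),
        "sourcePropertyFilters": dict(source_property_filters or {}),
        "fieldTerms": list(field_terms or []),
        "resourceTerms": list(resource_terms or []),
        # An empty list means all term association states are included.
        "termStates": term_states,
    }


scope = build_rule_scope(
    ["DQM"],
    source_property_filters={"domain": "Finance, Banking"},
    field_terms=["Built-in_Terms/Last_Name", "Built-in_Terms/US_Address"],
    resource_terms=["CA/Employee"],
    term_states=["ACCEPTED", "SUGGESTED"],
)
print(json.dumps({"ruleScope": scope}, indent=2))
```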
Rule criteria
The rule criteria (or rule body) defines the condition that is translated into a query and evaluated against every qualifying resource as defined in the rule scope. You can define the rule body by specifying rule types in the Rule Criteria window.
For example, you can insert a clause that determines what the rule body acts on. The query clause determines if the rule acts on metadata or on actual data from the resource.
Metadata query is compiled using resource metadata
For example, the rule body hasFieldTerm(Built-in_Terms/Social_Security_Number_Delimited) = 1 checks for the presence of the field term Built-in_Terms/Social_Security_Number_Delimited.
Metadata query operating on custom property
For example, the rule body @@business= 'MagnUX' operates on custom properties looking for specific values.
Note: The inclusion of "@@" indicates the rule is used for custom properties and is a metadata rule.
Data query is compiled using the resource data
The data query operates on the field terms. For example, the rule body (@EMS/Category >= 100 and @EMS/Category <= 199) and @EMS/Tax_State = '6A' inspects the data in the field tagged with EMS/Category for values between 100 and 199, when the data in the field tagged with EMS/Tax_State has a value of "6A".
Note: Prefixing a FieldTerm with an "@" indicates the rule operates on the data tagged by the FieldTerm.
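The prefix conventions described above can be sketched as a small classifier. This Python example is only an illustration of the conventions, not how Data Catalog parses rule bodies: `classify_rule_body` is a hypothetical function, the regular expressions are simplifications, and a body that mixes several kinds is reported by its first match.

```python
import re

# Order matters: "@@" must be tested before the single "@" prefix.
_PATTERNS = [
    ("metadata (custom property)", re.compile(r"@@\w")),
    ("metadata (term or field name)",
     re.compile(r"\bhas(FieldTerm|ResourceTerm|FieldName)\s*\(")),
    ("data (field term)", re.compile(r"(?<!@)@[`\w]")),
]


def classify_rule_body(body: str) -> str:
    """Guess the kind of query a rule body expresses from its prefixes."""
    for label, pattern in _PATTERNS:
        if pattern.search(body):
            return label
    # No recognized prefix: assume the body names columns directly.
    return "data (plain column names)"


for body in (
    "hasFieldTerm(Built-in_Terms/Social_Security_Number_Delimited) = 1",
    "@@business= 'MagnUX'",
    "(@EMS/Category >= 100 and @EMS/Category <= 199) and @EMS/Tax_State = '6A'",
):
    print(classify_rule_body(body), "<-", body)
```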
Depending on the rule type, the following ruleCriteria queries are possible, where FieldTerm is a full term name including the domain that it is associated with:
Metadata query | Data query |
Objective: Evaluates against metadata. Rules query for metadata discovered by Data Catalog. | Objective: Inspect the data when evaluating rules. Rules query for the specific data value identified by the term. |
Evaluating on field term | Evaluating data in fields |
Evaluating on resource term | Evaluating data in fields with terms |
Evaluating field name | Evaluating the length of field values |
Evaluating custom property | Evaluating the uniqueness of a field |
Evaluating nested terms and terms with spaces | Evaluating the contents of a field |
 | Evaluating data in fields with nested terms and terms with spaces: @Domain1/ParentTerm1.childTerm >= 100 and @`Domain name1/field Term2` = "value" |
Rule action
The rule action defines the action to take if the ruleCriteria evaluates to true. A rule action is an array of actions and an action can only apply one term. To apply multiple terms, you must submit a ruleAction for each term.
Actions can be one of the following:
- AddBusinessTerms
- RemoveBusinessTerms
- SetProperties
- ResetProperties
When creating a rule action, include the following parts:
actionType
Set to the action taken by the rule.
actionName
Enter the name of the action.
actionAttributes
In the body, use the following guidelines:
- The inclusion of the rule_action_field entry indicates field tagging. The field name specified is tagged with the term provided in rule_action_term_name.
- The exclusion of the rule_action_field entry implies resource tagging. The resource is tagged with the term provided in rule_action_term_name.
- The rule_action_threshold entry is evaluated only for a data rule, where it defines the percentage of rows that must satisfy the rule before the rule action is applied. In a metadata rule, the entry is still mandatory and its absence results in an error; the recommended threshold value for metadata rules is 0 or 1.
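The role of rule_action_threshold in a data rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: `threshold_met` is a hypothetical function, the sample rows are invented, and the sketch assumes that the action is applied when the matching percentage is greater than or equal to the threshold and that an empty resource never triggers the action.

```python
def threshold_met(rows, predicate, threshold_percent):
    """Return (matching_percent, should_apply_action) for a data rule."""
    rows = list(rows)
    if not rows:
        # Assumption: nothing to evaluate, so the action is not applied.
        return 0.0, False
    matching = sum(1 for row in rows if predicate(row))
    percent = 100.0 * matching / len(rows)
    # Assumption: ">=" comparison; the exact boundary behavior is not documented here.
    return percent, percent >= threshold_percent


rows = [
    {"Category": 120, "Tax_State": "6A"},
    {"Category": 150, "Tax_State": "6A"},
    {"Category": 300, "Tax_State": "6A"},
    {"Category": 110, "Tax_State": "7B"},
    {"Category": 199, "Tax_State": "6A"},
]


def in_range(row):
    """Same condition as (Category >= 100 and Category <= 199) and Tax_State = '6A'."""
    return 100 <= row["Category"] <= 199 and row["Tax_State"] == "6A"


percent, apply_action = threshold_met(rows, in_range, threshold_percent=40)
print(percent, apply_action)  # 60.0 True
```

Three of the five sample rows satisfy the condition, so 60 percent of the rows match and a threshold of 40 is met.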
You can define the rule action by specifying action types in the Rule Actions window. The rule action includes the following types:
AddBusinessTerms
When actionType is set to AddBusinessTerms, the ruleAction makes term associations based on rule evaluation. A term suggestion can be applied on a specific field or on a qualifying resource. When applying a term suggestion to a field, the field is identified with a full field name.
Note: The Data Catalog rule framework does not create new terms. Any term suggestions to be applied as part of a rule action must be for existing terms. If an associated term does not exist, Data Catalog displays an error message.
RemoveBusinessTerms
When actionType is set to RemoveBusinessTerms, the ruleAction removes the term associations based on the rule evaluation.
SetProperties and ResetProperties
When actionType is set to SetProperties or ResetProperties, the ruleAction sets or resets custom property values.
Property values are strings. If you specify a property name with the @@ prefix, the property's string value is substituted for the property name. You can use property actions to set and reset property values. To reset a property value, use ResetProperties.
In the action attributes field, rule_action_property_name specifies the custom property name and rule_action_property_value sets the value for that property, as in the following example:
"actionAttributes": { "rule_action_property_name": "domain", "rule_action_property_value": "Finance", "rule_action_threshold": "1" }
The following code sample provides an example of each action type:
[
{ "actionType": "AddBusinessTerms", "actionName": "", "actionAttributes": { "rule_action_term_name": "PII/Sensitive", "rule_action_threshold": "40", "rule_action_field": "SSN" } },
{ "actionType": "AddBusinessTerms", "actionName": "", "actionAttributes": { "rule_action_term_name": "PII/Sensitive", "rule_action_threshold": "40" } },
{ "actionType": "AddBusinessTerms", "actionName": "", "actionAttributes": { "rule_action_term_name": "PII/Sensitive", "rule_action_threshold": "40", "rule_action_stats_field": "SSN" } },
{ "actionType": "RemoveBusinessTerms", "actionName": "Remove Field Term", "actionAttributes": { "rule_action_term_name": "PII/Sensitive", "rule_action_threshold": "40", "rule_action_field": "SSN" } },
{ "actionType": "RemoveBusinessTerms", "actionName": "Remove Resource Term", "actionAttributes": { "rule_action_term_name": "PII/Sensitive", "rule_action_threshold": "40" } },
{ "actionType": "setProperties", "actionName": "Update Custom Property value based on threshold", "actionAttributes": { "rule_action_property_name": "domain", "rule_action_property_value": "Finance", "rule_action_threshold": "1" } },
{ "actionType": "setProperties", "actionName": "Update Custom Property value", "actionAttributes": { "rule_action_property_name": "domain", "rule_action_property_value": "Finance" } },
{ "actionType": "ResetProperties", "actionName": "Reset property value", "actionAttributes": { "rule_action_property_name": "domain", "rule_action_property_value": "" } }
]
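Because an action can apply only one term, applying several terms means generating one action per term. The following Python sketch shows one way to do that. `add_term_actions` is a hypothetical helper, and the term PII/Restricted is an invented example that would need to exist in your glossary before the rule runs.

```python
import json


def add_term_actions(terms, threshold, field=None):
    """Build one AddBusinessTerms action per term (hypothetical helper)."""
    actions = []
    for term in terms:
        attributes = {
            "rule_action_term_name": term,
            "rule_action_threshold": str(threshold),
        }
        if field is not None:
            # Including rule_action_field means field tagging;
            # leaving it out means resource tagging.
            attributes["rule_action_field"] = field
        actions.append({
            "actionType": "AddBusinessTerms",
            "actionName": "",
            "actionAttributes": attributes,
        })
    return actions


# Two terms on the SSN field produce two separate actions.
field_actions = add_term_actions(["PII/Sensitive", "PII/Restricted"],
                                 threshold=40, field="SSN")
# Omitting the field produces a resource-level action.
resource_actions = add_term_actions(["PII/Sensitive"], threshold=40)

print(json.dumps(field_actions + resource_actions, indent=2))
```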
Requirements for writing rules
You can avoid errors by adopting the following requirements when writing rules:
- When using terms in the ruleCriteria for a data query, you must prefix the terms with the @ symbol and then the glossaryname/termname qualifier. In the absence of the @ qualifier, a term glossaryname/termname is interpreted as a column name, which may or may not exist, and the corresponding results may be misreported.
- When evaluating rules to set custom properties, you must prefix the custom property with the @@ qualifier.
- Data Catalog supports minimal SQL functions in the rule definition such as AND, OR, <, >, IN, and length().
- All terms specified in the actionAttributes field need to pre-exist.
- The rule syntax requires that you replace the dot in <termDomain>.<Term> with a forward slash. For example, enter Built-in_Terms/Last_Name instead of Built-in_Terms.Last_Name. This replacement applies to all sections of the rule.
- For terms or a glossary with spaces, enclose the term in backticks. For example: @`Glossary name/Term name` > 200
- To use field names with spaces, enclose the field name in backticks. For example: hasFieldName(`First Name`)=1
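If you generate rule criteria from code, the requirements above can be captured in a few formatting helpers. The following Python sketch is hypothetical: none of these functions belong to Data Catalog, and it assumes that backtick quoting is needed only when a name contains a space.

```python
def _quote(name: str) -> str:
    """Wrap a name in backticks when it contains a space."""
    return f"`{name}`" if " " in name else name


def term_name(domain: str, term: str) -> str:
    """Join the term domain and term with a forward slash, never a dot."""
    return _quote(f"{domain}/{term}")


def data_term(domain: str, term: str) -> str:
    """Reference the data in fields tagged with the term (data query)."""
    return "@" + term_name(domain, term)


def custom_property(name: str) -> str:
    """Reference a custom property (metadata query)."""
    return "@@" + name


def has_field_name(field: str) -> str:
    """Check for the presence of a field by name (metadata query)."""
    return f"hasFieldName({_quote(field)})=1"


print(data_term("EMS", "Category") + " >= 100")
print(data_term("Glossary name", "Term name") + " > 200")
print(custom_property("business") + " = 'MagnUX'")
print(has_field_name("First Name"))
```

The second and fourth lines printed reproduce the two quoting examples given in the requirements.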
Rule workflow
On the Business Rules page, you can create, update, delete, and view rules.
Create a rule
Procedure
Click Management on the menu bar to open the Manage Your Environment page, and then select Business Rules to open the Business Rules page.
Click Add Business Rule.
The Create Business Rule page opens.
Enter the rule name and description.
Set the Rule Scope. You can create your own scope and save it for future use by yourself or another user, or select an existing scope using the Load Scope option. The rule scope includes the following parameters:
Virtual Folders
Select virtual folders on which the rule should be executed. You must include at least one virtual folder for the rule to compile. When a single virtual folder is listed, rule execution is run against all the resources in the listed virtual folder. You can select additional virtual folders. If a folder no longer exists when the rule is executed, it is ignored.
Resource Terms
Select business terms to filter the virtual folder resources.
Column Terms
Select the business terms associated with the fields or columns to filter the virtual folder resources.
Custom Properties
Select the custom properties to filter the virtual folder resources.
Term State
You can further filter the virtual folder resources based on the term association state. The possible values are ACCEPTED, REJECTED, and SUGGESTED. If you do not specify a term association state, all states are included.
Set the Rule Criteria.
You can create your rule criteria and save it for future use by yourself or another user, or select one of the existing ones using the Load Scope option.
Define the rule's criteria for evaluation using rule syntax, with MetadataRule or DataRule. See the following examples:
MetadataRule
For dealing with metadata stored in the database. Syntax is:
hasResourceTerm(TermFullyQualifiedName) =1
Work on the resources that have the given resource term associated with them
hasFieldTerm(TermFullyQualifiedName) =1
Work on the fields/columns that have the given field term associated with them
DataRule
For dealing with actual data present in the resource. Specify the column name directly, or use @TermFullyQualifiedName to refer to the field with the given term name. Use @@customPropertyName to specify the custom property with specific values.
Set the Rule Actions. The rule actions include the following action types.
You can create rule actions and save them for future use by yourself or another user, or select one of the existing ones using the Load Scope option.
Each action type and its settings are described as follows:
Add Business Terms
Business Term
Select the term name that you want to add.
Action Field
Specify the field name if the associated action should be performed on a specific field. The specified field is tagged with the term selected in Business Term.
If the field name is not specified, then the action will be performed at the resource level.
Set Threshold
Specify the threshold value at which to perform the rule action.
Remove Business Terms
Business Term
Select the term name that you want to remove.
Action Field
Specify the field name if the associated action should be performed at the column level. The term selected in Business Term is removed from the specified field.
If the field name is not specified, then the action will be performed at the resource level.
Set Threshold
Specify the threshold value at which to perform the rule action.
Set Properties
Select Property
Select the custom property whose value you want to set.
Action Field
Specify the field name if the associated action should be performed on a field.
Set Threshold
Specify the threshold value at which to perform the rule action.
Reset Properties
Select Property
Select the custom property whose value you want to reset.
Action Field
Specify the field name if the associated action should be performed on a field.
Set Threshold
Specify the threshold value at which to perform the rule action.
Set Quality Dimension
Set Quality Dimension
Select the data quality dimension that defines your rule. This dimension is reflected in the data quality graph on the Data Canvas page. Options include:
None
This rule is not used as a data quality metric.
Accuracy
The degree to which data correctly describes the "real world" object or event being described.
Completeness
The proportion of stored data against the business definition of “100% complete”.
Consistency
The absence of difference when comparing two or more representations of an item against a definition. Each data item is measured against itself or its counterpart in another data set.
Note that consistency assessment may not be applicable to all data items.
Uniqueness
The inverse of an assessment of the level of duplication.
Validity
Data is valid if it conforms to the syntax (format, type, range) of its business definition. Typically, this value is the overall measure of data quality.
Timeliness
The degree to which data represent reality from the required point in time. This is measured by the time distance between each correct and incorrect data point.
Action Field
Specify the field name if the associated action should be performed at the column level. The selected quality dimension is applied to the specified field.
If the field name is not specified, then the action will be performed at the resource level.
Set Low Threshold
Specify the threshold value at which to perform the rule action. Default is 50.
Set High Threshold
Specify the threshold value at which to perform the rule action. Default is 90.
The data quality value for the column is shown in a donut chart, colored as follows:
Below the low threshold: red
Between the low and high thresholds: orange
Above the high threshold: green
Click Create Rule.
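The mapping from the low and high thresholds to the donut chart colors can be sketched in Python as follows. `quality_color` is a hypothetical function that uses the default thresholds of 50 and 90. The documentation does not say how a value exactly equal to a threshold is colored, so the sketch assumes such values are orange.

```python
def quality_color(score: float, low: float = 50, high: float = 90) -> str:
    """Map a data quality value to a chart color using the low and high thresholds."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score < low:
        return "red"
    if score > high:
        return "green"
    # Assumption: values equal to either threshold fall in the middle band.
    return "orange"


for score in (30, 75, 95):
    print(score, quality_color(score))
```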
Update a rule
Perform the following steps to edit a rule:
Procedure
Click Management on the menu bar to open the Manage Your Environment page, and then select Business Rules.
The Business Rules page opens.
Locate the business rule you want to configure in the table of rules and select the View Details button (greater-than sign) in its row.
If you have a large number of rules, select Show Filters to help you find the rule you want to edit.
The Business Rule page opens for the selected rule.
Edit the fields as needed and click Save Rules.
The rule is saved with your changes. If there is a problem while saving your rule, an error notification displays at the top of the page. Resolve the error and click Save Rules.
Delete a rule
Procedure
Click Management on the menu bar to open the Manage Your Environment page, and then select Business Rules.
The Business Rules page opens.
Use the check box to select the rule you want to delete.
Click the Actions menu and then click Remove. Alternatively, select the More actions icon, and then click Remove from the drop-down menu.
A message appears confirming your business rule is now deleted.
Click Close on the message box to return to the Business Rules page.
View rules
You can view the list of all the rules in Data Catalog.
Perform the following steps to view the rules:
Procedure
Go to Management and click Business Rules.
The list of all rules in Data Catalog is shown.
To view the rules associated with a specific resource, navigate to Explore Your Data > Virtual folders > Specific file.
Click the Rules tab.
Rules specific to that file are shown.
Adding rule blocks
You can create scope, criteria, and actions blocks and reuse them later when creating rules.
Perform the following steps to create a block:
Procedure
Go to Management and click Business Rules.
Click the Blocks tab.
Click Add New Block, and select the scope, criteria, or actions for which you want to create a block.
Note: If you change a block, all rules that use that block are affected.
Execute rules
Perform the following steps to execute a Data Catalog rule.
- Go to Management and click Business Rules.
- Select the rule you want to run and click the Execution tab.
The Execution Schedule window opens. You can run the rule immediately or add a schedule.
- Select Run Now to run the rule immediately.
- Click Add Schedule to schedule the rule and select one of the following schedules:
  - On a date
  - Daily
  - Weekly
  - Monthly
Note: Set the schedule time in UTC only.
- (Optional) If you want to enter parameters, select the Advanced Mode check box, and enter the parameters in the text box.
For example, to generate a rule execution report, enter the following additional parameters before rule execution:
-generateReport true -reportName <Name of the report being generated>
For more information on additional parameters, see Rule execution report.
- Click Apply Changes.
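Because schedule times must be entered in UTC, you may need to convert a local run time first. The following Python sketch shows the conversion with the standard library. The fixed UTC-8 offset and the sample date are illustrative assumptions, and a fixed offset ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time: datetime) -> datetime:
    """Convert a timezone-aware local time to UTC."""
    if local_time.tzinfo is None:
        raise ValueError("local_time must be timezone-aware")
    return local_time.astimezone(timezone.utc)


# Illustrative fixed offset of UTC-8; daylight saving time is not considered.
local_zone = timezone(timedelta(hours=-8))
local_run = datetime(2024, 1, 15, 9, 0, tzinfo=local_zone)

print(to_utc(local_run).strftime("%Y-%m-%d %H:%M UTC"))  # 2024-01-15 17:00 UTC
```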
Propagate bindings associates the rule with all the data entities that fall under the selected resources.
In the Execution Schedule window, click Propagate Bindings to associate the rule with the data entities.
You receive a notification about the propagate bindings status. After propagation completes, you can go to each data entity and run the rule as required. With propagate bindings, owners of data entities can choose whether to run the rule.
Rule execution report
A rule execution report summarizes how well each rule evaluates the resources in Data Catalog.
To generate a rule execution report, enter the following additional parameters before rule execution:
-generateReport true -reportName <Name of the report being generated>
Define the options as follows:
-generateReport
If this parameter is passed, rule execution generates a report with the name specified by the -reportName parameter.
-reportName
User-defined name for the report being generated. Use with the -generateReport parameter.
All reports are generated in the /var/log/ldc/generatedReports directory. If you do not provide a report name, Data Catalog randomly generates a unique name for each report.
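If you prepare the Advanced Mode parameters in a script, the report options can be assembled as shown in this Python sketch. `report_parameters` is a hypothetical helper. Because this section does not describe how the parameter text box handles names that contain spaces, the sketch conservatively rejects them; that restriction is an assumption, not a documented rule.

```python
def report_parameters(report_name=None):
    """Build the additional parameters that request a rule execution report."""
    params = ["-generateReport", "true"]
    if report_name:
        if any(ch.isspace() for ch in report_name):
            # Assumption: quoting rules are unknown, so avoid whitespace entirely.
            raise ValueError("report name must not contain whitespace")
        params += ["-reportName", report_name]
    # Without a report name, Data Catalog generates a unique name itself.
    return " ".join(params)


print(report_parameters("pii_weekly"))  # -generateReport true -reportName pii_weekly
print(report_parameters())              # -generateReport true
```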