Hitachi Vantara Lumada and Pentaho Documentation

Managing rules

With Lumada Data Catalog's rules framework you can define, execute, and manage business rules. These rules can evaluate data and metadata properties to add terms, remove terms, and modify custom properties on data assets.

To manage rules, click Management on the menu bar to open the Manage Your Environment page, and then click Business Rules. Optionally, if you want to create a business rule quickly, click Add Business Rule.

On the Business Rules page, you can author, run, track, and manage all rules in the catalog. All rules must be entered using the Data Catalog rules language. After creating a new rule or updating an existing rule, the rule is enabled by default.

When you write rules using terms, they are translated automatically into concrete rules that are bound and executed on individual data resources. This translation occurs regardless of which format and platform the resource is in, such as JDBC tables, Hive tables, CSV, Avro, or JSON, as long as it is a format Data Catalog supports. A rule written using terms captures business logic explicitly and can express many concrete rules.

You can view the following sample applications of the rules framework:

Using a rule for sensitive resource tagging

This example identifies and tags all resources containing sensitive data or personal identifiers such as names, addresses, social security numbers, and account information. Using the Lumada Data Catalog term discovery features, you can identify field metadata and tag data fields such as first name, last name, and address. The Data Catalog built-in terms can identify these fields. Then, you can use a rule to check for any resources that contain tagged sensitive data fields and tag the resources as "Restricted Access".

Although not shown in exact syntax, the rule illustrated below is the only rule you need to write. The rule is term-based and does not depend on an actual field name, resource name, or resource type.

When the rule is processed, it is automatically bound to all qualifying resources and attaches the term you specify when a resource contains the sensitive fields. If you have 100 CSV files, 200 JDBC tables, and 30 Avro files that are all sensitive, they all are labeled correctly after executing this rule.

Metadata rule with field term for sensitive data
Rule scope:
{
	"virtualFolders": [
		"DQM"
	],
	"fieldTerms": [
		"Built-in_Terms/US_Address",
		"Built-in_Terms/First_Name",
		"Built-in_Terms/Last_Name"
	],
	"resourceTerms": [],
	"sourcePropertyFilters": {},
	"termState": []
}

Rule criteria (rule body):
hasFieldTerm(Built-in_Terms/First_Name) AND hasFieldTerm(Built-in_Terms/Last_Name) AND hasFieldTerm(Built-in_Terms/US_Address)

Rule action:
Tag SSN field as "Restricted Access"

Resource tagging based on data properties

This example attaches a resource term to all resources where the data for a given field is within a certain range. With the Data Catalog rules, a simple data rule defining the condition can identify the resources to which the condition applies and take the corresponding action.

Data rule with resource term
Rule scope:
{
    "virtualFolders": [
        "DQM"
    ],
    "fieldTerms": [],
    "resourceTerms": [],
    "sourcePropertyFilters": {},
    "termState": []
}

Rule criteria (rule body):
(Category > 100 AND Category < 199) AND TAX_state = '6A'

Rule action:
Tag resource as "Glossary Name/Term Name"

Rule syntax

A metadata rule executes on all the resources in Data Catalog, applying and evaluating the rule against one resource at a time and executing the specified rule action. Execution of data rules is limited to the resources managed by a specific agent. All rules must be entered using the Data Catalog rules language.

The language of the Data Catalog rules provides constructs for expressing scope, conditions, and actions. A unique capability in Data Catalog is that you can express all these constructs based on actual business terms, field names, custom properties, and business term association states.

Given a rule with Scope S, Body B, and Action A, the semantics of the rule can be summarized as: "For any resource R that is within S, if B evaluates to true, then perform all actions listed in A on R."
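The semantics above can be sketched in a few lines of Python. This is only an illustrative model, not Data Catalog code: the Rule class, the apply_rule function, and the dictionary representation of a resource are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory model: a resource is a plain dict, and scope, body,
# and actions are ordinary Python callables. None of this is Data Catalog API.
Resource = Dict[str, object]


@dataclass
class Rule:
    scope: Callable[[Resource], bool]          # S: is the resource in scope?
    body: Callable[[Resource], bool]           # B: does the condition hold?
    actions: List[Callable[[Resource], None]]  # A: actions to perform


def apply_rule(rule: Rule, resources: List[Resource]) -> List[Resource]:
    """For any resource R within S, if B is true, perform all actions in A on R."""
    changed = []
    for resource in resources:
        if rule.scope(resource) and rule.body(resource):
            for action in rule.actions:
                action(resource)
            changed.append(resource)
    return changed


resources = [
    {"name": "customers.csv", "folder": "DQM",
     "field_terms": {"Built-in_Terms/Last_Name"}, "terms": set()},
    {"name": "orders.csv", "folder": "DQM",
     "field_terms": set(), "terms": set()},
    {"name": "staff.avro", "folder": "HR",
     "field_terms": {"Built-in_Terms/Last_Name"}, "terms": set()},
]

rule = Rule(
    scope=lambda r: r["folder"] == "DQM",
    body=lambda r: "Built-in_Terms/Last_Name" in r["field_terms"],
    actions=[lambda r: r["terms"].add("PII/Sensitive")],
)

tagged = apply_rule(rule, resources)
print([r["name"] for r in tagged])
```

Only the resource that is both in scope and satisfies the body receives the term; the out-of-scope resource is never evaluated against the body.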

The rule syntax contains three parts:

  • Rule scope

    Sets the scope of resources on which the rule is evaluated and applied.

  • Rule body

    Defines the condition in a SQL predicate.

  • Rule action

    Defines the action to be taken on resources that conform to the rule evaluation, such as resource tagging, term removal, and setting custom property values.

Note: In rule syntax, replace the dot in a termDomain.Term name with a forward slash. For example, enter Built-in_Terms/Last_Name instead of Built-in_Terms.Last_Name. Make this replacement in every part of the rule in which the term appears.
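If you generate rules from term names stored in the dotted form, a small helper can apply this replacement consistently. The following is a minimal sketch; the function name to_rule_term is hypothetical, and it assumes the domain name itself contains no dot, so only the first dot is replaced.

```python
def to_rule_term(name: str) -> str:
    """Convert 'Domain.Term' to 'Domain/Term' by replacing the first dot only."""
    if "/" in name:
        return name  # already written in rule syntax
    domain, sep, term = name.partition(".")
    if not sep or not domain or not term:
        raise ValueError(f"expected 'Domain.Term', got {name!r}")
    return f"{domain}/{term}"


print(to_rule_term("Built-in_Terms.Last_Name"))
```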

Rule scope

The rule scope defines the resources for rule execution. You can define the rule scope by specifying scope types in the Set Rule Scope window.

  • Virtual folders

    At least one virtual folder is required for the rule to compile. When a single virtual folder is listed, rule execution is run against all the resources under the listed virtual folder. You can enter additional virtual folders as a comma-separated list. If a folder no longer exists when the rule is executed, it is ignored.

  • Source property filters

    Comma-separated list of source property filters, given as key-value pairs.

  • Field terms

    Comma-separated list of field-level business terms to further filter the virtual folder resources.

  • Resource terms

    Comma-separated list of resource-level business terms to further filter the virtual folder resources.

  • Term association states

    List of term association states that the rule evaluates, such as ACCEPTED, REJECTED, and/or SUGGESTED to further filter the resources. If you do not specify a business term association state, all states are included.

For example:

"ruleScope": {
        "virtualFolders": [
            "DQM"
        ],
        "sourcePropertyFilters": { "domain": "Finance, Banking" },
        "fieldTerms": [
            "Built-in_Terms/Last_Name",
            "Built-in_Terms/US_Address"
        ],
        "resourceTerms": [
            "CA/Employee"
        ],
        "termStates": [
            "ACCEPTED",
            "SUGGESTED"
        ]
    }
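When rule scopes are generated by a script, it can help to check the constraints described above before submitting the rule. The following Python sketch builds a scope with the key names used in the example above and enforces the "at least one virtual folder" requirement; the function build_rule_scope is a hypothetical helper, not part of Data Catalog.

```python
import json

ALLOWED_STATES = {"ACCEPTED", "REJECTED", "SUGGESTED"}


def build_rule_scope(virtual_folders, field_terms=(), resource_terms=(),
                     source_property_filters=None, term_states=()):
    """Assemble a ruleScope dictionary using the keys shown in the example."""
    if not virtual_folders:
        # The text states that at least one virtual folder is required.
        raise ValueError("at least one virtual folder is required")
    unknown = set(term_states) - ALLOWED_STATES
    if unknown:
        raise ValueError(f"unknown term association states: {sorted(unknown)}")
    return {
        "virtualFolders": list(virtual_folders),
        "sourcePropertyFilters": dict(source_property_filters or {}),
        "fieldTerms": list(field_terms),
        "resourceTerms": list(resource_terms),
        "termStates": list(term_states),
    }


scope = build_rule_scope(
    ["DQM"],
    field_terms=["Built-in_Terms/Last_Name", "Built-in_Terms/US_Address"],
    resource_terms=["CA/Employee"],
    term_states=["ACCEPTED", "SUGGESTED"],
)
print(json.dumps({"ruleScope": scope}, indent=4))
```

Leaving term_states empty produces an empty list, which matches the documented behavior that all states are included when none is specified.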

Rule criteria

The rule criteria (or rule body) defines the condition that is translated into a query and evaluated against every qualifying resource defined in the rule scope. You can define the rule body by specifying rule types in the Rule Criteria window.

For example, you can insert a clause that determines what the rule body acts on. The query clause determines if the rule acts on metadata or on actual data from the resource.

  • Metadata query is compiled using resource metadata

    For example, the rule body hasFieldTerm(Built-in_Terms/Social_Security_Number_Delimited) = 1 checks for the presence of the field term Built-in_Terms/Social_Security_Number_Delimited.

  • Metadata query operating on custom property

    For example, the rule body @@business = 'MagnUX' operates on custom properties looking for specific values.

  • Data query is compiled using the resource data

    • Data query operating on field terms

      For example, the rule body (@EMS/Category >= 100 and @EMS/Category <= 199) and @EMS/Tax_State = '6A' inspects the data in the field tagged with EMS/Category for values between 100 and 199, when the data in the field tagged with EMS/Tax_State has a value of "6A".

    • Data query operating on custom property

      For example, the rule body @@business = 'MagnUX' operates on custom properties looking for specific values.

Note the following conventions for data queries:

  • Prefixing a FieldTerm with an "@" indicates the rule operates on the data tagged by the FieldTerm.
  • The presence of "@@" indicates the rule operates on a custom property and is a metadata rule.
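To see which field terms and custom properties a rule body refers to, you can scan it for these two prefixes. The sketch below is an assumption-laden illustration: the regular expressions assume names made of letters, digits, underscores, and hyphens, assume field terms are written as Domain/Term, and do not handle backtick-quoted names. The function referenced_names is hypothetical.

```python
import re

# Assumed patterns (not taken from Data Catalog): "@@name" is a custom
# property, and a single "@" followed by Domain/Term is a field term.
PROPERTY_REF = re.compile(r"@@([\w-]+)")
TERM_REF = re.compile(r"(?<!@)@(?!@)([\w-]+/[\w-]+)")


def referenced_names(rule_body: str) -> dict:
    """Return the field terms (@) and custom properties (@@) used in a rule body."""
    # dict.fromkeys keeps first-seen order while dropping repeats.
    terms = list(dict.fromkeys(TERM_REF.findall(rule_body)))
    properties = list(dict.fromkeys(PROPERTY_REF.findall(rule_body)))
    return {"field_terms": terms, "custom_properties": properties}


body = "(@EMS/Category >= 100 and @EMS/Category <= 199) and @EMS/Tax_State = '6A'"
print(referenced_names(body))
print(referenced_names("@@business = 'MagnUX'"))
```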

Depending on the rule type, the following ruleBody queries are possible, where FieldTerm is a full term name including the domain it is associated with:

Metadata query

Evaluates against metadata. Rules query for metadata discovered by Lumada Data Catalog.

  • Evaluating on field term ruleBody: hasFieldTerm(A/x) AND hasFieldTerm(B/y)
  • Evaluating on resource term or field term ruleBody: hasResourceTerm(M/j) OR hasFieldTerm(A/x)
  • Evaluating on resource term ruleBody: hasResourceTerm(M/j) AND hasResourceTerm(N/f)

Data query

Inspects the data when evaluating rules. Rules query for the specific data value identified by the term.

  • FieldName1 IN (val1,val2) AND FieldName2 = 'Some Value'
  • (@Domain1/Term1 + @Domain2/Term1) < @Domain3/Term1
  • @FieldTerm1 = 'someValue'
  • (@fieldTerm1 >= 100 and @fieldTerm1 <= 199) and @fieldTerm2 = 'some_value'
  • @Built-in_Terms/US_City = 'Los Angeles' OR @Built-in_Terms/US_City IN ('Fresno', 'Los Angeles', 'San Francisco') OR @Built-in_Terms/US_City = 'Folsom' OR 'city.*' = 'Los Angeles' OR length('city.*') > 5 OR length(@Built-in_Terms/US_City) > 6
  • @FieldTerm1 > 'someValue'
  • CASE statement support: @Built-in_Terms/US_Zip_Code in ('10003', '10019', '10036', '10014') and (case when hasFieldTerm(@Built-in_Terms/US_City) =1 then @Built-in_Terms/US_City is null else true end)

Evaluating on custom property

  • @@business = 'MagnUX'
  • @@strike-count = '3'

Rule action

The rule action defines the action to be taken if the ruleBody evaluates to true (1).

A rule action is an array of actions and an action can apply only one term. To apply multiple terms, you must submit a ruleAction for each term.

Actions can be one of the following:

  • AddBusinessTerms
  • RemoveBusinessTerms
  • SetProperties
  • ResetProperties

The actionType should be set to the action taken by the rule.

The actionName should be the name of the action.

In the actionAttributes body:

  • The presence of the rule_action_field entry indicates field tagging. The field name specified is tagged with the term provided in the rule_action_term_name.
  • The absence of the rule_action_field entry implies resource tagging. The resource is tagged with the term provided in the rule_action_term_name.
  • The rule_action_threshold entry is used only with a data rule and defines the percentage of rows that should satisfy the rule before the rule action is applied.
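The way these attributes are interpreted can be modeled with two small checks. This Python sketch is illustrative only: the function names are hypothetical, the threshold is assumed to be a percentage from 0 to 100 stored as a string (as in the samples in this article), and treating a missing threshold as 0 and an empty resource as not meeting the threshold are assumptions of the sketch.

```python
def tagging_target(action_attributes: dict) -> str:
    """Return 'field' when rule_action_field is present, otherwise 'resource'."""
    return "field" if "rule_action_field" in action_attributes else "resource"


def meets_threshold(matching_rows: int, total_rows: int,
                    action_attributes: dict) -> bool:
    """Check whether the share of matching rows reaches rule_action_threshold."""
    # Assumption: a missing threshold is treated as 0 percent.
    threshold = float(action_attributes.get("rule_action_threshold", "0"))
    if total_rows == 0:
        return False  # assumption: nothing to evaluate, so no action
    return 100.0 * matching_rows / total_rows >= threshold


field_action = {
    "rule_action_term_name": "PII/Sensitive",
    "rule_action_threshold": "40",
    "rule_action_field": "SSN",
}
resource_action = {
    "rule_action_term_name": "PII/Sensitive",
    "rule_action_threshold": "40",
}

print(tagging_target(field_action), tagging_target(resource_action))
print(meets_threshold(45, 100, field_action), meets_threshold(39, 100, field_action))
```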

You can define the rule action by specifying action types in the Rule Actions window.

The following are the rule action types:

  • AddBusinessTerms

    When actionType is set to AddBusinessTerms, the ruleAction makes term associations based on rule evaluation. A term suggestion can be applied to a specific field or to a qualifying resource. When applying a term suggestion to a field, the field is identified in one of the following ways:

    • Full field name.
    • Wildcard field name with partial string match for field name. Wildcard strings should follow the Java regular expression pattern format.
    • Referencing another field term associated with the field.

    Note: The Data Catalog rule framework does not create new terms. Any term suggestions to be applied as part of a rule action must be for existing terms. If an associated term does not exist, Data Catalog displays an error message.

  • RemoveBusinessTerms

    When actionType is set to RemoveBusinessTerms, the ruleAction removes the term associations based on the rule evaluation.

  • SetProperties and ResetProperties

    When actionType is set to SetProperties or ResetProperties, the ruleAction sets or resets custom property values.

    Property values are strings. You specify property names with @@, and the property's string value is substituted. To reset a property value, use ResetProperties.

The following code sample gives an example of each action type:

[
	{
		"actionType": "AddBusinessTerms",
		"actionName": "",
		"actionAttributes": {
			"rule_action_term_name": "PII/Sensitive",
			"rule_action_threshold": "40",
			"rule_action_field": "SSN"
		}
	},
	{
		"actionType": "AddBusinessTerms",
		"actionName": "",
		"actionAttributes": {
			"rule_action_term_name": "PII/Sensitive",
			"rule_action_threshold": "40"
		}
	},
	{
		"actionType": "AddBusinessTerms",
		"actionName": "",
		"actionAttributes": {
			"rule_action_term_name": "PII/Sensitive",
			"rule_action_threshold": "40",
			"rule_action_stats_field": "SSN"
		}
	},
	{
		"actionType": "RemoveBusinessTerms",
		"actionName": "Remove Field Term",
		"actionAttributes": {
			"rule_action_term_name": "PII/Sensitive",
			"rule_action_threshold": "40",
			"rule_action_field": "SSN"
		}
	},
	{
		"actionType": "RemoveBusinessTerms",
		"actionName": "Remove Resource Term",
		"actionAttributes": {
			"rule_action_term_name": "PII/Sensitive",
			"rule_action_threshold": "40"
		}
	},
	{
		"actionType": "SetProperties",
		"actionName": "Update Custom Property value based on threshold",
		"actionAttributes": {
			"rule_action_property_name": "domain",
			"rule_action_property_value": "Finance",
			"rule_action_threshold": "1"
		}
	},
	{
		"actionType": "SetProperties",
		"actionName": "Update Custom Property value",
		"actionAttributes": {
			"rule_action_property_name": "domain",
			"rule_action_property_value": "Finance"
		}
	},
	{
		"actionType": "ResetProperties",
		"actionName": "Reset property value",
		"actionAttributes": {
			"rule_action_property_name": "domain",
			"rule_action_property_value": ""
		}
	}
]

You can also define or update the rule action using the Add Template button. When using Add Template, the template is inserted on a new line. When entering your definition details, you can select from the system suggestions to help you complete the field entries.

The following selections are available using Add Template:

  • Tagging template

    Adds a tagging template.

  • Remove Tagging template

    Removes a tagging template.

  • Property template

    Adds a property template.

  • Reset text

    Removes any changes and returns the rule action default settings.

Sample rules

You can use these examples of metadata and data rules to help you write rules for your implementation of Data Catalog:

Metadata rule samples

Metadata rules work with the metadata associated with the resource or field.

The metadata rule examples below show the following situations:

  • Field term binding using a field name
  • Field term binding using a wildcard or partial match of a field name

Field term binding using a field name

The following sample rule validates the presence of the field term Built-in_Terms.Social_Security_Number_Numeric for all resources in the HDFS virtual folder that contain the field terms Built-in_Terms.Last_Name and Built-in_Terms.US_Address in either the ACCEPTED or SUGGESTED state, then applies the field term PII.Sensitive to the field named "SSN".

{
    "name": "Sensitive Term Completeness",
    "description": "If field terms Built-in_Terms/US_Address, Built-in_Terms/Last_Name and Built-in_Terms/Social_Security_Number_Numeric are present then add PII/Sensitive to SSN field",
    "ruleScope": {
        "virtualFolders": [
            "HDFS"
        ],
        "fieldTerms": [
            "Built-in_Terms/US_Address",
            "Built-in_Terms/Last_Name"
        ],
        "resourceTerms": [
            "CA/Employee"
        ],
        "sourcePropertyFilters": {
            "domain": "Finance, Banking"
        },
        "termState": [
            "ACCEPTED",
            "SUGGESTED"
        ]
    },
    "ruleBody": "hasFieldTerm(Built-in_Terms/Social_Security_Number_Numeric)=1",
    "ruleActions": [
        {
            "actionType": "AddBusinessTerms",
            "actionName": "",
            "actionAttributes": {
                "rule_action_term_name": "PII/Sensitive",
                "rule_action_threshold": "40",
                "rule_action_field": "SSN"
            }
        }
    ]
}

  • ruleBody

    This field validates the presence of field term Built-in_Terms.Social_Security_Number_Numeric in the resources.

  • ruleScope

    This field applies various filters or scopes the rule to specific resources:

    • virtualFolders

      This field limits the rule evaluation to only the virtual folder named HDFS.

    • fieldTerms

      This field further filters resources that contain the field terms Built-in_Terms.Last_Name and Built-in_Terms.US_Address.

    • resourceTerms

      This field further limits evaluation to resources containing the CA.Employee resource term.

    • termState

      This field restricts the rule evaluation to terms in either the ACCEPTED or SUGGESTED states.

  • ruleActions

    This field specifies the attributes to apply the action, in this case to attach PII.Sensitive to the field SSN in all resources to which the rule applies.

  • rule_action_threshold

    This field specifies the minimum percentage evaluation match for the action to be taken. In the above example, all the resources where 40% or more of the data passes the rule evaluation are tagged with the PII.Sensitive field term.

Field term binding using a wildcard or partial match of a field name

The following sample rule validates the presence of the field term Built-in_Terms.Email in all resources within the DQM, Holdings, and Bank_Retail virtual folders, then applies the field term PII.contactType to the fields beginning with email*.

{
    "name": "EmailTagger",
    "ruleBody": "hasFieldTerm(Built-in_Terms/Email)=1",
    "ruleScope": {
        "virtualFolders": ["DQM", "Holdings", "Bank_Retail"],
        "sourcePropertyFilters": {},
        "fieldTerms": [],
        "resourceTerms": [],
        "termStates": ["ACCEPTED", "SUGGESTED"]
    },
    "ruleActions": [
        {
            "actionType": "Tagging",
            "actionName": "term",
            "actionDisplayName": "term",
            "actionAttributes": {
                "rule_action_field": "email*",
                "rule_action_term_name": "PII/contactType",
                "rule_action_threshold": "0"
            }
        }
    ]
}

  • ruleBody

    This field validates the presence of field term Built-in_Terms.Email in the resources.

  • ruleScope

    This field applies various filters or scopes the rule to specific resources:

    • virtualFolders

      This field limits the rule evaluation to only the virtual folders named DQM, Holdings and Bank_Retail.

    • termStates

      This field restricts the rule evaluation to terms in either the ACCEPTED or SUGGESTED states.

  • ruleAction

    This field specifies the attributes to apply the action, in this case to attach PII.contactType as a field term to all fields containing email in all qualifying resources.

  • rule_action_threshold

    This field specifies the minimum percentage evaluation match for the action to be taken. In the above example, all the resources where 0% or more of the data passes the rule evaluation are tagged with the PII.contactType field term.

Data rule samples

A data rule inspects the data for a field or field term when evaluating a resource.

The data rule examples below show the following situations:

  • Resource term and field term binding
  • Custom property setting and field term removal

Resource term and field term binding

This rule binds a resource term and a field term on resources that contain a certain type of data. It checks for all resources with the resource term DQM.Employee and examines the data in fields tagged with DQM.taxCode and DQM.stateCode for qualifying data. It then attaches the resource term DQM.CA_Employee to the resource and the DQM.CA_Tax term to the field tagged with DQM.stateCode.

{
    "name": "CA_Term Rule",
    "ruleScope": {
        "virtualFolders": [
            "DQM"
        ],
        "fieldTerms": [
            "DQM/taxCode",
            "DQM/stateCode"
        ],
        "resourceTerms": [
            "DQM/Employee"
        ],
        "sourcePropertyFilters": {},
        "termState": [
            "ACCEPTED",
            "SUGGESTED"
        ]
    },
    "ruleBody": "(@DQM/taxCode >= 100 and @DQM/taxCode <= 199) and @DQM/stateCode = '6A'",
    "ruleActions": [
        {
            "actionType": "AddBusinessTerms",
            "actionName": "CA ResourceTerm Tagging",
            "actionAttributes": {
                "rule_action_term_name": "DQM/CA_Employee",
                "rule_action_threshold": "50"
            }
        },
        {
            "actionType": "AddBusinessTerms",
            "actionName": "CA Field Term Tagging",
            "actionAttributes": {
                "rule_action_term_name": "DQM/CA_Tax",
                "rule_action_threshold": "50",
                "rule_action_field": "taxcode"
            }
        }
    ]
}

The rule elements are used as follows:

  • ruleScope

    The ruleScope element scopes the rule to the following filters:

    • virtualFolders

      This field limits the rule evaluation to only the virtual folder named DQM.

    • resourceTerms

      This field filters the resources that have the DQM.Employee resource term.

    • termStates

      This field restricts the rule to terms in either the ACCEPTED or SUGGESTED states.

  • ruleBody

    The ruleBody element selects resources in which the data in the field tagged with DQM.taxCode has values between 100 and 199 and the data in the field tagged with DQM.stateCode has the value "6A".

  • ruleActions

    The ruleActions element is an array of two actions:

    • Resource tagging action

      • rule_action_term_name

        This field specifies the new term DQM.CA_Employee to bind to qualifying resources.

      • rule_action_field

        This field is left out intentionally to indicate resource tagging.

      • rule_action_threshold

        This field specifies the minimum percentage evaluation match for the action to apply. In the above example, all the resources where 50% or more of the data passes the rule evaluation are tagged with the DQM.CA_Employee resource term.

    • Field tagging action

      • rule_action_term_name

        This field specifies the DQM.CA_Tax term to bind to the field specified in rule_action_field. (In the resource tagging action, the same attribute specifies the DQM.CA_Employee term to bind to qualifying resources.)

      • rule_action_field

        The presence of rule_action_field indicates field tagging, and the reference term @DQM.stateCode is used for binding the DQM.CA_Tax term.

        Note: The rule_action_field is used in two ways:
        • When you prefix the value in rule_action_field with @, the value is used as a term.
        • Without the @, this field is interpreted as the field name that is used for binding (full or wildcard).
      • rule_action_threshold

        The rule_action_threshold field specifies the minimum percentage evaluation match for the action to apply. In the above example, the DQM.CA_Tax field term is applied in all the resources where 50% or more of the data passes the rule evaluation.

Custom property setting and field term removal

This example is for a rule updating a custom property value and removing a field term.

This rule checks for all resources with the resource term DQM.Employee and field term DQM.CA_Tax. It checks the data in fields tagged with DQM.stateCode for the value "6A". It also checks whether the value of the custom property data_owner is "Lara". If both conditions are satisfied, the rule updates the value of data_owner to "Joe" and removes the term DQM.CA_Tax bound to fields tagged with DQM.stateCode.

{
  "name": "Remove CA_Tax field term and update data_owner custom property",
  "ruleBody": "(@DQM/stateCode = '6A') AND (@@data_owner = 'Lara')",
  "metadataRule": [],
  "ruleScope":{
    "virtualFolders":["Banking", "DQM", "LDC-Warehouse"],
    "sourcePropertyFilters": {},
    "fieldTerms": ["DQM/CA_Tax"],
    "resourceTerms": ["DQM/Employee"],
    "termStates": ["ACCEPTED", "SUGGESTED"]
  },
  "ruleActions": [      
    {
      "actionType": "SetProperties",
      "actionName": "Update data_owner",
      "actionDisplayName": "data_owner",
      "actionAttributes": {
        "rule_action_property_name": "data_owner",
        "rule_action_property_value": "Joe"
      }
    },
    {
        "actionType": "RemoveBusinessTerms",
        "actionName": "Remove CA_Tax field term",
        "actionDisplayName": "rem-CA_Tax", 
        "actionAttributes": {
            "rule_action_threshold": "10",
            "rule_action_term_name": "DQM/CA_Tax"
        }
    }
  ]
}

  • ruleScope

    This field scopes the rule to the following filters:

    • resourceTerms

      This field filters the resources with the DQM.Employee resource term.

    • virtualFolders

      This field limits the rule evaluation to only the virtual folders named Banking, DQM, and LDC-Warehouse.

    • fieldTerms

      This field filters for resources containing the DQM.CA_Tax field term.

    • termStates

      This field restricts the rule evaluation to terms in either the ACCEPTED or SUGGESTED states.

  • ruleBody

    This field inspects and validates the data values for a field tagged with DQM.stateCode to be "6A", and that the value of custom property data_owner is "Lara".

  • ruleActions

    This rule element contains an array of two actions:

    • Updating custom property value

      • rule_action_property_name

        This field specifies the property name for which the value will be updated, in this case, data_owner.

      • rule_action_property_value

        This field specifies the new value to which the custom property will be updated, in this case, "Joe".

      • rule_action_threshold

        This field is intentionally left out for rule actions on custom properties.

    • Remove field action

      • rule_action_term_name

        This field specifies the term DQM.CA_Tax to be removed.

      • rule_action_field

        This field is intentionally left out for the remove_tagging action.

      • rule_action_threshold

        This field specifies the minimum percentage evaluation match for the action to apply to the resource. In this example, the DQM.CA_Tax term is removed in all the resources where 10% or more of the data passes the rule evaluation.

Requirements for writing rules

Avoid errors by strictly following these requirements when writing rules:

  • When using terms in the ruleBody for a Data query, you must prefix the terms with the @ and then the glossaryname/termname qualifier. In the absence of the @ qualifier, a term D.t is interpreted as a column name which may or may not exist and the corresponding results may be misreported.
  • When evaluating rules to set custom properties, you must prefix the custom property with the @@ qualifier.
  • Data Catalog supports a minimal set of SQL operators and functions in the rule definition, such as AND, OR, <, >, IN, and length().
  • All terms specified in the actionAttributes field must already exist.
  • In rule syntax, a termDomain.Term needs the dot replaced with a forward slash. For example, enter Built-in_Terms/Last_Name instead of Built-in_Terms.Last_Name. This replacement should be made regardless of the part of the rule in which it is located.
  • To use a term or glossary name that contains spaces, enclose the name in backticks (`), such as: @`Glossary name/Term name` > 200
  • To use a field name that contains spaces, enclose the field name in backticks (`), such as: hasFieldName(`First Name`)=1
Data Catalog's rules framework does not create new terms. It only attaches an existing term to the resource or field specified.
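When rule bodies are assembled programmatically, the prefix and backtick requirements above can be applied by small helpers. The following Python sketch is illustrative; quote_if_needed and term_ref are hypothetical names, and the sketch only covers the space-in-name case described above.

```python
def quote_if_needed(name: str) -> str:
    """Wrap a name in backticks when it contains a space."""
    return f"`{name}`" if " " in name else name


def term_ref(term: str) -> str:
    """Build an @-prefixed data-query reference to a field term."""
    return "@" + quote_if_needed(term)


print(term_ref("Glossary name/Term name") + " > 200")
print(term_ref("EMS/Category") + " >= 100")
```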

Rule workflow

On the Business Rules page, you can create, update, edit, and delete rules.

Create a rule

Perform the following steps to create a new rule. After you create a rule, you can then configure the rule using Rule syntax.

Procedure

  1. Click Management on the menu bar to open the Manage Your Environment page, and then select Business Rules to open the Business Rules page. Click Add Business Rule.

    The Create Business Rule page opens.
  2. Enter the details for your rule:

    Field: Description
    Business Rule Name (Required): Enter the unique name of the rule that your users will recognize. Names must start with a letter, and must contain only letters, digits, hyphens, or underscores. White spaces are not supported, and trailing spaces are not allowed in names.
    Owner: Select the username of the owner of the rule. The default value of this field is the logged-in user.
    Description: Enter a description for this rule. For example, you may want to indicate the purpose of the rule to assist other users.
    Note: Enter additional comments for the rule. For example, you may want to describe the workflow or use case of the rule.
    Rule Enabled: By default, a new rule is enabled. When you select Rule Enabled, all referenced names (such as custom properties, terms, fields, resources, and virtual folders) are verified for accuracy in the system.

    Clear the check box to disable the rule. When a Rules Execution job is run, disabled rules are skipped and are not evaluated.

  3. Click Create Business Rule to save your rule.

    The rule is created and the details are saved.
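If you create rules in bulk, you may want to pre-check candidate names against the Business Rule Name description in step 2. The sketch below reads that description literally (start with a letter, then only letters, digits, hyphens, or underscores) and assumes that "letters" and "digits" mean ASCII characters; the function is_valid_rule_name is hypothetical, and the product's own validation may differ.

```python
import re

# Assumption: "letters" and "digits" mean ASCII letters and digits.
RULE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_rule_name(name: str) -> bool:
    """Check a candidate name against the naming constraints in step 2."""
    return RULE_NAME.fullmatch(name) is not None


for candidate in ["EmailTagger", "CA_Term-Rule", "1stRule", "Trailing "]:
    print(repr(candidate), is_valid_rule_name(candidate))
```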

Next steps

Configure a rule

After you create a business rule, you can configure it in the Configuration view of the Business Rule page. This task assumes you have completed Create a rule and are on the Business Rules page.

Procedure

  1. If you have not already done so, locate the business rule you want to configure in the table of rules and select the View Details button (greater-than sign) in its row.

    The Business Rule page opens to the Details view for the selected rule.
  2. Click the Configuration tab.

  3. Enter the following configuration values for the rule:

    Field: Description
    Data Quality Dimensions: Select the data quality dimension that defines your rule. This dimension is reflected in the data quality graph on the Data Canvas page. Options include:
    • None

      This rule is not used as a data quality metric.

    • Completeness

      The proportion of stored data against the business definition of “100% complete”.

    • Consistency

      This is the absence of difference when comparing two or more representations of an item against a definition. Each data item is measured against itself or its counterpart in another data set. Note that consistency assessment may not be applicable to all data items.

    • Uniqueness

      This is the inverse of an assessment of the level of duplication.

    • Validity

      Data is valid if it conforms to the syntax (format, type, range) of its business definition. Typically, this is the overall measure of data quality.

    Execute Rule: By default, the rule is set to run manually.
    Advanced Mode: Select this check box to enter parameters for the rule. Clear this check box if you do not want to enter additional parameters.
    Set Rule Scope: Define the filters for evaluating the rule.

    Rule scope includes the following parameters:

    • Virtual Folders

      Specify comma-separated virtual folders on which the rule should be executed.

    • ResourceTerms

      Specify comma-separated business terms to filter the virtual folder resources.

    • FieldTerms

      Specify comma-separated field terms to filter the resource terms.

    • sourcePropertyFilters

      Specify property name and value to filter.

    • termState

      Specify the state of the business term associations. Valid values are ACCEPTED, REJECTED, and SUGGESTED; you can specify one or more. If no state is specified, the rule includes resources with associations in any state.

    Rule Criteria: Define the rule's evaluation criteria using rule syntax, as either a MetadataRule or a DataRule:
    • MetadataRule

      Evaluates metadata stored in the database. The syntax is:

      • hasResourceTerm(TermFullyQualifiedName) =1

        Applies to resources that have the given resource term associated with them.

      • hasFieldTerm(TermFullyQualifiedName) =1

        Applies to fields or columns that have the given field term associated with them.

    • DataRule

      Evaluates the actual data present in the resource. Specify the column name directly, or use @TermFullyQualifiedName to refer to the field that has the given term. Use @@customPropertyName to refer to a custom property with specific values.

    Rule Actions: Define the actions taken when the rule evaluates to true. An action can associate a field or resource term, remove an associated term, or update a custom property value. Specify the appropriate parameters for the following actions:
    • AddBusinessTerms or RemoveBusinessTerms

      • rule_action_term_name: term name that should be added or removed
      • rule_action_threshold: threshold value at which to perform the rule action
      • rule_action_field_name: specify the field name if the associated action should be performed on a field
    • SetProperties or ResetProperties

      • rule_action_property_name: custom property name
      • rule_action_property_value: value of the custom property
      • rule_action_compliance_value: for data rules for which the action should happen
  4. If the rule configuration values are entered correctly, click Save Changes.

    If there is a problem while saving your rule, an error message appears indicating the problem. Fix the problem and save your changes.
  5. To execute the rule, click Run Now next to the Execute Rule field.

    A confirmation message appears indicating that the business rule is submitted to jobs management and a notification is created for the user. Optionally, click View Details in the notification to check the status of the rule submission job.

Results

Your created and configured business rule appears on the Business Rules page. Select the rule if you want to run, edit, or remove the rule.

Update a rule

If you have already created and configured a rule, you can edit it from the Business Rules page.

Perform the following steps to edit a rule:

Procedure

  1. Click Management on the menu bar to open the Manage Your Environment page, and then select Business Rules to open the Business Rules page.

    The Business Rules page opens.
  2. Locate the business rule you want to configure in the table of rules and select the View Details button (greater-than sign) in its row.

    If you have a large number of rules, select Show Filters to help you find the rule you want to edit.

    The Business Rule page opens to the Details view for the selected rule.
  3. Edit the fields as needed in the Details and Configuration views.

  4. Click Save Changes.

    The rule is saved with your changes. If there is a problem while saving your rule, an error notification appears at the top of the page. Resolve the error and click Save Changes again.

Delete a rule

If a rule is no longer needed, you can delete it. Perform the following steps to delete a rule:

Procedure

  1. Click Management on the menu bar to open the Manage Your Environment page, and then select Business Rules.

    The Business Rules page opens.
  2. Use the check box to select the rule you want to delete.

  3. Click the Actions menu and then click Remove. Alternatively, select the More actions icon, and then click Remove from the drop-down menu.

    A message appears confirming your business rule is now deleted.
  4. Click Close on the message box to return to the Business Rules page.

Rule execution

As with any job, you can trigger a Data Catalog rule as an independent job sequence or as a job template.

Even when triggered as a job sequence from a resource or virtual folder, the rule execution job runs across the entire Data Catalog, not just the resource or virtual folder from which it was triggered.

The command line syntax to execute rules is as follows:

<Agent>$ bin/ldc executeRules [-virtualFolder <VF name> [-path <path to a single resource only>]] \
                                    [--<system parameters for driver-memory/executor-memory/etc.>]

Where:

  • -virtualFolder

    When specified, the named virtual folder overrides the scope specified in the rules.

  • -path

    Specifies the path to a specific resource, such as a file or table. Use with the -virtualFolder parameter to run rules on a specific file or table. Rule execution is not recursive, so if -path points to a directory or database, this parameter is ignored.

  • -<system parameters>

    Specifies any optional system specific parameters such as driver-memory or executor-memory.
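
As an illustration of the syntax above, the following sketch assembles an executeRules command for a single resource and prints it for review rather than running it. The virtual folder name, the resource path, and the helper function build_execute_rules_cmd are hypothetical placeholders, not names from Data Catalog.

```shell
#!/bin/sh
# Sketch only: prints an executeRules command instead of running it.
# VF_NAME and RESOURCE_PATH are hypothetical placeholders.
VF_NAME="sales_vf"
RESOURCE_PATH="/data/sales/orders.csv"

# Assemble the command line from a virtual folder name ($1)
# and an optional path to a single file or table ($2).
build_execute_rules_cmd() {
    cmd="bin/ldc executeRules -virtualFolder $1"
    if [ -n "${2:-}" ]; then
        cmd="$cmd -path $2"
    fi
    printf '%s\n' "$cmd"
}

build_execute_rules_cmd "$VF_NAME" "$RESOURCE_PATH"
```

Run the printed command from the agent's installation directory. Because rule execution is not recursive, the path should point to one file or table, not to a directory or database.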

Rule execution report

A rule execution report summarizes how well each rule evaluates the resources in Data Catalog.

To generate a rule execution report, submit a job template with additional parameters.

The command syntax to enter is as follows:

<Agent>$ bin/ldc executeRules [-virtualFolder <VF name> [-path <path to a single resource only>]] \
                                    [-generateReport <true> -reportName <Name of the report being generated>] \
                                    [--<system parameters for driver-memory/executor-memory/etc.>]

Where the options are defined as follows:

  • -virtualFolder

    When specified, the virtual folder <VF name> overrides the scope specified in the rules.

  • -path

    Path to a specific resource, such as a file or table. Use with the -virtualFolder parameter to run rules on a specific file or table. Rule execution is not recursive, so if the path points to a directory or database, this parameter is ignored.

  • -generateReport

    If this parameter is passed, rule execution generates a report with the name specified by the -reportName parameter.

  • -reportName

    User-defined name for the report being generated. Use with the -generateReport parameter.

  • -<system parameters>

    Any optional system-specific parameters such as driver-memory or executor-memory.

You can also use -reportsFolder <server folder> to specify a folder for the generated reports.

Unless you specify another folder, reports are generated in the /var/log/ldc/generatedReports directory. If you do not provide a report name, Data Catalog generates a unique random name and displays it at the command prompt.
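
The report options can be combined as in the following sketch, which prints a report-generating command instead of running it. The virtual folder name, the report name, and the helper function build_report_cmd are hypothetical placeholders used only for illustration.

```shell
#!/bin/sh
# Sketch only: prints the report-generating command instead of running it.
# The virtual folder and report names are hypothetical placeholders.
VF_NAME="sales_vf"
REPORT_NAME="sales_rules_report"

# Assemble executeRules with the report options.
# $1 = virtual folder name, $2 = report name (optional).
build_report_cmd() {
    cmd="bin/ldc executeRules -virtualFolder $1 -generateReport true"
    if [ -n "${2:-}" ]; then
        cmd="$cmd -reportName $2"
    fi
    printf '%s\n' "$cmd"
}

build_report_cmd "$VF_NAME" "$REPORT_NAME"
```

If the report name is omitted, the command contains only -generateReport, and Data Catalog assigns the report a generated name.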

Sample report

Based on the sample rules explained in the Metadata rule samples section, the rule execution report looks similar to the following example:

Sample rule execution report

"Sensitive_Term_Completeness" is the sample metadata rule and "CA_Term_Rule" is the sample data rule explained in Data rule samples.

Metadata rules always evaluate to either 100% (1) or 0% (0) since Lumada Data Catalog only checks for the presence of terms (metadata). Data rules have matches of varying percentages since Data Catalog evaluates the data attached to a specified field name or field term.

You can use the percentage match to identify the data quality, since fewer matches indicate lower quality data. The percentage setting is governed by rule_action_threshold, which also controls the amount of data in the report.
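
To make the idea of a percentage match concrete, the following sketch computes a match percentage and compares it with a threshold. It is an assumption-laden illustration: the helper function meets_threshold, the sample counts, and the "greater than or equal to" comparison are hypothetical and are not taken from Data Catalog's implementation of rule_action_threshold.

```shell
#!/bin/sh
# Sketch only: illustrates comparing a match percentage with a threshold.
# The >= comparison and all numbers are assumptions, not product behavior.

# $1 = matching values, $2 = total values evaluated, $3 = threshold (percent)
meets_threshold() {
    if [ "$2" -le 0 ]; then
        echo "no data"
        return 0
    fi
    percent=$(( $1 * 100 / $2 ))   # integer percentage, rounded down
    if [ "$percent" -ge "$3" ]; then
        echo "match ${percent}%: action applies"
    else
        echo "match ${percent}%: action skipped"
    fi
}

meets_threshold 45 50 80
```

In this sketch, 45 matching values out of 50 give a 90% match, which meets the assumed threshold of 80.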