Managing rules

Attention: This article discusses framework features available in version 6.1 and later, including the Insert and Add Template actions in Rule Settings.

With Lumada Data Catalog's rules framework you can define, execute, and manage tag-based rules. These rules can evaluate data and metadata properties to add tags, remove tags, modify custom properties on data assets, and generate reports. The rules can reference data identifiers, such as table.column_3.

To manage rules, click Manage on the menu bar, then click Rules. From here, you can author, run, track, and manage all rules in the catalog. All rules must be entered using the Data Catalog rules language. After creating a new rule or updating an existing rule, you must set the Current status to Enabled for the rule to take effect.

When you write rules using tags, they are translated automatically into concrete rules that are bound and executed on individual data resources. This translation occurs regardless of which format and platform the resource is in, such as JDBC tables, Hive tables, CSV, Avro, or JSON, as long as it is a format Data Catalog supports. A rule written using tags captures business logic explicitly and can express many concrete rules.

You can view the following sample applications of the rules framework:

Using a rule for sensitive resource tagging

This example identifies and tags all resources containing sensitive data or personal identifiers such as names, addresses, social security numbers, and account information. Using the Lumada Data Catalog tag discovery features, you can identify field metadata and tag data fields such as first name, last name, and address. The Data Catalog built-in tags can identify these fields. Then, you can use a rule to check for any resources that contain tagged sensitive data fields and tag the resources as "Restricted Access".

Although not shown in exact syntax, the rule illustrated below is the only rule you need to write. The rule is tag-based and does not depend on an actual field name, resource name, or resource type.

When the rule is processed, it is automatically bound to all qualifying resources and attaches the tag you specify when a resource contains the sensitive fields. If you have 100 CSV files, 200 JDBC tables, and 30 Avro files that are all sensitive, they all are labeled correctly after executing this rule.

Metadata rule with field tag for sensitive data

Syntax part | Definition
Scope       | hasFieldTag(First Name) AND hasFieldTag(Last Name) AND hasFieldTag(Address)
Rule        | hasFieldTag(SSN)
Action      | Tag SSN field as "Sensitive"

Resource tagging based on data properties

This example attaches a resource tag to all resources where the data for a given field is within a certain range. With the Data Catalog rules, a simple data rule defining the condition can identify the resources to which the condition applies and take the corresponding action.

Data rule with resource tag

Syntax part | Definition
Scope       | hasTag (Employee)
Rule        | @Category between (100, 199) and @TAX_state = 6A
Action      | Tag resource as "CA Employee"

Rule syntax

A rule executes on all the resources in Lumada Data Catalog, applying and evaluating the rule against one resource at a time, and executing the specified rule action. All rules must be entered using the Data Catalog rules language.

The language of the Data Catalog rules provides constructs for expressing scope, conditions, and actions. A unique capability in Data Catalog is that you can express all these constructs based on tags as well as with actual resource and field names.

Given a rule with Scope S, Body B, and Action A, the semantics of the rule can be summarized as: "For any resource R that is within S, if B evaluates to true, then perform all actions listed in A on R."
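The semantics above can be read as a loop over resources. The following Python sketch is illustrative only: the Rule class, the dictionary-based resources, and the apply_rule function are hypothetical names invented for this example and are not part of Data Catalog, which evaluates rules internally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a catalog resource; not a Data Catalog type.
Resource = Dict[str, object]


@dataclass
class Rule:
    in_scope: Callable[[Resource], bool]       # Scope S
    body: Callable[[Resource], bool]           # Body B
    actions: List[Callable[[Resource], None]]  # Action list A


def apply_rule(rule: Rule, resources: List[Resource]) -> None:
    """For any resource R within S, if B is true, perform all actions in A on R."""
    for resource in resources:
        if rule.in_scope(resource) and rule.body(resource):
            for action in rule.actions:
                action(resource)


# Usage: tag in-scope resources that carry an SSN field tag.
resources = [
    {"name": "employees.csv", "folder": "HDFS", "field_tags": {"SSN"}, "tags": set()},
    {"name": "products.csv", "folder": "HDFS", "field_tags": set(), "tags": set()},
]
rule = Rule(
    in_scope=lambda r: r["folder"] == "HDFS",
    body=lambda r: "SSN" in r["field_tags"],
    actions=[lambda r: r["tags"].add("Restricted Access")],
)
apply_rule(rule, resources)
```

After the call, only the first resource carries the "Restricted Access" tag, because it is the only one for which both the scope and the body hold.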

The rule syntax contains three parts:

  • Rule scope

    Sets the scope of resources on which the rule is evaluated and applied.

  • Rule body

    Defines the condition as a SQL predicate.

  • Rule action

    Defines the action to be taken on resources that conform to the rule evaluation, such as resource tagging, tag removal, setting custom property values, and report generation.

Note: In rule syntax, a tagDomain.Tag needs the dot replaced with a forward slash. For example, enter Built-in_Tags/Last_Name instead of Built-in_Tags.Last_Name. This replacement should be made regardless of the part of the rule in which it is located.
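If you generate rules from a list of tag names, a small helper can apply the dot-to-slash replacement consistently. This is a minimal Python sketch; the function name to_rule_syntax is hypothetical, and it assumes that only the first dot separates the tag domain from the tag name.

```python
def to_rule_syntax(tag: str) -> str:
    """Convert 'tagDomain.Tag' into the 'tagDomain/Tag' form used in rule syntax.

    Assumption: only the first dot separates the domain from the tag name.
    """
    domain, separator, name = tag.partition(".")
    if not separator or not domain or not name:
        raise ValueError(f"expected 'tagDomain.Tag', got {tag!r}")
    return f"{domain}/{name}"


print(to_rule_syntax("Built-in_Tags.Last_Name"))
```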

Rule scope

The rule scope defines the resources for rule execution. You can define the rule scope by specifying scope types in the Scope window.

Rule scope default settings

  • Virtual folders

    At least one virtual folder is required for the rule to compile. When a single virtual folder is listed, rule execution is run against all the resources under the listed virtual folder. You can enter additional virtual folders as a comma-separated list. If a folder no longer exists when the rule is executed, it is ignored.

  • Source property filters

    Comma-separated list of source property filter key-value pairs.

  • Field tags

    List of field tag filters.

  • Resource tags

    List of resource tag filters.

  • Tag association states

    List of tag states that the rules evaluate.

Note: In rule syntax, a tagDomain.Tag needs the dot replaced with a forward slash. For example, enter Built-in_Tags/Last_Name instead of Built-in_Tags.Last_Name. This replacement should be made regardless of the part of the rule in which it is located.

For example:

"ruleScope": {
        "virtualFolders": [
            "HDFS"
        ],
        "sourcePropertyFilters": {},
        "fieldTags": [
            "Built-in_Tags/Last_Name",
            "Built-in_Tags/US_Address"
        ],
        "resourceTags": [
            "CA/Employee"
        ],
        "tagStates": [
            "ACCEPTED",
            "SUGGESTED"
        ]
    }
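A scope like the one above can also be assembled programmatically before you paste it into the Scope window. The following Python sketch is an assumption-laden illustration: build_rule_scope is a hypothetical helper, and its checks only mirror two statements from this article (at least one virtual folder is required, and tags use the tagDomain/Tag form). Data Catalog performs its own validation.

```python
import json
from typing import Dict, List, Optional


def build_rule_scope(
    virtual_folders: List[str],
    field_tags: Optional[List[str]] = None,
    resource_tags: Optional[List[str]] = None,
    tag_states: Optional[List[str]] = None,
    source_property_filters: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a ruleScope dictionary using the keys shown in the example above."""
    if not virtual_folders:
        raise ValueError("at least one virtual folder is required")
    for tag in (field_tags or []) + (resource_tags or []):
        if "/" not in tag:
            raise ValueError(f"use the tagDomain/Tag form, got {tag!r}")
    return {
        "virtualFolders": list(virtual_folders),
        "sourcePropertyFilters": dict(source_property_filters or {}),
        "fieldTags": list(field_tags or []),
        "resourceTags": list(resource_tags or []),
        "tagStates": list(tag_states or []),
    }


scope = build_rule_scope(
    ["HDFS"],
    field_tags=["Built-in_Tags/Last_Name", "Built-in_Tags/US_Address"],
    resource_tags=["CA/Employee"],
    tag_states=["ACCEPTED", "SUGGESTED"],
)
print(json.dumps({"ruleScope": scope}, indent=4))
```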

You can also define or update the rule scope using the Insert button. When using Insert, the placement of your cursor in the Scope window determines the insertion point for your selection. If the cursor is placed in the code or on a code line, then the item is inserted at the point of placement. If the cursor is placed outside the code, then the item is inserted on a new line. When entering your definition details, you can select from the system suggestions to help you complete the field entries.

The following selections are available using Insert:

  • Custom Property

    Enter a custom property name that exists in your system and click Insert Custom Property.

  • Datasource

    Enter the data source name, then click Insert Datasource.

  • Field

    Enter the resource name, select a field name from that resource, and then click Insert Field.

  • Tag

    Enter the tag domain name, select a tag name from that domain, and then click Insert Tag.

  • Virtual Folder

    Enter the virtual folder name, then click Insert Virtual Folder.

You can return the rule scope to its default settings by clicking Reset.

Rule body

The rule body defines the condition that is translated into a query and evaluated against every qualifying resource, as defined in the rule scope. You can define the rule body by specifying rule types in the Body window.

Rule body default setting

The clause you enter determines what the rule body acts on: a metadata query evaluates resource metadata, and a data query evaluates the resource data.

Note: In rule syntax, a tagDomain.Tag needs the dot replaced with a forward slash. For example, enter Built-in_Tags/Last_Name instead of Built-in_Tags.Last_Name. This replacement should be made regardless of the part of the rule in which it is located.

  • Metadata query is compiled using resource metadata

    For example, the rule body hasFieldTag(Built-in_Tags/Social_Security_Number_Delimited) = 1 checks for the presence of the field tag Built-in_Tags.Social_Security_Number_Delimited.

  • Data query is compiled using the resource data

    • Data query operating on field tags

      For example, the rule body (@EMS/Category >= 100 and @EMS/Category <= 199) and @EMS/Tax_State = '6A' inspects the data in the field tagged with EMS.Category for values between 100 and 199, when the data in the field tagged with EMS.Tax_State has a value of "6A".

    • Data query operating on custom property

      For example, the rule body @@business = 'MagnUX' operates on custom properties looking for specific values.

Note the following conventions for data queries:

  • Prefixing a FieldTag with an "@" indicates the rule operates on the data tagged by the FieldTag.
  • The presence of "@@" indicates the rule operates on a custom property and its values.
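To see which field tags and custom properties a data query touches, you can scan the rule body for the two prefixes. The Python sketch below is not how Data Catalog parses rules. It is a rough approximation with assumed token shapes: "@@name" for a custom property and "@Domain/Tag" (a full tag name including its domain) for a field tag.

```python
import re

# Assumed token shapes, for illustration only.
CUSTOM_PROPERTY = re.compile(r"@@([\w-]+)")
# A single "@" (not part of "@@") followed by Domain/Tag.
FIELD_TAG = re.compile(r"(?<!@)@([\w-]+/[\w-]+)")


def references(rule_body: str) -> dict:
    """Return the field tags and custom properties referenced in a rule body."""
    return {
        "field_tags": sorted(set(FIELD_TAG.findall(rule_body))),
        "custom_properties": sorted(set(CUSTOM_PROPERTY.findall(rule_body))),
    }


body = (
    "(@EMS/Category >= 100 and @EMS/Category <= 199) "
    "and @EMS/Tax_State = '6A' and @@business = 'MagnUX'"
)
print(references(body))
```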

Depending on the rule type, the following ruleBody queries are possible, where FieldTag is a full tag name including the domain it is associated with:

Metadata query

Evaluates against metadata. Rules query for metadata discovered by Lumada Data Catalog.

  • Evaluating on field tag

    hasFieldTag(A/x) AND hasFieldTag(B/y)

    hasResourceTag(M/j) OR hasFieldTag(A/x)

  • Evaluating on resource tag

    hasResourceTag(M/j) AND hasResourceTag(N/f)

Data query

Inspects the data when evaluating rules. Rules query for the specific data value identified by the tag.

  • Evaluating on field names and field tags

    FieldName1 IN (val1,val2) AND FieldName2 = 'Some Value'

    (@Domain1/Tag1 + @Domain2/Tag1) < @Domain3/Tag1

    @FieldTag1 = 'someValue'

    @FieldTag1 > 'someValue'

    (@fieldTag1 >= 100 and @fieldTag1 <= 199) and @fieldTag2 = 'some_value'

    @Built-in_Tags/US_City = 'Los Angeles' OR @Built-in_Tags/US_City IN ('Fresno', 'Los Angeles', 'San Francisco') OR @Built-in_Tags/US_City = 'Folsom' OR 'city.*' = 'Los Angeles' OR length('city.*') > 5 OR length(@Built-in_Tags/US_City) > 6

  • CASE statement support

    @Built-in_Tags/US_Zip_Code in ('10003', '10019', '10036', '10014') and (case when hasFieldTag(@Built-in_Tags/US_City) =1 then @Built-in_Tags/US_City is null else true end)

  • Evaluating on custom property

    @@business = 'MagnUX'

    @@strike-count = '3'

You can also define or update the rule body using the Insert button. When using Insert, the placement of your cursor in the Body window determines the insertion point for your selection. If the cursor is placed in the code or on a code line, then the item is inserted at the point of placement. If the cursor is placed outside the code, then the item is inserted on a new line. When entering your definition details, you can select from the system suggestions to help you complete the field entries.

The following selections are available using Insert:

  • Custom Property

    Enter a custom property name that exists in your system and click Insert Custom Property.

  • Datasource

    Enter the data source name, then click Insert Datasource.

  • Field

    Enter the resource name, select a field name from that resource, and then click Insert Field.

  • Tag

    Enter the tag domain name, select a tag name from that domain, and then click Insert Tag.

  • Virtual Folder

    Enter the virtual folder name, then click Insert Virtual Folder.

You can return the rule body to its default settings by clicking Reset.

Rule action

The rule action defines the action to be taken if the ruleBody evaluates to true (1).

A rule action is an array of actions and an action can apply only one tag. To apply multiple tags, you must submit a ruleAction for each tag.

In the actionAttributes body:

  • The presence of the rule_action_field entry indicates field tagging. The field name specified is tagged with the tag provided in the rule_action_tag_name.
  • The absence of the rule_action_field entry implies resource tagging. The resource is tagged with the tag provided in the rule_action_tag_name.
  • The rule_action_threshold entry is used only with a data rule and defines the percentage of rows that should satisfy the rule before the rule action is applied.
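The presence or absence of rule_action_field can be checked mechanically. The following Python sketch uses a hypothetical function, describe_action, that only restates the conventions described in this article, including the @ prefix for referencing another field tag. It is not a Data Catalog API.

```python
def describe_action(action: dict) -> str:
    """Summarize what a Tagging action targets, based on its actionAttributes."""
    if action.get("actionType") != "Tagging":
        return "not a tagging action"
    attributes = action.get("actionAttributes", {})
    tag = attributes["rule_action_tag_name"]
    field = attributes.get("rule_action_field")
    if field is None:
        # No rule_action_field entry: the resource itself is tagged.
        return f"resource tagging with {tag}"
    if field.startswith("@"):
        # "@Domain/Tag" refers to fields that already carry that field tag.
        return f"field tagging with {tag} on fields tagged {field[1:]}"
    # Otherwise the value is a full or wildcard field name.
    return f"field tagging with {tag} on fields named {field}"


resource_action = {
    "actionType": "Tagging",
    "actionAttributes": {
        "rule_action_threshold": "60",
        "rule_action_tag_name": "SomeDomain/ResourceTagToBeApplied",
    },
}
print(describe_action(resource_action))
```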

You can define the rule action by specifying action types in the Action window:

Rule action default settings

The following are the rule action types:

  • Tagging

    When actionType is set to Tagging, the ruleAction makes tag associations based on rule evaluation. A tag suggestion can be applied on a specific field or on a qualifying resource. When applying a tag suggestion of a field, the field is identified in one of the following ways:

    • Full field name.
    • Wildcard field name with partial string match for field name. Wildcard strings should follow the Java regular expression pattern format.
    • Referencing another field tag associated with the field.

    Note: The Lumada Data Catalog rule framework does not create new tags. Any tag suggestions to be applied as part of a rule action must be for existing tags. If an associated tag does not exist, Data Catalog displays an error message.

  • Remove Tagging

    When actionType is set to remove_tagging, the ruleAction removes the tag associations based on the rule evaluation.

  • Properties

    When actionType is set to Properties, the ruleAction sets custom property values.

    Property values are strings. When you specify a property name with the @@ prefix, its string value is substituted. You can use property actions to set and reset property values. To reset a property value, leave the field empty.

The following code sample gives an example of each action type:

Note: In rule syntax, a tagDomain.Tag needs the dot replaced with a forward slash. For example, enter Built-in_Tags/Last_Name instead of Built-in_Tags.Last_Name. This replacement should be made regardless of the part of the rule in which it is located.

{
  "ruleActions": [
    {
      "actionType": "Tagging",
      "actionName": "TagFieldByFieldName",
      "actionDisplayName": "Bind FieldTag by matching fieldName",
      "actionAttributes": {
        "rule_action_field": "FieldNameToBeTagged", 
        "rule_action_threshold": "40",      
        "rule_action_tag_name": "Field/TagToBeApplied"
      }
    },
    {
      "actionType": "Tagging",
      "actionName": "TagFieldByAnotherFieldTag",
      "actionDisplayName": "Bind FieldTag by matching another fieldTag",
      "actionAttributes": {
        "rule_action_field": "@SomeDomain/AnotherFieldTag", 
        "rule_action_threshold": "40",      
        "rule_action_tag_name": "SomeDomain/NewTagToBeApplied"
      }
    },
    {
      "actionType": "Tagging",
      "actionName": "TagFieldByWildcardFieldName",
      "actionDisplayName": "Bind FieldTag by matching wildcard fieldName",
      "actionAttributes": {
        "rule_action_field": "WildcardFieldNam.*", 
        "rule_action_threshold": "40",      
        "rule_action_tag_name": "SomeDomain/FieldTagToBeApplied"
      }
    },
    {
      "actionType": "Tagging",
      "actionName": "Resource Tagging",
      "actionDisplayName": "Tag Resources with Resource/TagToBeApplied",
      "actionAttributes": {
        "rule_action_threshold": "60",     
        "rule_action_tag_name": "SomeDomain/ResourceTagToBeApplied"
      }
    },
    {
      "actionType": "remove_tagging",
      "actionName": "removeFieldTag",
      "actionDisplayName": "TagName2BeRemoved",
      "actionAttributes": {
        "rule_action_threshold": "60",     
        "rule_action_tag_name": "SomeDomain/TagToBeRemoved"
      }
    },
    {
      "actionType": "Properties",
      "actionName": "UpdatePropertyValue",
      "actionDisplayName": "propValueUpdate",
      "actionAttributes": {
        "rule_action_property_name": "propName2BeChanged",
        "rule_action_property_value": "newPropValue"
      }
    },
    {
      "actionType": "Properties",
      "actionName": "ResetPropertyValue",
      "actionDisplayName": "propValueReset",
      "actionAttributes": {
        "rule_action_property_name": "propName2BeReset",
        "rule_action_property_value": ""
      }
    }
  ]
}

You can also define or update the rule action using the Add Template button. When using Add Template, the template is inserted on a new line. When entering your definition details, you can select from the system suggestions to help you complete the field entries.

The following selections are available using Add Template:

  • Tagging template

    Adds a tagging template.

  • Remove Tagging template

    Adds a template for removing a tag.

  • Property template

    Adds a property template.

  • Reset text

    Removes any changes and returns the rule action to its default settings.

Sample rules

You can use these examples of metadata and data rules to help you write rules for your implementation of Lumada Data Catalog:

Metadata rule samples

Metadata rules work with the metadata associated with the resource or field.

The metadata rule examples below show the following situations:

  • Field tag binding using a field name
  • Field tag binding using a wildcard or partial match of a field name

Field tag binding using a field name

The following sample rule checks for the presence of the field tag Built-in_Tags.Social_Security_Number_Delimited in all resources within the HDFS virtual folder that carry the resource tag CA.Employee and the field tags Built-in_Tags.Last_Name and Built-in_Tags.US_Address in either the ACCEPTED or SUGGESTED state. It then applies the field tag PII.Sensitive to the field named "SSN".

Note: In rule syntax, a tagDomain.Tag needs the dot replaced with a forward slash. For example, enter Built-in_Tags/Last_Name instead of Built-in_Tags.Last_Name. This replacement should be made regardless of the part of the rule in which it is located.

{
    "name": "Sensitive_Tag_Completeness",
    "description": "If field tags Built-in_Tags/Last_Name, Built-in_Tags/US_Address, and Built-in_Tags/Social_Security_Number_Delimited are present, add the PII/Sensitive tag to the SSN field",
    "ruleBody": "hasFieldTag(Built-in_Tags/Social_Security_Number_Delimited) = 1",
    "metadataRule": [],
    "ruleScope": {
        "virtualFolders": [
            "HDFS"
        ],
        "sourcePropertyFilters": {},
        "fieldTags": [
            "Built-in_Tags/Last_Name",
            "Built-in_Tags/US_Address"
        ],
        "resourceTags": [
            "CA/Employee"
        ],
        "tagStates": [
            "ACCEPTED",
            "SUGGESTED"
        ]
    },   
    "ruleActions": [
        {
            "actionType": "Tagging",
            "actionName": "PII",
            "actionDisplayName": "PII",
            "actionAttributes": {
                "rule_action_field": "SSN",
                "rule_action_threshold": "40",
                "rule_action_tag_name": "PII/Sensitive"
            }
        }
    ]
}

  • ruleBody

    This field validates the presence of field tag Built-in_Tags.Social_Security_Number_Delimited in the resources.

  • ruleScope

    This field applies various filters or scopes the rule to specific resources:

    • virtualFolders

      This field limits the rule evaluation to only the virtual folder named HDFS.

    • fieldTags

      This attribute further filters resources that contain the field tags Built-in_Tags.Last_Name and Built-in_Tags.US_Address.

    • resourceTags

      This field further limits evaluation to resources containing the CA.Employee resource tag.

    • tagStates

      This field restricts the rule evaluation to tags in either the ACCEPTED or SUGGESTED states.

  • ruleAction

    This field specifies the attributes to apply the action, in this case to attach PII.Sensitive to the field SSN in all resources to which the rule applies.

  • rule_action_threshold

    This field specifies the minimum percentage evaluation match for the action to be taken. In the above example, all the resources where 40% or more of the data passes the rule evaluation are tagged with the PII.Sensitive field tag.
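The threshold comparison can be illustrated with simple arithmetic. This Python sketch assumes the percentage is computed as matching rows over total rows, which is how this article describes rule_action_threshold. The function name threshold_met is hypothetical, and the product may sample or count rows differently.

```python
def threshold_met(matching_rows: int, total_rows: int, threshold_percent: float) -> bool:
    """Return True when the share of matching rows reaches the threshold."""
    if total_rows <= 0:
        # Assumption: an empty resource never satisfies a threshold.
        return False
    return 100.0 * matching_rows / total_rows >= threshold_percent


# With a threshold of "40", 40 matching rows out of 100 qualify; 39 do not.
print(threshold_met(40, 100, 40))
print(threshold_met(39, 100, 40))
```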

Field tag binding using a wildcard or partial match of a field name

The following sample rule validates the presence of the field tag Built-in_Tags.Email in all resources within the DQM, Holdings, and Bank_Retail virtual folders, then applies the field tag PII.contactType to fields whose names begin with "email".

{
    "name": "EmailTagger",
    "ruleBody": "hasFieldTag(Built-in_Tags/Email)=1",
    "ruleScope": {
        "virtualFolders": ["DQM", "Holdings", "Bank_Retail"],
        "sourcePropertyFilters": {},
        "fieldTags": [],
        "resourceTags": [],
        "tagStates": ["ACCEPTED", "SUGGESTED"]
    },
    "ruleActions": [
        {
            "actionType": "Tagging",
            "actionName": "tag",
            "actionDisplayName": "tag",
            "actionAttributes": {
                "rule_action_field": "email.*",
                "rule_action_tag_name": "PII/contactType",
                "rule_action_threshold": "0"
            }
        }
    ]
}

  • ruleBody

    This field validates the presence of field tag Built-in_Tags.Email in the resources.

  • ruleScope

    This field applies various filters or scopes the rule to specific resources:

    • virtualFolders

      This field limits the rule evaluation to only the virtual folders named DQM, Holdings and Bank_Retail.

    • tagStates

      This field restricts the rule evaluation to tags in either the ACCEPTED or SUGGESTED states.

  • ruleAction

    This field specifies the attributes to apply the action, in this case to attach PII.contactType as a field tag to all fields whose names match the email wildcard pattern in all qualifying resources.

  • rule_action_threshold

    This field specifies the minimum percentage evaluation match for the action to be taken. In the above example, all the resources where 0% or more of the data passes the rule evaluation are tagged with the PII.contactType field tag.

Data rule samples

A data rule inspects the data for a field or field tag when evaluating a resource.

The data rule examples below show the following situations:

  • Resource tag and field tag binding
  • Custom property setting and field tag removal

Resource tag and field tag binding

This rule binds a resource tag to resources that contain a certain type of data. It checks all resources with the resource tag DQM.Employee and examines the data in fields tagged with DQM.taxCode and DQM.stateCode for qualifying data. It then attaches the resource tag DQM.CA_Employee to the resource and the field tag DQM.CA_Tax to the field tagged with DQM.stateCode.

Note: In rule syntax, a tagDomain.Tag needs the dot replaced with a forward slash. For example, enter Built-in_Tags/Last_Name instead of Built-in_Tags.Last_Name. This replacement should be made regardless of the part of the rule in which it is located.

{
    "name": "CA_Tag_Rule",
    "ruleBody": "(@DQM/taxCode >= 100 and @DQM/taxCode <= 199) and @DQM/stateCode = '6A'",
    "metadataRule": [],
    "ruleScope": {
        "virtualFolders": ["DQM"],
        "sourcePropertyFilters": {},
        "fieldTags": [],
        "resourceTags": [
            "DQM/Employee"
        ],
        "tagStates": [
            "ACCEPTED",
            "SUGGESTED"
        ]
    },
    "ruleActions": [
        {
            "actionType": "Tagging",
            "actionName": "ResourceTagging",
            "actionDisplayName": "CA-ResourceTag",
            "actionAttributes": {
                "rule_action_threshold": "50",
                "rule_action_tag_name": "DQM/CA_Employee"
            }
        },
        {
            "actionType": "Tagging",
            "actionName": "FieldTagging",
            "actionDisplayName": "DQM-CA_Tax",
            "actionAttributes": {
                "rule_action_threshold": "10",
                "rule_action_tag_name": "DQM/CA_Tax",
                "rule_action_field": "@DQM/stateCode"
            }
        }
    ]
}

The rule elements are used as follows:

  • ruleScope

    The ruleScope element scopes the rule to the following filters:

    • virtualFolders

      This field limits the rule evaluation to only the virtual folder named DQM.

    • resourceTags

      This field filters the resources that have the DQM.Employee resource tag.

    • tagStates

      This field restricts the rule to tags in either the ACCEPTED or SUGGESTED states.

  • ruleBody

    The ruleBody element checks that the data in fields tagged with DQM.taxCode has values between 100 and 199 and that the data in the field tagged with DQM.stateCode has the value "6A".

  • ruleActions

    The ruleActions element is an array of two actions:

    • Resource tagging action

      • rule_action_tag_name

        This field specifies the new tag DQM.CA_Employee to bind to qualifying resources.

      • rule_action_field

        This field is left out intentionally to indicate resource tagging.

      • rule_action_threshold

        This field specifies the minimum percentage evaluation match for the action to apply. In the above example, all the resources where 50% or more of the data passes the rule evaluation are tagged with the DQM.CA_Employee resource tag.

    • Field tagging action

      • rule_action_tag_name

        This field specifies the DQM.CA_Tax tag to bind to the field identified in rule_action_field.

      • rule_action_field

        The presence of rule_action_field indicates field tagging. Because the value is prefixed with @, it is treated as a tag reference, so the DQM.CA_Tax tag is bound to the field tagged with DQM.stateCode.

        Note: The rule_action_field is used in two ways:

        • When you prefix the value in rule_action_field with @, the value is used as a tag.
        • Without the @, this field is interpreted as the field name that is used for binding (full or wildcard).

      • rule_action_threshold

        The rule_action_threshold field specifies the minimum percentage evaluation match for the action to apply. In this example, the DQM.CA_Tax field tag is applied where 10% or more of the data passes the rule evaluation.

Custom property setting and field tag removal

This example is for a rule updating a custom property value and removing a field tag.

This rule checks for all resources with the resource tag DQM.Employee and field tag DQM.CA_Tax. It checks the data in fields tagged with DQM.stateCode for the value "6A". It also validates the value of the custom property data_owner, checking whether the value is "Lara". If both conditions are satisfied, the rule updates the value of data_owner to "Joe" and removes the tag DQM.CA_Tax bound to fields tagged with DQM.stateCode.

{
  "name": "Remove_CA_Tax_and_update_data_owner",
  "ruleBody": "(@DQM/stateCode = '6A') AND (@@data_owner = 'Lara')",
  "metadataRule": [],
  "ruleScope":{
    "virtualFolders":["Banking", "DQM", "LDC-Warehouse"],
    "sourcePropertyFilters": {},
    "fieldTags": ["DQM/CA_Tax"],
    "resourceTags": ["DQM/Employee"],
    "tagStates": ["ACCEPTED", "SUGGESTED"]
  },
  "ruleActions": [      
    {
      "actionType": "Properties",
      "actionName": "Update data_owner",
      "actionDisplayName": "data_owner",
      "actionAttributes": {
        "rule_action_property_name": "data_owner",
        "rule_action_property_value": "Joe"
      }
    },
    {
      "actionType": "remove_tagging",
      "actionName": "Remove CA_Tax field tag",
      "actionDisplayName": "rem-CA_Tax",
      "actionAttributes": {
        "rule_action_threshold": "10",
        "rule_action_tag_name": "DQM/CA_Tax"
      }
    }
  ]
}

  • ruleScope

    This field scopes the rule to the following filters:

    • resourceTags

      This field filters the resources with the DQM.Employee resource tag.

    • virtualFolders

      This field limits the rule evaluation to the virtual folders named Banking, DQM, and LDC-Warehouse.

    • fieldTags

      This field filters for resources containing the DQM.CA_Tax field tag.

    • tagStates

      This field restricts the rule evaluation to tags in either the ACCEPTED or SUGGESTED states.

  • ruleBody

    This field checks that the data in a field tagged with DQM.stateCode has the value "6A" and that the value of the custom property data_owner is "Lara".

  • ruleActions

    This rule element contains an array of two actions:

    • Updating custom property value

      • rule_action_property_name

        This field specifies the property name for which the value will be updated, in this case, data_owner.

      • rule_action_property_value

        This field specifies the new value to which the custom property will be updated, in this case, "Joe".

      • rule_action_threshold

        This field is intentionally left out for rule actions on custom properties.

    • Remove field tag action

      • rule_action_tag_name

        This field specifies the tag DQM.CA_Tax to be removed.

      • rule_action_field

        This field is intentionally left out for the remove_tagging action.

      • rule_action_threshold

        This field specifies the minimum percentage evaluation match for the action to apply to the resource. In this example, the DQM.CA_Tax tag is removed where 10% or more of the data passes the rule evaluation.

Requirements for writing rules

Avoid errors by strictly following these requirements when writing rules:

  • When using tags in the ruleBody for a Data query, you must prefix the tags with the @ qualifier. In the absence of the @ qualifier, a tag D.t is interpreted as a column name which may or may not exist and the corresponding results may be misreported.
  • When evaluating rules to set custom properties, you must prefix the custom property with the @@ qualifier.
  • Lumada Data Catalog supports a minimal set of SQL operators and functions in the rule definition, such as AND, OR, <, >, IN, and length().
  • Data Catalog supports CASE statements in predicates with the following syntax:

    CASE valueExpression whenClause+ (ELSE elseExpression=expression)? END #simpleCase

    or

    CASE whenClause+ (ELSE elseExpression=expression)? END #searchedCase

  • All tags specified in actionAttributes must already exist.
  • In rule syntax, a tagDomain.Tag needs the dot replaced with a forward slash. For example, enter Built-in_Tags/Last_Name instead of Built-in_Tags.Last_Name. This replacement should be made regardless of the part of the rule in which it is located.

Data Catalog's rules framework does not create new tags. It only attaches an existing tag to the resource or field specified.

Rule workflow

On the Rules Settings page, you can create, update, edit, and delete rules.

Create a rule

Perform the following steps to create a new rule. Use Rule syntax when creating rules.

Procedure

  1. Click Manage on the menu bar, then select Rules.

    The Rules page opens.
  2. Click + Create New Rule.

    Rule Settings page

    The Rules Settings page opens.
  3. Enter the following information for your rule:

    • Name

      Enter a unique name for the rule that your users will recognize. Names must start with a letter and must contain only letters, digits, hyphens, or underscores. White spaces in names are not supported.

    • Current status

      Select the status of the rule: Enabled or Disabled. When a Rules Execution job is triggered, disabled rules are skipped and are not evaluated.

      By default, a new rule is Disabled. When you select Enabled, all referenced names (custom properties, tags, fields, resources, virtual folders, and so on) are verified for accuracy in the system.

    • Scope

      Define the filters for evaluating the rule.

      You can edit the rule manually or use the Insert button on the Scope window, as described in Rule scope.

    • Body

      Define the rule that will be evaluated using rule syntax.

      You can edit the rule manually or use the Insert button on the Body window, as described in Rule body.

    • Action

      Define the action taken when the rule evaluates to true. This action can be associating a field or resource tag, removing an associated tag, or updating a custom property value.

      You can edit the rule manually or use the Add Template button on the Action window, as described in Rule action.

  4. Click Create to save your rule.

    The rule is created. If there is a problem while creating your rule, an error notification displays at the top of the page. Resolve the error and click Create.

Next steps

Set Current status to Enabled to make the rule effective.
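
The naming constraints for the Name field in step 3 can be checked before you open the Rules Settings page. The following Python sketch is one way to do that. The function name is_valid_rule_name is hypothetical, and the pattern assumes ASCII letters only, because this article does not say whether other letters are accepted.

```python
import re

# Assumption: "letters" means ASCII letters; the article does not specify.
# A name must start with a letter, followed by letters, digits, hyphens,
# or underscores. Spaces are not allowed.
RULE_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_rule_name(name: str) -> bool:
    """Return True if the name satisfies the constraints listed for the Name field."""
    return RULE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Sensitive_Tag_Completeness", "1st_rule", "my rule"]:
    print(candidate, is_valid_rule_name(candidate))
```

This check covers only the character rules. Whether the name is unique can be confirmed only in Data Catalog itself.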

Update a rule

If you have already created a rule, you can click Enabled or Disabled on the Rules Settings page to enable or disable an existing rule. Additionally, you can edit rules.

Perform the following steps to edit a rule:

Procedure

  1. Click Manage on the menu bar, then select Rules.

    The Rules page opens.
  2. Locate the rule you want to edit, click its More actions icon, and then select the Edit option from the drop-down menu.

    Rule Settings page

    The Rules Settings page opens.
  3. Edit the fields as needed.

  4. Click Create to save your rule.

    The rule is saved with your changes. If there is a problem while saving your rule, an error notification displays at the top of the page. Resolve the error and click Create.

Next steps

Set Current status to Enabled to make the rule effective.

Delete a rule

If a rule is no longer needed, you can delete it. Perform the following steps to delete a rule:

Procedure

  1. Click Manage on the menu bar, then select Rules.

    The Rules page opens.
  2. Select the rule you want to delete.

  3. Click the Delete icon. Alternatively, select the More actions icon, then click Delete from the drop-down menu.

  4. Click Save.

Rule execution

As with any job, you can trigger Lumada Data Catalog rule execution as an independent job sequence or as a job template.

Even when triggered as a job sequence from a resource or virtual folder, the rule execution job runs across the entire Data Catalog, not just the resource or virtual folder from which it was triggered.

Run Job Now dialog box

The command line syntax to execute rules is as follows:

<Agent>$ bin/ldc executeRules [-virtualFolder <VF name> [-path <path to a single resource only>]] \
                                    [--<system parameters for driver-memory/executor-memory/etc.>]

Where:

  • -virtualFolder

    When specified with the above command, the named virtual folder overrides the scope specified in the rules.

  • -path

    Specifies the path to a specific resource, file, or table. Use with the -virtualFolder parameter for rule execution on a specific file or table. Rule execution is not recursive, so if -path points to a directory or database, this parameter is ignored.

  • -<system parameters>

    Specifies any optional system specific parameters such as driver-memory or executor-memory.
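
As an illustration of how the options above combine, the following Python sketch assembles the argument list for the command. The function name build_execute_rules_args and the sample virtual folder and path are hypothetical. The sketch assumes, based on the nesting in the syntax line, that -path is valid only together with -virtualFolder. System parameters are left out because their exact form depends on your environment.

```python
from typing import List, Optional


def build_execute_rules_args(virtual_folder: Optional[str] = None,
                             path: Optional[str] = None) -> List[str]:
    """Build the argument list for 'bin/ldc executeRules' (illustrative only)."""
    if path is not None and virtual_folder is None:
        # The syntax line nests -path inside the -virtualFolder option.
        raise ValueError("-path must be used together with -virtualFolder")

    args = ["bin/ldc", "executeRules"]
    if virtual_folder is not None:
        args += ["-virtualFolder", virtual_folder]
    if path is not None:
        # Must point to a single file or table, not a directory or database.
        args += ["-path", path]
    return args


# Hypothetical virtual folder and resource path.
print(" ".join(build_execute_rules_args("SalesVF", "/data/sales/customers.csv")))
```

The resulting list could be passed to a process launcher run from the agent installation directory, which is where the relative bin/ldc path is assumed to resolve.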

Rule execution report

A rule execution report covers all rules and summarizes how well each rule evaluates the resources in Lumada Data Catalog.

To generate a rule execution report, submit a job template with additional parameters.

Entering options to generate a rule execution report

The command syntax to enter in the Command line options field is as follows:

<Agent>$ bin/ldc executeRules [-virtualFolder <VF name> [-path <path to a single resource only>]] \
                                    [-generateReport <true> -reportName <Name of the report being generated>] \
                                    [--<system parameters for driver-memory/executor-memory/etc.>]

Where the options are defined as follows:

  • -virtualFolder

    When specified with the above command, the virtual folder <VF name> overrides the scope specified in the rules.

  • -path

    Path to a specific resource, file, or table. Use with the -virtualFolder parameter for rule execution on a specific file or table. Rule execution is not recursive, so if the path points to a directory or database, this parameter is ignored.

  • -generateReport

    If this parameter is passed, rule execution generates a report with the name specified by the -reportName parameter.

  • -reportName

    User-defined name for the report being generated. Use with the -generateReport parameter.

  • -<system parameters>

    Any optional system-specific parameters such as driver-memory or executor-memory.

By default, reports are generated in the /var/log/ldc/generatedReports directory. To generate reports in a different folder, use -reportsFolder <server folder>.

If you do not provide a report name, Data Catalog randomly generates a unique name that is shown on the command prompt.
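
The report options described above can be assembled in the same way as the basic command. The following Python sketch is illustrative only. The function name build_report_args, the report name, and the virtual folder are hypothetical, and the sketch assumes that -reportName may be omitted, because the text above says a name is generated when none is provided.

```python
from typing import List, Optional


def build_report_args(report_name: Optional[str] = None,
                      virtual_folder: Optional[str] = None,
                      path: Optional[str] = None,
                      reports_folder: Optional[str] = None) -> List[str]:
    """Build an executeRules argument list that also requests a report."""
    if path is not None and virtual_folder is None:
        raise ValueError("-path must be used together with -virtualFolder")

    args = ["bin/ldc", "executeRules"]
    if virtual_folder is not None:
        args += ["-virtualFolder", virtual_folder]
    if path is not None:
        args += ["-path", path]

    args += ["-generateReport", "true"]
    if report_name is not None:
        # Assumption: when omitted, Data Catalog generates a unique name.
        args += ["-reportName", report_name]
    if reports_folder is not None:
        args += ["-reportsFolder", reports_folder]
    return args


# Hypothetical report name and virtual folder.
print(" ".join(build_report_args(report_name="weekly_rules", virtual_folder="SalesVF")))
```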

Sample report

Based on the sample rules explained in the Metadata rule samples section, the rule execution report looks similar to the following example:

Sample rule execution report

"Sensitive_Tag_Completeness" is the sample metadata rule and "CA_Tag_Rule" is the sample data rule explained in Data rule samples.

Metadata rules always evaluate to either 100% (1) or 0% (0) since Lumada Data Catalog only checks for the presence of tags (metadata). Data rules have matches of varying percentages since Data Catalog evaluates the data attached to a specified field name or field tag.

You can use the match percentage to gauge data quality, because fewer matches indicate lower-quality data. The percentage setting is governed by rule_action_threshold, which also controls the amount of data in the report.
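
To make the difference between the two kinds of match values concrete, the following Python sketch computes a match fraction for a data rule and a 0-or-1 result for a metadata rule. It is a simplified illustration, not how Data Catalog computes these values internally. The function names, sample values, and sample tags are all hypothetical, and the handling of rule_action_threshold is not shown because this article does not describe it.

```python
from typing import Callable, Iterable, Set


def data_rule_match(values: Iterable[str], matches: Callable[[str], bool]) -> float:
    """Fraction of field values that satisfy a data rule (0.0 for empty input)."""
    items = list(values)
    if not items:
        return 0.0
    return sum(1 for v in items if matches(v)) / len(items)


def metadata_rule_match(tags_present: Set[str], required_tag: str) -> int:
    """A metadata rule only checks tag presence, so the result is 1 or 0."""
    return 1 if required_tag in tags_present else 0


# Hypothetical field values: three of four match, so the fraction is 0.75.
print(data_rule_match(["CA", "CA", "NY", "CA"], lambda v: v == "CA"))

# Hypothetical tag set: the tag is present, so the result is 1.
print(metadata_rule_match({"Built-in_Tags/Last_Name"}, "Built-in_Tags/Last_Name"))
```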