Hitachi Vantara Lumada and Pentaho Documentation

Business entities

During discovery, business entities help narrow the search for matching data and apply data context filtering. The creation of a business entity can help tag discovery make more meaningful suggestions by fine-tuning the accuracy of the discovery process.

A business entity is a tag hierarchy that provides more context for the data, which can help eliminate false positives in search results. With business entities, tag discovery can filter out non-compliant tag-field associations and create tag-resource associations. For example, a business entity can contain three fields that, when appearing together, help to define a customer. When the fields appear by themselves there is no meaning, but when they appear together, you can derive meaning. In summary, business entities provide feedback to the discovery process to account for enterprise data conventions and data modeling practices.

In the Data Catalog user interface, a tag hierarchy is a business entity if its grouped tags and labels icon is filled in (black).

For example, you may have an Age tag that identifies the age of clients for an insurance claims case. Since Age is numeric data, tag discovery also lists Age tag suggestions for resources with fields that contain a quantity of claims processed, or the resources with fields that represent a numeric value for an internal processing code.

You can add context and meaning to the Age tag by making it part of a business entity. You can add context by requiring that, to suggest the Age tag on a field, the resource also needs to have the Person_name field tag, or the Claim_number field. Once you require this context, in the next round of full tag discovery evaluation, you may see a sharp reduction in the number of false positives. Now the Age tag is suggested only on the claim resources with the expected context.
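The effect of requiring context can be pictured with a small sketch. This is an illustration only, not Data Catalog code: the function names, the resource dictionaries, and the field names are all hypothetical, and the real discovery process is more involved.

```python
# Hypothetical illustration of context filtering; not Data Catalog code.
# Each resource maps a field name to the set of tags already discovered on it.
CONTEXT_TAGS = {"Person_name", "Claim_number"}  # tags that supply context for Age


def resource_tags(resource):
    """Collect every tag associated with any field of the resource."""
    tags = set()
    for field_tags in resource.values():
        tags |= field_tags
    return tags


def suggest_age_tag(resource, numeric_fields):
    """Return the numeric fields eligible for an Age suggestion, or [] without context."""
    if not (resource_tags(resource) & CONTEXT_TAGS):
        return []  # neither Person_name nor Claim_number is present: suppress Age
    return sorted(numeric_fields)


claims = {"name": {"Person_name"}, "claim_id": {"Claim_number"}, "age": set()}
throughput = {"claims_processed": set(), "proc_code": set()}

print(suggest_age_tag(claims, ["age"]))                                # ['age']
print(suggest_age_tag(throughput, ["claims_processed", "proc_code"]))  # []
```

The second resource has numeric fields but none of the required context, so no Age suggestion is produced for it.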

Business entities are defined manually using strong anchor tags defined by Data Stewards. These anchor tags need to be tags that can be discovered on their own.

Note: Creating business entities and using them to fine-tune discovery is an activity for advanced Data Catalog users whose job is to verify that information is tagged correctly.

Anchor and attribute tags

Only parent tags can be converted to act as business entities. The child tags of a business entity are called business entity members. Business entity members or tags can be one of two types: anchor or attribute.

  • Anchor tags

    Anchor tags are context independent tags that can be discovered on their own and need to be explicitly identified. They are used as conditions in the rule that determines the data that matches the business entity definition.

    In the insurance claims example, the Person_name tag and Claim_number tag are anchor tags that can be discovered independently without other context, and they set the context for the Age business entity tag.

    After you have created a business entity, to set one of its tags as an anchor, select the Anchor check box on the Entities tab of the tag. You can manually associate the anchor with resource fields to force the context of the resource.

    Anchor tags are indicated with an anchor icon.

  • Attribute tags

    Any business entity member that is not an anchor is an attribute. Attribute tags are context dependent: if the anchor tag conditions are satisfied, then the system makes associations for attribute tags only within the context of the business entity.
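One way to picture the anchor and attribute distinction is as a flag on each member tag. The sketch below is a hypothetical data model, not how Data Catalog stores tags; the class names and the entity name "Claim" are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MemberTag:
    name: str
    anchor: bool = False  # mirrors the Anchor check box; False means attribute


@dataclass
class BusinessEntity:
    name: str
    members: list = field(default_factory=list)

    def anchors(self):
        """Members that are discovered on their own and set the context."""
        return [m.name for m in self.members if m.anchor]

    def attributes(self):
        """Any member that is not an anchor is an attribute."""
        return [m.name for m in self.members if not m.anchor]


# "Claim" is a hypothetical parent tag for the insurance claims example.
claim = BusinessEntity(
    "Claim",
    [
        MemberTag("Person_name", anchor=True),
        MemberTag("Claim_number", anchor=True),
        MemberTag("Age"),
    ],
)
print(claim.anchors())     # ['Person_name', 'Claim_number']
print(claim.attributes())  # ['Age']
```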

Business entity expressions

A business entity expression determines whether a data resource is a business entity. If a data resource is evaluated as being part of a business entity, you can associate its fields with attribute tags of the business entity.

The context applied by anchor tags is formulated in the form of a Boolean expression with a predicate such as:

hasTag(<DomainName>.<FullTagName>) = <1/0>

Where the hasTag() function semantics are:

  • If the tag is in context, return 1
  • If the tag is not in context, return 0

If any resource field has an association with tag T, then tag T is in context.
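The semantics above can be sketched in a few lines. This is a minimal model under the assumption that a resource is represented as a mapping from field names to the tags associated with each field; the Python function `has_tag` and the sample resource are hypothetical and only mirror the documented behavior of hasTag().

```python
def has_tag(resource_fields, tag):
    """Return 1 if any field of the resource is associated with `tag`, else 0."""
    return 1 if any(tag in tags for tags in resource_fields.values()) else 0


# Hypothetical resource: field name -> set of associated tags.
resource = {
    "cust_name": {"PersonalData.PII.FirstName"},
    "ssn": {"PersonalData.PII.SSN"},
    "balance": set(),
}

print(has_tag(resource, "PersonalData.PII.SSN"))       # 1
print(has_tag(resource, "PersonalData.PII.LastName"))  # 0
```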

When entering the tag expression, enclose any domain or tag name that contains spaces in back quote characters (`).

The default expression provided is an OR expression, but you can enter a custom expression by clicking Custom.
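As a rough picture of what an OR expression over anchor tags looks like, the sketch below joins one hasTag predicate per anchor tag. The helper `default_expression` is hypothetical, and the exact text that Data Catalog generates for the default expression may differ in spacing or capitalization.

```python
def default_expression(anchor_tags):
    """Join one hasTag predicate per anchor tag with OR."""
    return " OR ".join(f"hasTag({tag})=1" for tag in anchor_tags)


print(default_expression(["PersonalData.PII.SSN", "PersonalData.PII.FirstName"]))
# hasTag(PersonalData.PII.SSN)=1 OR hasTag(PersonalData.PII.FirstName)=1
```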

The following figure shows an example of a custom expression:

[Figure: Example of tag name with spaces]

The business entity expression allows the operators AND, OR, and NOT, parentheses for grouping, and names that contain spaces when they are enclosed in back quote characters.

The following is a sample expression:

hasTag(ADDRESSES.AddressUs.AddressLine)=1 and (hasTag(ADDRESSES.AddressUs.FullName)=1 or
        (hasTag(ADDRESSES.AddressUs.LastName)=1 and hasTag(ADDRESSES.AddressUs.FirstName)=1)) and
        hasTag(ADDRESSES.AddressUs.StateAbbr)=1
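To see how the sample expression behaves, the same Boolean logic can be written out in Python. This is only a sketch for reasoning about the expression: it assumes a resource is reduced to the set of tags found on any of its fields, and the names `has_tag` and `is_us_address` are hypothetical.

```python
PREFIX = "ADDRESSES.AddressUs."


def has_tag(resource_tags, tag):
    """Return 1 if the tag is in context for the resource, else 0."""
    return 1 if tag in resource_tags else 0


def is_us_address(resource_tags):
    """Mirror of the sample expression, one Python clause per predicate."""
    def h(name):
        return has_tag(resource_tags, PREFIX + name) == 1

    return (
        h("AddressLine")
        and (h("FullName") or (h("LastName") and h("FirstName")))
        and h("StateAbbr")
    )


split_name = {PREFIX + n for n in ("AddressLine", "LastName", "FirstName", "StateAbbr")}
missing_first = {PREFIX + n for n in ("AddressLine", "LastName", "StateAbbr")}

print(is_us_address(split_name))     # True
print(is_us_address(missing_first))  # False
```

The second resource fails because neither FullName nor the LastName and FirstName pair is complete, even though the address line and state are present.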

If you select the Creates tag associations check box on the Entities tab for a tag, resource-field association suggestions are made automatically the next time discovery is run.

Note: Lumada Data Catalog does not validate business entity expressions in the user interface or check for cyclical references. Data Catalog does some validation during tag discovery, but Data Stewards should perform their own validations on business entity expressions.
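A Data Steward could run a simple sanity check on an expression before pasting it into the Entities tab. The sketch below is a hypothetical helper, not a Data Catalog feature: it assumes the grammar described in this section (hasTag predicates compared to 1 or 0, the operators AND, OR, and NOT, and parentheses). It does not check operator placement, and it does not detect cyclical references.

```python
import re

# Assumed grammar: hasTag(<name>)=1 or hasTag(<name>)=0, joined by AND/OR/NOT.
PREDICATE = re.compile(r"hasTag\(\s*([^()]+?)\s*\)\s*=\s*[01]")
OPERATOR = re.compile(r"\b(?:AND|OR|NOT)\b", re.IGNORECASE)


def check_expression(expr):
    """Return a list of problems found in a business entity expression."""
    problems = []
    if not PREDICATE.search(expr):
        problems.append("no hasTag(...) predicate found")
    # Strip predicates first so their own parentheses are not counted as grouping.
    rest = PREDICATE.sub(" ", expr)
    depth = 0
    for ch in rest:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break  # a closing parenthesis appeared before its opener
    if depth != 0:
        problems.append("unbalanced parentheses")
    leftover = OPERATOR.sub(" ", rest).replace("(", " ").replace(")", " ").strip()
    if leftover:
        problems.append(f"unrecognized text: {leftover!r}")
    return problems


good = "hasTag(PersonalData.PII.SSN)=1 AND hasTag(PersonalData.PII.FirstName)=1"
bad = "hasTag(PersonalData.PII.SSN)=1 AND (hasTag(PersonalData.PII.FirstName)=1"

print(check_expression(good))  # []
print(check_expression(bad))   # ['unbalanced parentheses']
```

A check like this only catches surface mistakes. The tag discovery log remains the place to look for errors that Data Catalog itself reports.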

After you make any changes to the business entity expression, you must run tag discovery without the incremental flag selected, or with the option -incremental false. A full run is necessary because business entity discovery does not support incremental updates.

Create a business entity

Before you begin

To create business entities, you need to be assigned the following role permissions:
  • View Tag Domains permission
  • Access to the relevant tag domain
  • Manage Tag permission
Perform the following steps to create a business entity:

Procedure

  1. Create a tag hierarchy around a desired context, such as bill of materials, credit card information, or personally identifiable information (PII).

  2. Select the parent tag in the hierarchy and on the Settings tab, toggle the BUSINESS ENTITY switch to ON and click Save.

    After turning the BUSINESS ENTITY switch on, the parent tag icon changes from white to black to indicate a business entity. Each tag in the business entity also has the tag icon changed from white to black, and the Entities tab is added to the other tags in the business entity.
  3. Choose one or more child tags in this hierarchy to act as context-independent anchor tags for the business entity. These anchor tags can be seed tags, regular expression tags, or reference tags to other seed or regular expression tags. Perform the following steps to set a child tag as an anchor tag:

    1. Select the tag you want to set as an anchor tag.

    2. Click the Entities tab and select the Anchor check box.

    3. Click Save.

  4. (Optional) To specify a tag as a reference tag, use the following steps:

    1. Select the tag and click the Settings tab.

    2. In the Reference field, start typing the name of the tag you want to use as reference, and select the best match from the list that displays.

    3. Scroll to the bottom of the tab and click Save.

    Tag suggestions in the Reference field depend on the tag domains accessible to your user role.

  5. Repeat the previous step for any other anchor tags.

  6. (Optional) Customize the business entity expression. By default, Data Catalog defines the business entity expression as an OR function of the member anchor tags defined.

    You can define expressions for business entities based on rules described in Business entity expressions.

    For example, if a custom expression is defined as hasTag(PersonalData.PII.SSN)=1 AND hasTag(PersonalData.PII.FirstName)=1, then attribute tags are suggested only for resources in which both the PersonalData.PII.SSN and PersonalData.PII.FirstName tags are present.

    You can click Copy expression to copy the default expression as a starting point for the custom expression.
    Note: It is a best practice to copy the expression and edit it in another application, rather than editing the expression on the Entities tab, so you do not lose the current expression.
    When you are finished editing the expression, you can paste it over the current expression.
  7. (Optional) You can also click the Entities tab and select the Creates tag associations check box to request that Data Catalog create business entity tag associations.

    These business entity tags are discovered resource tags. Unlike regular resource tags, they are the only resource tags that can be suggested, and they always have a weight confidence of 100%.
  8. Click Save to save the business entity.

  9. Run tag discovery without the incremental flag selected, or with the option -incremental false.

Edit a business entity

To fine-tune data discovery, you may want to edit a business entity. Some of the tasks you may want to do include:
  • adding anchor or attribute tags.
  • editing a custom business entity expression.
  • creating tag associations between the business entity and qualified resources.

Perform the following steps to edit a business entity:

Procedure

  1. Navigate to Glossary Manage.

    You can edit the business entity or the tags that it includes.
  2. Locate and select the business entity or tag that you want to edit.

    Note: Remember that the icon for a business entity is different from the icon for a regular tag hierarchy. Refer to Business entities for more information.
  3. (Optional) Click the Settings tab if you want to make changes to settings. When you are finished, click Save to save your changes.

  4. (Optional) Click the Entities tab if you want to view or edit the expression.

    You can click Copy expression to copy the default expression as a starting point for a custom expression.
    Note: It is a best practice to copy the expression and edit it in another application rather than editing the expression on the Entities tab, so you do not lose the current expression.
    When you are finished editing the expression, you can paste it over the current expression. The expression is not validated in the user interface, but validation is built into the tag discovery process, and any errors are recorded in the tag discovery log.
  5. Click Save to save the business entity.

  6. (Optional) You can select the Creates tag associations check box on the Entities tab to create a tag association between the business entity and qualified resources for which business entity expressions were positively evaluated during tag discovery.

  7. Run tag discovery to curate the results.

Discovering business entities

After you define a business entity with anchors, you should run tag discovery.

When a business entity attribute tag uses a reference tag (seeded or regular expression), tag discovery makes suggestions only for the referred tag, not the reference tag. Also, Lumada Data Catalog makes tag association suggestions only for resources that the business entity expression has evaluated as true or determined to be within the context of the business entity.

For example, you may have a PersonalData business entity attribute tag PersonalData.PII.LastName that uses Built-in_Tags.Last_Name as a reference tag, and the business entity expression is hasTag(PersonalData.PII.SSN)=1 AND hasTag(PersonalData.PII.FirstName)=1.

When you run tag discovery, the suggested associations are only for the tag PersonalData.PII.LastName and not for the reference tag Built-in_Tags.Last_Name.

These suggestions are made only for resources that satisfy the condition of the business entity expression and have both the PersonalData.PII.SSN and PersonalData.PII.FirstName tags.

This evaluation process is how the business entities feature reduces false positives by filtering context.
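The PersonalData example can be traced with a short sketch. It is a simplified model, not the discovery algorithm: the function names are hypothetical, a resource is reduced to the set of tags found on its fields, and the list of fields that match the reference tag's rule is supplied by hand.

```python
ATTRIBUTE_TAG = "PersonalData.PII.LastName"  # uses Built-in_Tags.Last_Name as reference


def evaluate_entity(resource_tags):
    """hasTag(PersonalData.PII.SSN)=1 AND hasTag(PersonalData.PII.FirstName)=1"""
    return (
        "PersonalData.PII.SSN" in resource_tags
        and "PersonalData.PII.FirstName" in resource_tags
    )


def suggest_last_name(resource_tags, fields_matching_reference):
    """Suggest the attribute tag, never the reference tag, and only in context."""
    if not evaluate_entity(resource_tags):
        return {}  # expression is false: no suggestions for this resource
    return {field: ATTRIBUTE_TAG for field in fields_matching_reference}


in_context = {"PersonalData.PII.SSN", "PersonalData.PII.FirstName"}
out_of_context = {"PersonalData.PII.FirstName"}

print(suggest_last_name(in_context, ["surname"]))      # {'surname': 'PersonalData.PII.LastName'}
print(suggest_last_name(out_of_context, ["surname"]))  # {}
```

The second resource has a field that matches the reference tag's rule, but because the expression evaluates to false, no suggestion is made for it.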

For context-free tags (tags that are not business entity members) that use reference tags, tag discovery makes suggestions for both the context tags (business entity members) and the context-free tags.