Hitachi Vantara Lumada and Pentaho Documentation

Business entities

During discovery, business entities help narrow the search for matching data and apply data context filtering. The creation of a business entity can help term discovery make more meaningful suggestions by fine-tuning the accuracy of the discovery process.

A business entity is a term hierarchy that provides more context for the data, which can help eliminate false positives in search results. With business entities, term discovery can filter out non-compliant term-field associations and create term-resource associations. For example, a business entity can contain three fields that, when they appear together, help to define a customer. Each field has no meaning on its own, but when all three appear together in a resource, you can derive meaning from them.

In the Data Catalog user interface, a term hierarchy is a business entity if the business entity icon appears for the term.

For example, you may have an Age term that identifies the age of clients for an insurance claims case. Since Age is numeric data, term discovery also lists Age term suggestions for resources with fields that contain a quantity of claims processed, or the resources with fields that represent a numeric value for an internal processing code.

You can add context and meaning to the Age term by making it part of a business entity. You can add context by requiring that, to suggest the Age term on a field, the resource also needs to have the Person_name field term, or the Claim_number field. Once you require this context, in the next round of full term discovery evaluation, you may see a sharp reduction in the number of false positives. Now the Age term is suggested only on the claim resources with the expected context.
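The effect of the added context can be illustrated with a small simulation. This is a minimal sketch, not Data Catalog code: the resource names, field names, and the age_suggestions() helper are all hypothetical, and "numeric field" stands in for whatever matching logic the Age term actually uses.

```python
# Hypothetical anchor terms that give the Age term its context.
ANCHORS = {"Person_name", "Claim_number"}

# Hypothetical resources: terms already associated with the resource's fields,
# plus the numeric fields that would match Age without any context.
resources = {
    "claims": {"terms": {"Person_name", "Claim_number"}, "numeric_fields": ["age"]},
    "daily_stats": {"terms": set(), "numeric_fields": ["claims_processed"]},
    "codes": {"terms": set(), "numeric_fields": ["processing_code"]},
}

def age_suggestions(resources, require_context):
    """List (resource, field) pairs on which the Age term would be suggested."""
    suggestions = []
    for name, resource in resources.items():
        # With context required, skip resources that carry neither anchor term.
        if require_context and not (ANCHORS & resource["terms"]):
            continue
        suggestions.extend((name, field) for field in resource["numeric_fields"])
    return suggestions

without_context = age_suggestions(resources, require_context=False)
with_context = age_suggestions(resources, require_context=True)
```

In this sketch, without_context holds a suggestion for every numeric field, while with_context keeps only the field on the claims resource.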

Business entities are defined manually using strong anchor terms defined by Data Stewards. These anchor terms need to be terms that can be discovered on their own.

Note
  • Creating business entities and using them to fine-tune discovery are tasks for advanced Data Catalog users who are responsible for verifying that information is tagged correctly.
  • Business entities do not support unstructured files.

Anchor and attribute terms

Only parent terms can be converted to act as business entities. The child terms of a business entity are called business entity members. Business entity members or terms can be one of two types: anchor or attribute.

  • Anchor terms

    Anchor terms are context independent terms that can be discovered on their own and need to be explicitly identified. They are used as conditions in the rule that determines the data that matches the business entity definition.

    In the insurance claims example, the Person_name term and Claim_number term are anchor terms that can be discovered independently without other context, and they set the context for the Age business entity term.

    After you have created a business entity, to set one of its terms as an anchor, select the Anchor check box on the Entities tab of the term. You can manually associate the anchor with resource fields to force the context of the resource. Anchor terms are indicated with an anchor icon.

  • Attribute terms

    Any business entity member that is not an anchor is an attribute. Attribute terms are context dependent: the system makes associations for attribute terms only when the anchor term conditions are satisfied, that is, only within the context of the business entity.
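The relationship between a business entity and its members can be modeled as a small data structure. This is an illustrative sketch only: the Member and BusinessEntity classes are hypothetical and do not correspond to any Data Catalog API. The only rule it encodes is the one stated above, that every member not marked as an anchor is an attribute.

```python
from dataclasses import dataclass, field

@dataclass
class Member:
    name: str
    anchor: bool = False  # mirrors the Anchor check box for the term

@dataclass
class BusinessEntity:
    parent: str  # the parent term that was converted to a business entity
    members: list = field(default_factory=list)

    def anchors(self):
        """Members that are explicitly marked as anchors."""
        return [m.name for m in self.members if m.anchor]

    def attributes(self):
        """Every member that is not an anchor is an attribute."""
        return [m.name for m in self.members if not m.anchor]

# The insurance claims example: two anchors set the context for Age.
claims = BusinessEntity(
    parent="Claims",
    members=[
        Member("Person_name", anchor=True),
        Member("Claim_number", anchor=True),
        Member("Age"),
    ],
)
```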

Business entity expressions

A business entity expression determines whether a data resource is a business entity. If a data resource is evaluated as being part of a business entity, you can associate its fields with attribute terms of the business entity.

The context applied by anchor terms is formulated in the form of a Boolean expression with a predicate such as:

hasTag(<GlossaryName>/<FullTermNamePart1>.<FullTermNamePart2>) = <1/0>

Where the hasTag() function semantics are:

  • If the term is in context, return 1
  • If the term is not in context, return 0

If any resource field has an association with term T, then term T is in context.
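The hasTag() semantics above can be sketched in a few lines. This is a hedged illustration, not the product's implementation: the has_tag() function, the field_terms mapping, and the Insurance/Claims term names are assumptions made for the example.

```python
def has_tag(term, field_terms):
    """Return 1 if any field of the resource is associated with `term`, else 0.

    `field_terms` maps each field name of one resource to the set of terms
    associated with that field (a hypothetical representation).
    """
    return int(any(term in terms for terms in field_terms.values()))

# Hypothetical claims resource: two fields carry a term, one carries none.
claims_fields = {
    "name": {"Insurance/Claims.Person_name"},
    "claim_no": {"Insurance/Claims.Claim_number"},
    "age": set(),
}

in_context = has_tag("Insurance/Claims.Person_name", claims_fields)
not_in_context = has_tag("Insurance/Claims.Age", claims_fields)
```

Here in_context is 1 because one field is associated with the term, and not_in_context is 0 because no field is.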

When you enter the term expression, you must enclose domain and term names that contain spaces in back quote characters (`).

The default expression provided is an OR expression, but you can enter a custom expression by clicking Custom.
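As a rough picture of what an OR expression over anchor terms looks like, the following sketch joins one hasTag() predicate per anchor. The exact text that Data Catalog generates is not shown in this article, so the output format, the default_expression() helper, and the term names are assumptions.

```python
def default_expression(anchor_terms):
    """Join one hasTag() predicate per anchor term with 'or'.

    Names are always wrapped in back quotes here, which also covers
    names that contain spaces.
    """
    if not anchor_terms:
        raise ValueError("no anchor terms given")
    return " or ".join(f"hasTag(`{name}`)=1" for name in anchor_terms)

expression = default_expression(
    ["Insurance/Claims.Person_name", "Insurance/Claims.Claim_number"]
)
```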

A business entity expression can combine hasTag() predicates with the operators AND, OR, and NOT, and can use parentheses for grouping and spaces as separators. Names that contain spaces must be enclosed in back quote characters.

The following is a sample expression:

hasTag(`ADDRESSES/AddressUs.AddressLine`)=1 and (hasTag(`ADDRESSES/AddressUs.FullName`)=1 or
        (hasTag(`ADDRESSES/AddressUs.LastName`)=1 and hasTag(`ADDRESSES/AddressUs.FirstName`)=1)) and
        hasTag(`ADDRESSES/AddressUs.StateAbbr`)=1
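To check how an expression like the sample behaves for a given resource, you can evaluate it outside Data Catalog. The following is a minimal sketch of such an evaluator, not the product's parser. It assumes conventional precedence (NOT, then AND, then OR), case-insensitive operators, and a resource represented as the set of term names that are in context; tokenize(), evaluate(), and SAMPLE are hypothetical names.

```python
import re

# One token: a hasTag predicate (name optionally back-quoted), a parenthesis,
# or one of the keywords and/or/not.
TOKEN = re.compile(
    r"\s*(?:(hasTag)\(\s*(?:`([^`]+)`|([^)`\s]+))\s*\)\s*=\s*([01])"
    r"|([()])|(and|or|not)\b)",
    re.IGNORECASE,
)

def tokenize(expr):
    expr, pos, tokens = expr.strip(), 0, []
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        if m.group(1):
            tokens.append(("pred", m.group(2) or m.group(3), int(m.group(4))))
        elif m.group(5):
            tokens.append((m.group(5),))
        else:
            tokens.append((m.group(6).lower(),))
        pos = m.end()
    return tokens

def evaluate(expr, terms_in_context):
    """Evaluate `expr` against the set of terms that are in context."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "and":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "not":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        if peek() is None:
            raise ValueError("unexpected end of expression")
        tok = take()
        if tok[0] == "pred":
            return int(tok[1] in terms_in_context) == tok[2]
        if tok[0] == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        raise ValueError(f"unexpected token {tok[0]!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result

SAMPLE = """hasTag(`ADDRESSES/AddressUs.AddressLine`)=1 and (hasTag(`ADDRESSES/AddressUs.FullName`)=1 or
        (hasTag(`ADDRESSES/AddressUs.LastName`)=1 and hasTag(`ADDRESSES/AddressUs.FirstName`)=1)) and
        hasTag(`ADDRESSES/AddressUs.StateAbbr`)=1"""
```

With this sketch, a resource that has the AddressLine, FullName, and StateAbbr terms satisfies SAMPLE, while a resource that has AddressLine, LastName, and StateAbbr but no FirstName does not.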

If you select the Creates term associations check box on the Entities tab for a term, resource-field association suggestions are made automatically the next time discovery is run.

Note: Data Catalog does not validate business entity expressions in the user interface or check for cyclical references. Data Catalog does some validation during term discovery, but Data Stewards should perform their own validations on business entity expressions.
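Because the user interface does not validate expressions, a quick pre-check before pasting an expression can catch simple mistakes. The sketch below is one possible approach, not a Data Catalog tool: lint_expression() is a hypothetical helper that checks only back quote balance, parenthesis balance, and stray words. It does not detect cyclical references or confirm that the terms exist.

```python
import re

# A well-formed predicate: hasTag(<name>)=0 or hasTag(<name>)=1,
# with the name optionally enclosed in back quotes.
PREDICATE = re.compile(r"hasTag\(\s*(?:`[^`]+`|[^)`\s]+)\s*\)\s*=\s*[01]")
QUOTED = re.compile(r"`[^`]*`")

def lint_expression(expr):
    """Return a list of problems found in `expr` (empty if none were found)."""
    problems = []
    if expr.count("`") % 2:
        problems.append("unbalanced back quote")

    # Count parentheses, ignoring any that appear inside quoted names.
    depth = 0
    for ch in QUOTED.sub("", expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # After removing valid predicates and quoted names, only operators may remain.
    leftover = QUOTED.sub(" ", PREDICATE.sub(" ", expr))
    for word in re.findall(r"[A-Za-z_]\w*", leftover):
        if word.lower() not in {"and", "or", "not"}:
            problems.append(f"unexpected token: {word}")
    return problems
```

For example, lint_expression("hasTag(`A/B.C`)=1 and (hasTag(`A/B.D`)=1") reports the missing closing parenthesis.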

After you make any changes to the business entity expression, you must run a term discovery job without the incremental flag selected, or with the option -incremental false. Incremental processing should be turned off because business entity discovery does not support incremental updates.

Create a business entity

Before you begin

To create business entities, you need to be assigned the following role permissions:
  • View Business Glossaries permission
  • Access to the relevant glossary
  • Manage Business Terms permission
Perform the following steps to create a business entity:

Procedure

  1. Create a term hierarchy around a desired context, such as bill of materials, credit card information, or personally identifiable information (PII).

    If there is not a child term, create a child term for the existing term. The Business Entity Disabled/Enabled check box will not display on the Status card unless the term has a child term.
  2. Select the parent term in the hierarchy, and on the Summary tab, select the Disabled check box next to Business Entity to enable the business entity.

    After enabling the business entity, the parent term icon changes to the business entity icon.
  3. Choose one or more child terms in this hierarchy to act as context-independent anchor terms for the business entity. These anchor terms can be seed terms, regular expression terms, or reference terms to other seed or regular expression terms. Perform the following steps to set a child term as an anchor term:

    1. Select the term you want to set as an anchor term.

    2. On the Status card in the Summary tab, select the Disabled check box next to the Anchor field.

    The anchor is enabled.
  4. (Optional) To specify a term as a reference term, use the following steps:

    1. Select the term and on the Summary tab, scroll down to the Item Properties card.

    2. On the Item Properties card, click Add next to Reference Term.

      A Business Terms dialog box opens.
    3. Expand the glossaries shown in the Business Terms dialog box and locate the term you want to use as a reference term.

    4. Select the check box for the term and click Select.

      The reference term is set.

    Glossaries and terms displayed in the Business Terms dialog box depend on the glossaries accessible to your user role.

  5. Repeat steps 3 and 4 for any other anchor terms.

  6. (Optional) Customize the business entity expression on the Details tab of the parent term of the business entity. By default, Data Catalog defines the business entity expression as an OR function of the member anchor terms defined.

    You can define expressions for business entities based on rules described in Business entity expressions.

    For example, if a custom expression is defined as hasTag(PersonalData.PII.SSN)=1 AND hasTag(PersonalData.PII.FirstName)=1, then attribute terms are suggested only for resources in which both the PersonalData.PII.SSN and PersonalData.PII.FirstName terms are present.

    You can click the copy icon to copy the default expression as a starting point for the custom expression.
    Note: It is a best practice to copy the expression and edit it in another application, rather than editing it in the Business Entity Expression field, so you do not lose the current expression.
    When you are finished editing the expression, you can paste it over the current expression. The expression is not validated in the user interface, but validation is built into the term discovery process, and any errors are recorded in the term discovery log.
  7. (Optional) You can click the Details tab and select the Create resource associations check box to request that Data Catalog create term associations between the business entity and qualified resources for which business entity expressions are positively evaluated during business term discovery.

    These business entity terms are discovered resource terms. Unlike regular resource terms, they are the only resource terms that can be suggested, and they always have a weight confidence of 100%.
  8. Click Save to save the business entity.

  9. Run a business term discovery job to curate the results. On the Business Term Discovery job processing page, leave the Incremental Profiling check box unselected, or specify the option -incremental false in the Enter Parameters field.

Edit a business entity

To fine-tune data discovery, you may want to edit a business entity. Typical editing tasks include:
  • adding anchor or attribute terms.
  • editing a custom business entity expression.
  • creating term associations between the business entity and qualified resources.

Perform the following steps to edit a business entity. You can edit the business entity or the terms that it includes.

Procedure

  1. Click Glossary in the left navigation menu.

    The Business Glossary page opens.
  2. Locate and select the business entity or term that you want to edit. Refer to Business entities for more information.

  3. (Optional) Click the Details tab if you want to view or edit the expression.

    You can click the copy icon next to the Business Entity Expression field to copy the default expression as a starting point for a custom expression.
    Note: It is a best practice to copy the expression and edit it in another application, rather than editing it in the Business Entity Expression field, so you do not lose the current expression.
    When you are finished editing the expression, you can paste it over the current expression. The expression is not validated in the user interface, but validation is built into the business term discovery process, and any errors are recorded in the term discovery log.
  4. Click Save to save the business entity.

  5. (Optional) You can click the Details tab and select the Creates term associations check box to create term associations between the business entity and qualified resources for which business entity expressions are positively evaluated during business term discovery.

  6. Run a business term discovery job to curate the results. On the Business Term Discovery job processing page, leave the Incremental Profiling check box unselected, or specify the option -incremental false in the Enter Parameters field.

Discovering business entities

After you define a business entity with anchors, you should run business term discovery.

When a business entity attribute term uses a reference term (seeded or regular expression), term discovery makes suggestions only for the attribute term that refers to it, not for the reference term itself. Also, Lumada Data Catalog makes term association suggestions only for resources for which the business entity expression evaluates to true, that is, resources determined to be within the context of the business entity.

For example, you may have a PersonalData business entity attribute term PersonalData.PII.LastName that uses Built-in_Terms.Last_Name as a reference term, and the business entity expression is hasTag(PersonalData.PII.SSN)=1 AND hasTag(PersonalData.PII.FirstName)=1.

When you run term discovery, the suggested associations are only for the term PersonalData.PII.LastName and not for the reference term Built-in_Terms.Last_Name.

These suggestions are made only for resources that satisfy the condition of the business entity expression and have both the PersonalData.PII.SSN and PersonalData.PII.FirstName terms.

This evaluation process is how the business entities feature reduces false positives by filtering context.
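The PersonalData example can be walked through with a small simulation. This is a sketch under stated assumptions, not the discovery algorithm: the Resource class, the suggest() helper, the resource and field names, and the reference_hits mapping (which stands in for whatever the reference term's matching logic flags) are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    terms: set  # terms already associated with fields of the resource
    # reference term -> field names flagged by that term's matching logic
    reference_hits: dict = field(default_factory=dict)

def in_context(terms):
    # Mirrors hasTag(PersonalData.PII.SSN)=1 AND hasTag(PersonalData.PII.FirstName)=1
    return "PersonalData.PII.SSN" in terms and "PersonalData.PII.FirstName" in terms

def suggest(resource, attribute, reference):
    """Return (field, term) suggestions for one attribute term on one resource."""
    if not in_context(resource.terms):
        return []  # outside the business entity context: no suggestions
    # Suggestions are recorded under the attribute term, not the reference term.
    return [(f, attribute) for f in resource.reference_hits.get(reference, [])]

customers = Resource(
    "customers",
    {"PersonalData.PII.SSN", "PersonalData.PII.FirstName"},
    {"Built-in_Terms.Last_Name": ["surname"]},
)
vendors = Resource(
    "vendors",
    {"PersonalData.PII.FirstName"},
    {"Built-in_Terms.Last_Name": ["contact_last"]},
)

customer_suggestions = suggest(customers, "PersonalData.PII.LastName", "Built-in_Terms.Last_Name")
vendor_suggestions = suggest(vendors, "PersonalData.PII.LastName", "Built-in_Terms.Last_Name")
```

In this sketch, only the customers resource receives a PersonalData.PII.LastName suggestion. The vendors resource has a matching field but lacks the SSN term, so the expression filters it out.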

For terms that are not business entity members (context-free terms) and that use reference terms, term discovery makes suggestions for both the context terms (business entity members) and the context-free terms (terms that are not business entity members).