Hitachi Vantara Lumada and Pentaho Documentation

Tagging resources and fields

If your user profile is configured for the Analyst role, you can use Lumada Data Catalog to identify data in the cluster by associating a tag with a specific folder, file, table, or field. You can associate any number of tags with an item. After a tag is used to mark an item, you can use the tag name and text in the tag description to search for the item. You can also select tags to help you find items associated with those tags.

To have access to tagging, your user profile needs to be configured with at least the Analyst role. To create new tags, your user profile needs to be configured with the Data Steward role for one or more tag domains. Contact your Data Catalog administrator for access privileges.

Tag association confidence cutoff for fields

When Lumada Data Catalog suggests a tag association for a field, it assigns the association a score, or weight, during tag propagation. The weight is expressed as a confidence percentage, with higher scores indicating a closer match to the criteria used to propagate the tag. A confidence of 100% indicates a strong match to the association criteria. The confidence calculation depends on the following dimensions:

  • Overlapping values
  • Overlapping tokens (individual words)
  • Overlapping patterns
  • Matching field names
  • Other matching tags
  • Matching numeric range
  • Matches on numeric properties
  • Quantile, standard deviation, cardinality, boundaries, and mean

Each dimension contributes a different amount to the overall weight. The overall weight is calculated to emphasize high-quality matches against field data and to de-emphasize low-quality matches. Tags propagate better on some data than on other data. For example, tags on free-text fields, such as social media messages, do not propagate well, whereas tags on product descriptions or other standardized text or codes propagate more smoothly. The default tag association confidence cutoff is 40% for low, 60% for normal, and 80% for high. You can configure these defaults in conf/configuration.json.
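
As a rough illustration of how per-dimension scores might be combined into one weight and compared against the low, normal, and high cutoffs, consider the following minimal Python sketch. The weights, the weighted-average formula, the reading of low, normal, and high as bands, and all function names are assumptions made for illustration. Data Catalog's actual calculation is not documented here.

```python
# Hypothetical integer weights: the real contribution of each dimension is not documented.
ASSUMED_WEIGHTS = {
    "overlapping_values": 4,
    "overlapping_patterns": 3,
    "matching_field_names": 2,
    "matching_numeric_range": 1,
}

# Default cutoffs quoted in the text, as percentages.
DEFAULT_CUTOFFS = {"low": 40, "normal": 60, "high": 80}


def overall_weight(scores, weights=ASSUMED_WEIGHTS):
    """Weighted average of per-dimension scores (0-100); missing dimensions count as 0."""
    total = sum(weights.values())
    return sum(weights[name] * scores.get(name, 0) for name in weights) / total


def confidence_band(weight, cutoffs=DEFAULT_CUTOFFS):
    """Return the highest band whose cutoff the weight meets, or None if below all cutoffs."""
    band = None
    for name, cutoff in sorted(cutoffs.items(), key=lambda item: item[1]):
        if weight >= cutoff:
            band = name
    return band


scores = {"overlapping_values": 90, "overlapping_patterns": 80, "matching_field_names": 50}
weight = overall_weight(scores)
print(weight, confidence_band(weight))
```

With these assumed weights, a field that overlaps strongly on values and patterns but only partly on field names lands in the normal band.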

How to manage tags

The following sections help you to manage tagging by identifying the correct tag to use for a data type. The content also shows you how to accept or reject suggested tag associations and how to tune tag properties for efficient tag propagation and association.

View existing tags and tag associations

The Glossary shows all tag domains and tags for which you have access based on your role.

Tags in the Glossary

As shown in the image, tags include:

  • Employee
  • Sensitive
  • category-code
  • emd_id
  • experience
  • f-name
  • grad-year
  • pin#
  • skill_id
  • start-year
  • tax-code

To see the associations that contribute to discovery or seed tags and the rejected tag associations, select the tag and click Reference / Rejected.

The counts next to the tag indicate the number of seeds, suggested, and rejected tag associations.

Tag association indications

Search for tags using Advanced Search

You can perform a search for tags using Advanced Search. This search can be helpful when you are tuning tag properties for efficient tag propagation.

Attention: The Including Tag(s) and Excluding Tag(s) fields are available in version 6.1 and later.
Perform the following steps to search for tags using Advanced Search.

Procedure

  1. Click in the Search data catalog field, and then click Go to Advanced Search.

    Advanced Search in Resources

    The Advanced Search Form page opens.
  2. Choose the Entity type that you want to search:

    • Resources: to search resources.
    • Fields: to search fields.
  3. Enter a tag or tags in the Including Tag(s) field from the dropdown menu, and select the Include child tags check box if you want to include child tags in your search.

    Selected included tags appear on the Advanced Search Form page.
  4. (Optional) Enter a tag or tags in the Excluding Tag(s) field from the dropdown menu, and select the Exclude child tags check box if you want to exclude child tags from your search.

    Note: If the Including Tag(s) and Excluding Tag(s) fields contradict each other, Excluding Tag(s) takes precedence.
    Selected excluded tags appear on the Advanced Search Form page.
  5. Click Apply filters and search.

    A list of resources or fields matching your search criteria is returned. For example, the results of a single-tag search are sorted by decreasing confidence level by default. You can also sort them by increasing confidence level, decreasing or increasing relevance, ascending or descending name, and decreasing or increasing average rating.

    Field tag sort filter

    Note: You cannot switch between the Resources tab and the Fields tab on the search results page. You must perform a separate query to display the required results. Depending on your permissions, some data may be unavailable for viewing.
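
The include and exclude behavior in this procedure can be pictured with a short Python sketch. It assumes that tags are stored as dot-separated paths (the dot denotes tag hierarchy), so that a child tag starts with its parent's name followed by a dot. The function names and sample data are hypothetical, and this is not Data Catalog's search implementation.

```python
def matches_tag(item_tag, selected, with_children):
    """True if item_tag equals selected, or is a dot-separated child of it when allowed."""
    if item_tag == selected:
        return True
    return with_children and item_tag.startswith(selected + ".")


def filter_items(items, including, excluding, include_children=False, exclude_children=False):
    """Keep items that carry an included tag, unless any of their tags is excluded."""
    results = []
    for name, tags in items.items():
        included = any(matches_tag(t, s, include_children) for t in tags for s in including)
        excluded = any(matches_tag(t, s, exclude_children) for t in tags for s in excluding)
        if included and not excluded:  # exclusion takes precedence on conflict
            results.append(name)
    return results


# Hypothetical sample data: field name -> tags.
fields = {
    "emp.ssn": ["Sensitive.tax-code"],
    "emp.first": ["Employee.f-name"],
    "emp.pin": ["Sensitive", "Employee"],
}
print(filter_items(fields, including=["Employee"], excluding=["Sensitive"], include_children=True))
```

In this sample, emp.pin carries both an included and an excluded tag, so it is dropped from the results.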

Search for tags using Browse Glossary

Perform the following steps to search for tags in the catalog using Browse Glossary.

Catalog browse search

Procedure

  1. On the Home page, select Glossary > Browse.

  2. On the Browse Glossary page, enter your search in the Search Glossary box and press Enter.

  3. Select a tag from the tag hierarchy.

    The search returns resources with the selected tag and any child tags.

Search for tags using Manage Glossary

Perform the following steps to search for tags in the Manage Glossary page:

Procedure

  1. On the Home page, select Glossary > Manage.

    The Manage Glossary page opens.
  2. Enter your search term or phrase into the Search Glossary box and press Enter.

    The search results appear.
  3. Click the name of the tag to select it.

Results

The tag's domain appears with the selected tag highlighted. From here, you can manage the tags.

Searching in nested tags

Attention: This feature is available in version 6.1 and later.

While using the Manage Glossary page, you can refine your search within nested tags. If a tag domain contains a nested tag, a Search icon (magnifying glass) appears to the right of the nested tag displayed in the Manage Glossary page.

To confine your search to that nested tag, click the Search icon and enter a search term or phrase into the Search Glossary box.

Tagging a resource

You can associate an existing tag with a folder, file, or table. If your user profile is configured with the Data Steward role for a tag domain, you can create a new tag in that domain to associate with a resource.

Note: Unlike field tags, resource tags do not participate in tag propagation.

You can tag folders, datasets, member resources, files, or tables from the browser or from the search results view as described in the following sections:

Tag a folder

Perform the following steps to tag a folder.

Procedure

  1. From the browse or search results view, locate then select the folder that you want to tag.

  2. Click the More actions icon and select Add tag from the drop-down menu.

    The Add a Tag dialog box opens.

    Tag a folder

  3. In the Add a Tag dialog box, select the action used to add the tag.

    • Add a Tag: Adds an existing tag to the folder.
    • Create a new tag: Adds a new tag to the folder.
  4. Enter the tag name in the Tag name field.

    If you chose to create a new tag, enter the tag name in the New tag name field, and optionally a tag description in the Tag description field.

    Note: Tag names can be up to 256 characters long and can contain any character except the dot (.), which is used to denote tag hierarchy. Tag descriptions can have up to 512 characters.
  5. Click Add.

    The folder is tagged.
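
The naming limits in the note above can be checked before you enter a tag. The following is a minimal Python sketch of such a check. The function name is hypothetical, and the empty-name check is an added assumption rather than a documented rule.

```python
MAX_TAG_NAME_LENGTH = 256
MAX_TAG_DESCRIPTION_LENGTH = 512


def validate_tag(name, description=""):
    """Return a list of problems; an empty list means the tag passes these checks."""
    problems = []
    if not name:
        problems.append("tag name is empty")  # assumption: not stated in the documentation
    if len(name) > MAX_TAG_NAME_LENGTH:
        problems.append("tag name exceeds 256 characters")
    if "." in name:
        problems.append("tag name contains a dot, which is reserved for tag hierarchy")
    if len(description) > MAX_TAG_DESCRIPTION_LENGTH:
        problems.append("tag description exceeds 512 characters")
    return problems


print(validate_tag("tax-code", "Tax classification code"))
print(validate_tag("tax.code"))
```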

Tag a dataset

Perform the following steps to tag a dataset.

Tag a dataset

Procedure

  1. From the browse or search results view, locate then select the dataset that you want to tag.

  2. Click the More actions icon and select Add tag from the drop-down menu.

    Optionally, from the field view level for a collection, you can select +Add Tag Association to open the Add a Tag dialog box.
  3. In the Add a Tag dialog box, select the action used to add the tag.

    • Add a Tag: Adds an existing tag to the dataset.
    • Create a new tag: Adds a new tag to the dataset.
  4. Enter the tag name in the Tag name field.

    If you choose to create a new tag, enter the tag name in the New tag name field, and optionally a tag description in the Tag description field.

    Note: Tag names can be up to 256 characters long and can contain any character except the dot (.), which is used to denote tag hierarchy. Tag descriptions can have up to 512 characters.
  5. Click Add.

    The dataset is tagged.

Tag a member resource

Perform the following steps to tag a member resource.
Note: Collection members that are part of datasets cannot be tagged.

Tagging a dataset member

Procedure

  1. From the browse or search results view, locate then select the member resource that you want to tag.

  2. Click the More actions icon and select Add tag from the drop-down menu.

    Optionally, from the field view level for a collection, you can select +Add Tag Association to open the Add a Tag dialog box.
  3. In the Add a Tag dialog box, select the action used to add the tag.

    • Add a Tag: Adds an existing tag to the member resource.
    • Create a new tag: Adds a new tag to the member resource.
  4. Enter the tag name in the Tag name field.

    If you choose to create a new tag, enter the tag name in the New tag name field, and optionally a tag description in the Tag description field.

    Note: Tag names can be up to 256 characters long and can contain any character except the dot (.), which is used to denote tag hierarchy. Tag descriptions can have up to 512 characters.
  5. Click Add.

    The member resource is tagged.

Tag a file or table

Perform the following steps to tag a file or table.

Procedure

  1. From the browse or search results view, locate then select the file or table that you want to tag.

  2. On the tab for the selected file or table (report.csv in this example) in the field-level view, click the More actions icon and select Add tag from the drop-down menu.

    The Add a Tag dialog box opens.

    Tag a file or table

  3. In the Add a Tag dialog box, select the action used to add the tag.

    • Add a Tag: Adds an existing tag to the file or table.
    • Create a new tag: Adds a new tag to the file or table.
  4. Enter the tag name in the Tag name field.

    If you choose to create a new tag, enter the tag name in the New tag name field, and optionally a tag description in the Tag description field.

    Note: Tag names can be up to 256 characters long and can contain any character except the dot (.), which is used to denote tag hierarchy. Tag descriptions can have up to 512 characters.
  5. Click Add.

    The file or table is tagged.

Tag a field

Perform the following steps to tag a field in the Resource Detail view.

Procedure

  1. From the browse or search results view, locate and then select the resource that contains the field you want to tag.

  2. In the Field Properties pane, click +Add Tag Association for the selected resource.

    The Add a Tag dialog box opens.

    Tag a field

  3. In the Add a Tag dialog box, select the action used to add the tag.

    • Add a Tag: Adds an existing tag to the field.
    • Create a new tag: Adds a new tag to the field.
  4. Enter the tag name in the Tag name field.

    If you choose to create a new tag, enter the tag name in the New tag name field, and optionally a tag description in the Tag description field.

    Note: Tag names can be up to 256 characters long and can contain any character except the dot (.), which is used to denote tag hierarchy. Tag descriptions can have up to 512 characters.
  5. Click Add.

    The field is tagged.

Tagging collections

A collection is a set of files with a similar schema and format. When files are grouped as a collection, you can manage the tags for that set of files from the collection, which serves as the single representation of all the data in all the files.

For tags assigned to individual files before they become part of the collection, do the following:

  • Add the accepted tag associations found in files to the collection as suggested tags (unless they are already part of the collection as accepted tags).
  • Treat as rejected from the collection any tag associations that were rejected.
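
The two rules above can be sketched in Python as a roll-up of per-file tag associations into collection-level sets. This is a simplified illustration: it tracks tag names per file rather than per field, the function and key names are hypothetical, and it does not resolve the case where one file accepted a tag that another file rejected, because the text does not say how that case is handled.

```python
def roll_up_to_collection(member_files, collection_accepted):
    """Combine per-file tag associations into collection-level suggested and rejected sets."""
    suggested = set()
    rejected = set()
    for file_tags in member_files:
        # Accepted in a file -> suggested on the collection, unless already accepted there.
        suggested |= set(file_tags.get("accepted", ())) - set(collection_accepted)
        # Rejected in a file -> treated as rejected on the collection.
        rejected |= set(file_tags.get("rejected", ()))
    return suggested, rejected


# Hypothetical member files, using tag names from the Glossary example.
files = [
    {"accepted": {"f-name", "tax-code"}, "rejected": {"pin#"}},
    {"accepted": {"grad-year"}, "rejected": set()},
]
suggested, rejected = roll_up_to_collection(files, collection_accepted={"tax-code"})
print(sorted(suggested), sorted(rejected))
```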

When a file is part of a collection, Lumada Data Catalog no longer suggests tag associations for the individual collection members. However, any existing accepted tags in the collection members continue to be considered in tag propagation for that tag. You should manage tags for all files from the collection rather than make new tag associations in the individual files.

Note: While collections can be tagged at the resource level and at the field level, member resources cannot be tagged once they have been identified as collection members, and any existing tag associations are aggregated to the collection root.

Tag a collection

Perform the following steps to tag a collection.

Tag a collection

Procedure

  1. Navigate to the resource list view for the collection you want to tag.

  2. Click More actions in the upper-right banner and select Add tag from the drop-down menu that displays.

    Optionally, from the field view level for a collection, you can select +Add Tag Association to open the Add a Tag dialog box.
  3. In the Add a Tag dialog box, select Create a new tag.

  4. Fill in the fields, including New tag name, then click Add.

    The collection is now tagged with the name you entered.

Accepting or rejecting suggested tag associations

You can accept or reject suggested tag associations in Lumada Data Catalog.

  • Double-click a suggested tag association in a resource field or in field-level search results to accept it.
  • Single click to open the Association window where you can select the More actions icon to display the drop-down menu to accept or reject the tag association.

Accept a tag association

Perform the following steps to accept a tag association.

Procedure

  1. Navigate to a resource field or perform a field-level search for a tag.

  2. Select the tag association in the Tags column.

    The Association window opens.
  3. Click the More actions icon and select Accept this Suggestion from the drop-down menu.

  4. Click Save.

    The tag association is accepted.

Reject a tag association

Perform the following steps to reject a tag association.

Procedure

  1. Navigate to a resource field or perform a field-level search for a tag.

  2. Select the tag association in the Tags column.

    The Association window opens.
  3. Click the More actions icon and select Reject this Suggestion from the drop-down menu.

    Reject a tag association

  4. Click Save.

    The tag association is rejected.

Remove a tag association

Perform the following steps to remove a suggested tag association.
Note: You can only remove a suggested tag association. You cannot remove a tag association that was created.

Procedure

  1. Navigate to the Glossary, to a resource field, or perform a field-level search for a tag.

  2. Click the tag association you want to remove in the Tags column.

    The Association window opens.
  3. Click the More actions icon and select Delete this association.

    Removing a tag association

  4. Click Save.

    The tag association is removed.

Reverting tag actions

You can restore a tag action to a prior state by using the Undo feature. When you complete a tag action, such as accepting a tag association, a confirmation message appears at the bottom of the screen, giving you the option to click Undo and revert the tag action.

Undo field tag actions

Change the data used as the seed for tag discovery

When a tag uses the Value method of automatic tagging, Lumada Data Catalog suggests tag associations based primarily on how well field values match the values of the fields that are tagged, accepted, and marked as Use field data in Tag discovery, in other words, on the quality of the seed values for the tag association. The first tag associated with a field is automatically used for tag discovery. To include additional fields in the seed data and metadata for tag discovery, mark their tag associations as Use field data in Tag discovery.

Tags for some data propagate more precisely than other data. For example, if you tag a field that contains a product code made up of letters, numbers, and punctuation, such as “CSP-2201A”, Data Catalog can precisely identify other fields with similarly constructed data. However, if you tag a field that contains free text (such as the text field in a social media feed) or numeric values (such as rainfall depth values), Data Catalog may find false positives when attempting to match the data. Consider defining a regular expression for a tag when you want to tag data with a specific text pattern.

Follow the steps below to add or remove a tag association as a seed:

Procedure

  1. Go to Glossary > Manage and find the tag.

  2. Open the Associations dialog box for the tag.

    Associations used in tag discovery are indicated by a circular bullet next to the tag name.
  3. Evaluate the seed tag associations to make sure they form a representative set of data to use for tagging.

  4. Stop a reference/seed tag from participating in tag discovery by selecting Don't Use field data in tag discovery in the tag association dialog box.

  5. Add a tag association as a seed by opening the tag association and selecting Use field data in Tag discovery.

Next steps

Rerun tag discovery.

Change the confidence cutoff for a tag

Perform the following steps to change the confidence cutoff for a tag.

Procedure

  1. Go to Glossary > Manage and find the tag.

  2. In the Settings tab, scroll down to the CONFIDENCE CUTOFF setting.

  3. Set the value to the score that Data Catalog will use as a cutoff for the tag association suggestions.

    Note: CONFIDENCE CUTOFF cannot be set to 0.
  4. Click Save.

Next steps

Rerun tag discovery with the incremental flag set to false.

Create a value tag or regular expression tag

When putting together your own tagging rules, the built-in tags may provide good examples of what you can do using regular expressions and length limits. All tags configured with regular expression rules are listed in the RegEx Tagging Rules tab of the Manage page. You can review the regular expressions for built-in tags, and you can disable built-in tags from propagating; you cannot edit the regular expression for a built-in tag.

Note
  • The first TAG created in a domain is a PARENT TAG by default.
  • The PARENT TAG is a type-ahead field. If you do not specify the parent tag in the tag settings, the tag will be created as parent.

Perform the following steps to create a value or regular expression tag:

Procedure

  1. In Glossary > Manage, select the tag domain where you want this tag to be created.

    The tag domain controls which users will see the tag.
  2. Click Create and choose A new tag.

  3. Enter a name for the tag.

    We recommend that you enter a tag description as well.
  4. Click Create.

    By default, a Value tag is created.

    Value tag

  5. Scroll down in the settings to PROPAGATION METHOD and select Regular Expression.

    Tag associations are suggested based on the number of values in the field that match the regular expression.

    Regular expression tag

  6. Enter the regular expression.

  7. Enter test data to validate that the regular expression matches the data as you expect.

  8. Enter the minimum and maximum number of characters that Lumada Data Catalog should apply this expression against.

    These values help Data Catalog optimize processing so that it doesn't spend time on data that is not likely to match the regular expression.
  9. In the CONFIDENCE CUTOFF field, set the minimum threshold value for the tag.

    This is a threshold value to indicate how many of the field's values need to match the pattern for the tag to be associated with the field. For example, if you expect each value in a field to match, set the cutoff at 90% or higher. If you want Data Catalog to suggest a field if it has any values that match the regular expression, set the cutoff very low.
  10. Click Save.
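
To make the interaction between the regular expression, the length limits, and CONFIDENCE CUTOFF more concrete, here is a minimal Python sketch. It assumes that the cutoff is compared against the percentage of a field's values that fully match the expression, and it uses Python's re module, which may differ from the regular expression flavor that Data Catalog uses. All function names are hypothetical.

```python
import re


def regex_match_percentage(values, pattern, min_length, max_length):
    """Percentage of values that fully match the pattern.

    Values outside the length bounds are counted as non-matches without running the regex.
    """
    if not values:
        return 0.0
    compiled = re.compile(pattern)
    matches = 0
    for value in values:
        if not (min_length <= len(value) <= max_length):
            continue  # skip values that are unlikely to match
        if compiled.fullmatch(value):
            matches += 1
    return 100.0 * matches / len(values)


def suggests_tag(values, pattern, min_length, max_length, cutoff_percent):
    """True if the share of matching values meets the confidence cutoff."""
    return regex_match_percentage(values, pattern, min_length, max_length) >= cutoff_percent


# Hypothetical field values; "CSP-2201A" is the product code format mentioned earlier.
codes = ["CSP-2201A", "CSP-1180B", "n/a", "XYZ-0001C"]
pattern = r"[A-Z]{3}-\d{4}[A-Z]"
print(regex_match_percentage(codes, pattern, 9, 9))
print(suggests_tag(codes, pattern, 9, 9, cutoff_percent=90))
```

Three of the four sample values match, so a cutoff of 90% would not suggest the tag for this field, while a lower cutoff would.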

Next steps

Run the tag evaluation and propagation jobs.

Controlling tag propagation and discovery learning

You can stop tag association changes from impacting the algorithm for tag discovery in Lumada Data Catalog. When learning is turned off, accepting and rejecting tag associations no longer has an impact on how value tags are evaluated. Consider turning off learning when you are satisfied that tags are added to new data appropriately.

Turn tag propagation and discovery learning off or on

Follow the steps below to control tag propagation and discovery learning.

Note: You must have a user profile of steward or higher to proceed.

Procedure

  1. Go to Home > Glossary > Manage.

  2. Select the tag that you want to manage.

    The tag definition is displayed.

    Tag propagation and discovery controls

  3. Select Settings.

  4. In the TAG DISCOVERY AUTOMATION option, select ENABLE to allow automated tag discovery, or select DISABLE to turn it off.

  5. In the LEARNING option, select ON to improve automated tagging by analysis, or select OFF to turn it off.

Deleting tags

If you have the steward role or higher access to the tag domain where the tag resides, you can remove all suggested tag associations for a tag. To do so, change the definition of the tag (not just the description or the name, but part of the discovery attributes) and run tag discovery again with the incremental flag set to false.

Alternatively, you can remove all tag associations from a tag by deleting the tag.

To delete the tag, click Glossary > Manage and select the tag that you want to delete. Click the More actions icon and then select Delete from the drop-down menu.

Deleting tag

Built-in tags

In addition to tags that you can add, Lumada Data Catalog has a set of predefined tags. These tags fall into two categories:

  • Regular expressions

    Data, such as United States ZIP codes and phone numbers, are tagged by matching data with regular expressions for these values.

  • Reference data

    Field data, such as countries, the names of US states, and first and last names, are tagged by matching the signature of known data. This reference data is static. You cannot include data from seed fields to alter the tag algorithm of the built-in tags.

Built-in tags are predefined and propagated when you run a tag job after collecting discovery metadata for catalog resources. Built-in tags cannot be changed. If you do not want to use the built-in tags provided for tagging your data, you can turn off automatic tag propagation for these tags. See Controlling tag propagation and discovery learning for details.

Suggested tags are associated with fields that match the reference data and patterns. The suggested tags include the following:

  • Countries: full name
  • Countries: 3-letter abbreviation
  • Email address
  • First Name
  • Global City
  • IP Address
  • Last Name
  • Major Credit Card Number
  • Occupation
  • People Names
  • Salutation
  • US Address
  • US City
  • US County
  • US Phone Number
  • US Social Security Number, Numeric
  • US Social Security Number, Delimited
  • US State Abbreviation
  • US States
  • US ZIP Code: NNNNN and NNNNN-NNNN

Regex use case: National Identifiers

An excellent use case for regex tags is discovering national identifiers. In this example there is a requirement for identifying and discovering national identifiers like passport or national ID for a country or set of countries. Lumada Data Catalog can help discover such fields with the help of regex tags.

Follow the steps below to discover fields using regex tags.

Procedure

  1. Create a tag domain named National_Identifiers.

  2. Create regex tags for the national identifier and/or passport that needs to be discovered by defining the valid regex.

    Regex tags use case
  3. Run tag discovery on the data.

Results

Data Catalog's tag discovery now identifies the national identifier fields and suggests the tag associations.
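
Before entering an expression for a regex tag, it can help to test it against sample values. The Python sketch below does this with two identifier formats that are invented for illustration. They are not the rules of any real country, and Python's re module may differ from the regular expression flavor that Data Catalog uses.

```python
import re

# Hypothetical formats for illustration only; real identifier rules differ by country.
NATIONAL_ID_PATTERNS = {
    "passport_example": re.compile(r"[A-Z]\d{8}"),            # one letter, eight digits
    "national_id_example": re.compile(r"\d{4}-\d{4}-\d{4}"),  # three groups of four digits
}


def classify_identifier(value):
    """Return the names of the example patterns that the whole value matches."""
    return [name for name, pattern in NATIONAL_ID_PATTERNS.items() if pattern.fullmatch(value)]


for sample in ["A12345678", "1234-5678-9012", "hello"]:
    print(sample, classify_identifier(sample))
```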