Tagging resources and fields
If your user profile is granted the Associate Business Terms permission, you can use Data Catalog to identify data by associating a business term with a specific folder, file, table, or field. You can associate any number of terms with an item. After a term is used to mark an item, you can use the term name to search for the item. You can also select terms to help you find items associated with those terms.
Contact your Data Catalog administrator for access to the following permissions:
- To view business terms, your user profile requires the View Business Terms permission.
- To create new terms, your user profile requires the Manage Business Terms permission.
Term association confidence cutoff for fields
When Data Catalog suggests a term association for a field, it assigns the association a score or weight during term propagation. The weight is calculated as a confidence cutoff percentage, with higher scores indicating a closer match to the criteria used to propagate the term. A confidence cutoff of 100% is a strong match to the association criteria. The confidence cutoff calculation depends on the following dimensions:
- Overlapping values
- Overlapping tokens (individual words)
- Overlapping patterns
- Matching field names
- Other matching terms
- Matching numeric range
- Matches on numeric properties (quantile, standard deviation, cardinality, boundaries, and mean)
Each dimension contributes a different amount to the overall weight. The overall weight is calculated to emphasize high-quality matches against field data and to reduce the influence of low-quality matches. Terms propagate better on some data than on other data. For example, tags on free-text fields such as social media messages do not propagate well, whereas tags on product descriptions or other standardized text or codes propagate more smoothly.
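As a rough illustration of how per-dimension scores could roll up into a single confidence percentage, the following sketch uses a weighted average. The dimension names follow the list above, but the weights, the 0-to-1 input scores, and the function name overall_confidence are assumptions made for illustration only. The formula that Data Catalog actually uses is not described here.

```python
# Hypothetical weights: how much each dimension really contributes
# is not documented, so these numbers are placeholders.
ASSUMED_WEIGHTS = {
    "overlapping_values": 4,
    "overlapping_tokens": 2,
    "overlapping_patterns": 2,
    "matching_field_names": 1,
    "numeric_properties": 1,
}


def overall_confidence(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each in [0, 1]) into a 0-100 percentage.

    Dimensions missing from `scores` are treated as 0.
    """
    total_weight = sum(ASSUMED_WEIGHTS.values())
    weighted = sum(
        weight * scores.get(name, 0.0)
        for name, weight in ASSUMED_WEIGHTS.items()
    )
    return round(100.0 * weighted / total_weight, 1)


if __name__ == "__main__":
    # A field whose values overlap strongly but whose name does not match.
    print(overall_confidence({
        "overlapping_values": 0.9,
        "overlapping_tokens": 0.8,
        "overlapping_patterns": 1.0,
    }))
```

In this sketch, giving the value-overlap dimension the largest weight is one way to emphasize matches against field data over weaker signals such as field names.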
Managing terms
The following sections help you to manage tagging by identifying the correct business term to use for a data type. The content also shows you how to accept or reject suggested term associations and how to tune term properties for efficient term propagation and association.
View existing terms and term associations
The Business Glossary shows all glossaries and terms for which you have access based on your role.
To see the associations that contribute to discovery or seed terms and rejected term associations, select the term and look at the Key Metrics card on the Summary tab.
The counts under Associated Data Elements for a term indicate its number of suggested, accepted, rejected, and seeded term associations.
Search for terms using Advanced Search
Perform the following steps to search for terms using Advanced Search.
Procedure
Click in the search box and then click Advanced Search.
If the search box is not visible, click Search in the left menu bar.
The Advanced Search page opens.
Select the Entity type that you want to search.
- Resources to search resources.
- Fields to search fields.
(Optional) Click in the Including Term(s) field and enter a term or select one from the drop-down list that displays. You can enter multiple terms. Select the Include child terms check box if you want to include child terms in your search.
The selected terms appear in the Including Term(s) field.
(Optional) Click in the Excluding Term(s) field and enter a term or select one from the drop-down list that displays. You can enter multiple terms. Select the Exclude child terms check box if you want to exclude child terms from your search.
Note: If the Including Term(s) and Excluding Term(s) fields contradict each other, then Excluding Term(s) takes precedence.
Selected excluded terms appear on the Advanced Search Form page.
(Optional) Apply facets as needed, such as specifying a resource type or data type.
Click Apply filters and search.
A list of resource terms or field terms matching your search criteria is returned. The results for single-term searches are displayed by decreasing relevance by default, but you can sort them in the following ways: decreasing or increasing relevance, ascending or descending name order, and decreasing or increasing average rating.
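The precedence rule for contradicting filters can be sketched as follows. This is a minimal model, not the product's search implementation: the function name matches_term_filters is hypothetical, child terms are not modeled, and treating multiple including terms as "any of these" is an assumption.

```python
def matches_term_filters(item_terms, including, excluding):
    """Return True if an item passes the Including/Excluding term filters.

    An item carrying any excluded term is dropped even if it also carries
    an included term, because exclusion takes precedence.
    Assumption: several including terms are combined with OR.
    """
    item_terms = set(item_terms)
    if item_terms & set(excluding):
        return False
    including = set(including)
    # With no including terms, every non-excluded item passes.
    return not including or bool(item_terms & including)


if __name__ == "__main__":
    # The same term is both included and excluded: the item is filtered out.
    print(matches_term_filters({"Email address"}, {"Email address"}, {"Email address"}))
```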
Search for terms using Business Glossary
Procedure
Click Glossary in the left navigation menu.
The Business Glossary page opens.
On the Business Glossary page, enter your search in the Find box and press Enter.
Search for terms using global search
Procedure
On the Home page, enter a term to search for in the Search data catalog box and press Enter or click Search.
The search results appear.
Click the name of the term to select it.
Searching in nested terms
While using the Business Glossary page, you can refine your search within nested terms. If a glossary contains a nested term, a Search icon (magnifying glass) appears to the right of the nested term displayed in the Business Glossary page.
To confine your search to that nested term, click the Search icon and enter a search term or phrase into the Search Glossary box.
Tagging a resource
You can associate an existing term with a folder, file, or table. If your user profile is configured with the Manage Business Terms permission, you can create a new term to associate with a resource. You need to have permission to access the glossary in which you want to add the term.
You can tag folders, member resources, files, views, or tables from the browser or from the search results view as described in the following sections:
Tag a member resource
Procedure
From the browse or search results view, locate then select the member resource that you want to tag.
Click the More actions icon and select Add term from the drop-down menu.
Optionally, from the field view level for a collection, you can select +Add Term Association to open the Add a Term dialog box.
In the Add a Term dialog box, select the action used to add the term.
- Add a Term: Adds an existing term to the member resource.
- Create a new term: Adds a new term to the member resource.
Enter the term name in the Term name field.
If you choose to create a new term, enter the term name in the New term name field, and optionally a term description in the Term description field.
Note: Term names can be up to 256 characters long and can contain any character except the dot (.), which is used to denote term hierarchy. Term descriptions can have up to 512 characters.
Click Add.
The member resource is tagged.
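If you prepare term names in bulk before entering them, a small pre-check against the limits in the note above can catch problems early. The following is a sketch only. The function name validate_term and the message texts are hypothetical, and Data Catalog may apply additional rules that are not listed here.

```python
MAX_TERM_NAME_LENGTH = 256          # limit stated in the note above
MAX_TERM_DESCRIPTION_LENGTH = 512   # limit stated in the note above


def validate_term(name: str, description: str = "") -> list[str]:
    """Return a list of problems found; an empty list means the term passes."""
    problems = []
    if len(name) > MAX_TERM_NAME_LENGTH:
        problems.append("term name is longer than 256 characters")
    if "." in name:
        # The dot is reserved to denote term hierarchy.
        problems.append("term name contains a dot (.)")
    if len(description) > MAX_TERM_DESCRIPTION_LENGTH:
        problems.append("term description is longer than 512 characters")
    return problems


if __name__ == "__main__":
    print(validate_term("Customer.Email"))
```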
Tag a file or table
Procedure
From the browse or search results view, locate then select the file or table that you want to tag.
On the report.csv tab in the field-level view, click the More actions icon and select Add term from the drop-down menu.
The Add a Term dialog box opens.
In the Add a Term dialog box, select the action used to add the term.
- Add a Term: Adds an existing term to the file or table.
- Create a new term: Adds a new term to the file or table.
Enter the term name in the Term name field.
If you choose to create a new term, enter the term name in the New term name field, and optionally a term description in the Term description field.
Note: Term names can be up to 256 characters long and can contain any character except the dot (.), which is used to denote term hierarchy. Term descriptions can have up to 512 characters.
Click Add.
The file or table is tagged.
Tag a field
Procedure
From the browse or search results view, locate and then select the field that you want to tag.
In the Field Properties pane, click +Add Term Association for the selected resource.
The Add a Term dialog box opens.
In the Add a Term dialog box, select the action used to add the term.
- Add a Term: Adds an existing term to the field.
- Create a new term: Adds a new term to the field.
Enter the term name in the Term name field.
If you choose to create a new term, enter the term name in the New term name field, and optionally a term description in the Term description field.
Note: Term names can be up to 256 characters long and can contain any character except the dot (.), which is used to denote term hierarchy. Term descriptions can have up to 512 characters.
Click Add.
The field is tagged.
Tag unstructured data
You can use a business term to tag your unstructured data using a regular expression.
Perform the following steps to tag a term in unstructured data using a regular expression:
Procedure
Click Glossary in the left navigation menu.
The Business Glossary page opens.
Select the term you want to tag in the glossary list or click Add New to add a new tag.
Click the Settings tab.
Click the Identify Term By field and select Regular Expression.
The Regular Expression field displays.
Enter the regular expression you want to use in your search in the Regular Expression field.
Do not use ^ for “starts with” or $ for “ends with” in the regular expression, or Data Catalog will fail to find most of the mentions of the term.
Note: The regular expression you enter in the Regular Expression field must adhere to the regular expression logic enforced by the Java regular expression engine.
In the Test Data field, enter data that matches the regular expression and click Test.
Scroll down to Discovery Setting and Asset Type and select the Unstructured check box.
Note: If you do not select the Unstructured check box, Data Catalog will not run the job for the term.
You can also select the Structured check box if you would like to find the term in structured data.
(Optional) Under Scan Documents, you can select Stop after [ ] matches and specify a number to stop the scan after a specified number of regular expression-based matches.
Click Apply Changes.
Open the Data Canvas and run the Data Profiling Combo job on your data, or the Data Profiling job if the Format Discovery and Schema Discovery jobs already ran on the data.
Results
- The Unstructured check box is selected for the data.
- A Data Profiling or Data Profiling Combo job ran for the data specified.
- The regular expression adheres to Java regular expression logic.
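The following sketch shows why start and end anchors hide most mentions in unstructured text. It uses Python's re module purely for illustration. Data Catalog evaluates expressions with the Java regular expression engine, and the sample sentence and the INV-nnnnnn pattern are made up. The constructs used here (character classes, quantifiers, ^ and $) behave the same way in both engines for this input.

```python
import re

# Made-up sample of unstructured text containing two invoice numbers.
document = "Invoice INV-004512 was paid. INV-004513 remains open."

# Without anchors, the pattern can match anywhere in the text.
unanchored = re.compile(r"INV-\d{6}")

# With ^ and $, the whole text would have to be a single invoice number.
anchored = re.compile(r"^INV-\d{6}$")

print(unanchored.findall(document))  # ['INV-004512', 'INV-004513']
print(anchored.findall(document))    # []
```

Use the Test Data field to check the final expression, because that test runs against the engine Data Catalog actually uses.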
Tagging collections
Collections are a set of files with a similar schema and format. When files are grouped as a collection, you can manage the terms for that set of files from the collection as the single representation of all the data in all the files.
For terms assigned to individual files before they become part of the collection, do the following:
- Add the accepted term associations found in files to the collection as suggested terms (unless they are already part of the collection as accepted terms).
- Treat as rejected from the collection any term associations that were rejected.
When a file is part of a collection, Lumada Data Catalog no longer suggests term associations for the individual collection members. However, any existing accepted terms in the collection members continue to be considered in term propagation for that term. You should manage terms for all files from the collection rather than make new term associations in the individual files.
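The two rules for terms that were assigned to files before they joined a collection can be sketched with sets. This is an illustration under assumptions: the function name merge_file_terms_into_collection and the returned dictionary shape are hypothetical, and the text above does not say what happens when one file accepted a term that another file rejected, so the sketch leaves such a term in both result sets.

```python
def merge_file_terms_into_collection(file_accepted, file_rejected, collection_accepted):
    """Derive collection-level term states from file-level term associations.

    - Terms accepted in files become suggested on the collection,
      unless the collection already has them as accepted.
    - Terms rejected in files are treated as rejected on the collection.
    """
    suggested = set(file_accepted) - set(collection_accepted)
    rejected = set(file_rejected)
    return {"suggested": suggested, "rejected": rejected}


if __name__ == "__main__":
    print(merge_file_terms_into_collection(
        file_accepted={"Email address", "US ZIP Code"},
        file_rejected={"US Phone Number"},
        collection_accepted={"US ZIP Code"},
    ))
```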
Tag a collection
Procedure
Navigate to the top level folder of the collection you want to tag and click the arrow to expand the folder.
All the fields will be shown as an ordered list.
Select any field, and on the Summary tab, click the Add Term link.
In the Add a Term dialog box, select Create a new term.
Fill in the fields, including New term name, then click Add.
The collection is now tagged with the name you entered.
Accepting or rejecting suggested term associations
You can accept or reject suggested term associations in Lumada Data Catalog.
- Double-click a suggested term association in a resource field or in field-level search results to accept it.
- Single click to open the Association window where you can select the More actions icon to display the drop-down menu to accept or reject the term association.
Accept a term association
Procedure
Navigate to a resource field or perform a field-level search for a term.
Click the Glossary tab.
Click the More Actions menu at the end of the row for the business term, and select Accept Association from the drop-down menu.
Reject a term association
Procedure
Navigate to a resource field or perform a field-level search for a term.
Click the Glossary tab.
Click the More Actions menu at the end of the row for the business term and select Reject Association from the drop-down menu.
Remove a term association
Procedure
Navigate to a resource field or perform a field-level search for a term.
Click the Glossary tab.
Click the More Actions menu at the end of the row for the business term and select Remove Association from the drop-down menu.
Change the data used as the seed for term discovery
Terms for some data propagate more precisely than terms for other data. For example, if you tag a field that contains a product code made up of letters, numbers, and punctuation, such as “CSP-2201A”, Data Catalog can precisely identify other fields with similarly constructed data. However, if you tag a field that contains free text (such as the text field in a social media feed) or numeric values (such as rainfall depth values), Data Catalog may find false positives when attempting to match the data. Consider defining a business rule or regular expression for a term when you want to tag data with a specific text pattern.
Follow the steps below to add or remove a term association as a seed:
Procedure
Click Glossary in the left navigation menu and use the left navigation tree to locate the term.
On the Summary tab, icons in the Key Metrics card indicate the suggested, accepted, rejected, and seed associations.
On the Summary tab, click View All on the Business Terms card.
Data that is tagged as seed data is marked as Enabled in the Seed column.
Evaluate the seed term associations to make sure they form a representative set of data to use for tagging.
Stop a seed term from participating in term discovery by clicking the Actions icon at the end of the business term row and selecting Do not use as seed.
Add a term association as a seed by opening the term association and selecting Use field data in Term discovery.
Change the confidence cutoff for a term
If a field matches with a score below the threshold value or confidence cutoff, then the term association does not appear in the catalog.
Perform the following steps to change the confidence cutoff for a term.
Procedure
Click Glossary in the left navigation menu and select the term for which you want to change the confidence cutoff.
In the Settings tab, scroll down to the Confidence Lower Limit and Confidence Upper Limit fields.
Set the values to the scores that Data Catalog will use as the minimum and maximum confidence cutoffs for term association suggestions.
Click Apply Changes to save.
Next steps
If you want to clear the existing suggestions and start from scratch, rerun the business term discovery job with the Incremental Profiling check box unselected.
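The effect of the lower cutoff described at the start of this section can be sketched as a simple filter. The function name visible_associations and the sample scores are hypothetical. The sketch models only the lower limit, because this section does not describe how the Confidence Upper Limit is applied.

```python
def visible_associations(scored_fields, confidence_lower_limit):
    """Keep the field associations whose score is not below the cutoff.

    `scored_fields` maps a field name to its match score (0-100).
    Associations scoring below the cutoff do not appear in the catalog.
    """
    return {
        field: score
        for field, score in scored_fields.items()
        if score >= confidence_lower_limit
    }


if __name__ == "__main__":
    scores = {"orders.sku": 92.0, "returns.item_code": 71.5, "tweets.text": 18.0}
    print(visible_associations(scores, confidence_lower_limit=70))
```

Raising the cutoff in this model trims weaker matches such as the free-text field, which mirrors the trade-off between fewer false positives and fewer suggestions.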
Create a value term
Procedure
Click Glossary in the left navigation menu.
The Business Glossary page opens.
Click Add New and select Term.
The Create Business Term dialog box displays.
Enter a name for the term.
- If you did not already select a glossary, click the arrow in the Glossary field and select the glossary you want to contain the term.
- Select the parent term if desired.
Click Create.
Next steps
- As a best practice, you should add a term description after the term is created.
- Run the Business Term Discovery job, which can propagate and create term suggestions.
Create a regular expression term
By default, terms are created as Value terms. Use this procedure to change an existing term to a Regular expression term. Term associations will be suggested based on the number of values in the field that match the regular expression.
When creating your glossaries and terms, the built-in terms may provide good examples of what you can do using regular expressions and length limits. All terms configured with regular expression rules are listed in the Rules tab of the Data Canvas view of the term. You can review the regular expressions for built-in terms, and you can disable built-in terms from propagating. However, you cannot edit the regular expression for a built-in term.
Perform the following steps to create a regular expression term.
Procedure
Click Glossary in the left navigation menu and select the term you want to modify.
On the Details tab, select Regular Expression from the drop-down list in the Identify Term By field.
Enter the regular expression in the Regular Expression field.
Enter test data to validate that the regular expression matches the data as you expect.
In the Min Length and Max Length fields, enter the minimum and maximum number of characters against which Data Catalog should apply this expression.
These values help Data Catalog optimize processing so that it doesn't spend time on data that is not likely to match the regular expression.
In the Min Confidence field, set the minimum threshold value for the term.
This is a threshold value to indicate how many of the field's values need to match the pattern for the term to be associated with the field. For example, if you expect each value in a field to match, set the cutoff at 90% or higher. If you want Data Catalog to suggest a term if the field has any values that match the regular expression, set the Min Confidence percentage very low.
Click Apply Changes.
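The interplay of the regular expression, the length limits, and Min Confidence can be sketched as follows. This is a simplified model, not the product's algorithm: the function names are hypothetical, the sketch requires the whole value to match, and it counts every value in the field in the denominator. Python's re module stands in for the Java engine that Data Catalog uses.

```python
import re


def regex_match_percentage(values, pattern, min_length, max_length):
    """Return the percentage of values that fully match `pattern`.

    Values whose length is outside [min_length, max_length] are counted
    as non-matching without running the regular expression on them.
    """
    if not values:
        return 0.0
    compiled = re.compile(pattern)
    matches = sum(
        1
        for value in values
        if min_length <= len(value) <= max_length and compiled.fullmatch(value)
    )
    return 100.0 * matches / len(values)


def should_suggest(values, pattern, min_length, max_length, min_confidence):
    """Suggest the term when enough of the field's values match."""
    return regex_match_percentage(values, pattern, min_length, max_length) >= min_confidence


if __name__ == "__main__":
    field_values = ["CSP-2201A", "CSP-3310B", "n/a", "CSP-99"]
    code_pattern = r"[A-Z]{3}-\d{4}[A-Z]"
    print(regex_match_percentage(field_values, code_pattern, 9, 9))
    print(should_suggest(field_values, code_pattern, 9, 9, min_confidence=90))
    print(should_suggest(field_values, code_pattern, 9, 9, min_confidence=10))
```

With half of the sample values matching, a 90% threshold withholds the suggestion while a very low threshold allows it, which is the behavior the step above describes.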
Controlling automated discovery and learning
You can stop term association changes from impacting the algorithm for term discovery in Data Catalog. When learning is turned off, accepting and rejecting term associations no longer has an impact on how value terms are evaluated. Consider turning off learning when you are satisfied that terms are added to new data appropriately.
Turn automated discovery and learning off or on
Procedure
Click Glossary in the left navigation menu and select the term you want to manage.
The business term summary is displayed.
Click the Details tab to view the settings for Automated Discovery and Keep Learning.
Click a setting to change it.
- Switch the Automated Discovery option to the right to allow Data Catalog to suggest term associations, or switch it to the left to turn it off.
- Switch the Keep Learning option to the right to improve automated tagging with analysis, or switch it to the left to turn it off.
Click Apply Changes to save the updated settings or click Cancel to discard your changes.
Delete a term
Use the following steps to delete a term:
Procedure
Click Glossary in the left navigation menu and select the term that you want to delete.
Click Actions under the left navigation tree and then select Remove.
A confirmation box appears.
In the Please Confirm field, type yes and click Confirm.
Built-in terms
In addition to terms that you can add, Lumada Data Catalog has a set of predefined terms. These terms fall into two categories:
Regular expressions
Data, such as United States ZIP codes and phone numbers, are tagged by matching data with regular expressions for these values.
Reference data
Field data, such as countries, the names of US states, and first and last names, are tagged by matching the signature of known data. This reference data is static. You cannot include data from seed fields to alter the term algorithm of the built-in terms.
Built-in terms are predefined and propagated when you run a term job after collecting discovery metadata for catalog resources. Built-in terms cannot be changed. If you do not want to use the built-in terms provided for tagging your data, you can turn off automatic term propagation for these terms. See Turn automated discovery and learning off or on for details.
Suggested terms are associated with fields that match the reference data and patterns such as the data in the following table.
Suggested terms | |
Countries: full name | Salutation |
Countries: 3-letter abbreviation | US Address |
Email address | US City |
First Name | US County |
Global City | US Phone Number |
IP Address | US Social Security Number, Numeric |
Last Name | US Social Security Number, Delimited |
Major Credit Card Number | US State Abbreviation |
Occupation | US States |
People Names | US ZIP Code: NNNNN and NNNNN-NNNN |
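As an example of the kind of pattern a regular expression term relies on, the following sketch approximates the two ZIP code layouts named in the table (NNNNN and NNNNN-NNNN). The expression and the function name looks_like_us_zip are illustrative assumptions, not the built-in term's actual definition, which you can review in the product.

```python
import re

# Approximation of the NNNNN and NNNNN-NNNN layouts; not the built-in rule.
US_ZIP = re.compile(r"\d{5}(?:-\d{4})?")


def looks_like_us_zip(value: str) -> bool:
    """Return True when the whole value has one of the two ZIP layouts."""
    return US_ZIP.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["94107", "94107-1234", "9410", "94107-12"]:
        print(sample, looks_like_us_zip(sample))
```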
Regex use case: National Identifiers
Follow the steps below to discover fields using regex terms.
Procedure
Create a glossary named National_Identifiers.
Create a regex term for each national identifier or passport number that needs to be discovered, and define a valid regex for it.
Run term discovery on the data.
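The steps above can be mimicked outside the product to try out candidate expressions before you create the terms. In this sketch the term names, the simplified formats (an Indian PAN as five letters, four digits, and a letter; a UK National Insurance number as two letters, six digits, and a letter from A to D), the sample field data, and the 90% ratio are all assumptions for illustration. Real identifier rules are stricter, and Data Catalog runs expressions with the Java engine rather than Python's re module.

```python
import re

# Hypothetical contents of a National_Identifiers glossary (simplified formats).
NATIONAL_IDENTIFIERS = {
    "India_PAN": re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]"),
    "UK_National_Insurance_Number": re.compile(r"[A-Z]{2}[0-9]{6}[A-D]"),
}


def discover_fields(fields, min_match_ratio=0.9):
    """Map each field name to the terms whose regex matches enough values.

    `fields` maps a field name to a list of string values.
    """
    found = {}
    for field_name, values in fields.items():
        for term, pattern in NATIONAL_IDENTIFIERS.items():
            hits = sum(1 for value in values if pattern.fullmatch(value))
            if values and hits / len(values) >= min_match_ratio:
                found.setdefault(field_name, []).append(term)
    return found


if __name__ == "__main__":
    sample = {
        "tax_id": ["ABCDE1234F", "PQRSX9876Z"],
        "ni_no": ["QQ123456C", "AB654321A"],
        "notes": ["call back on Monday"],
    }
    print(discover_fields(sample))
```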