Searching Data Catalog
You can run a keyword search across the metadata in Lumada Data Catalog in three ways:
Basic Search
You can use the search box on the main toolbar to trigger a global keyword search through all the resources in the cluster.
Saved Search
Click in the Search data catalog box to display your five most recent searches, then select an entry from the list to run that search again.
Advanced Search
You can select Advanced Search, which displays after the saved searches when you click in the Search data catalog box.
Searching with keywords
When you enter search keywords, Lumada Data Catalog performs two separate searches and combines the results:
- A search is performed for full or partial path names, such as /user/hudson/analysis/trend-2016.csv or analysis.
- A search is performed for all other metadata and sample data, such as the names of files, fields, tables, tags, and origins, and the content of tag descriptions and origin descriptions.
Entering keywords into the Search data catalog box returns matching resources or fields, which are determined by the following attributes:
Case sensitivity
Searches are case-insensitive except for path names. To find resources by their location in HDFS or S3, use the case represented in the file system.
Wildcard characters
When you type a word in the search box, the Data Catalog search engine checks the query for the asterisk (*) wildcard character. The results depend on where the wildcard appears relative to the keyword. If you enter multiple words, each word is treated as an independent search term. The following table shows examples:
Search text | Description | Finds |
foo | Strict equals search | foo |
foo* (or ^foo*) | Starts with "foo", followed by any number of characters | foo, food, foodbar |
*bar (or *bar$) | Ends with "bar", preceded by any number of characters | Escobar, Zanzibar, foodbar, bar |
food bar | OR search on each word | food, bar, foodbar, food_bar |
"food bar" | Strict multi-word equals search | food bar |
Quotes
If you enclose keywords or phrases in double quotation marks, Data Catalog searches for the exact phrase.
Special characters
If a resource or tag name contains special characters, such as $, @, or &, you must escape them with a backslash (\). For example, if your tag name is finance@USA$, enter nce@USA\$ in the search box to find it.
If a resource or tag name contains multiple special characters, place a backslash (\) in front of any one of them. For example, if your tag name is Park@Avenue# and the tag is associated with a resource at the resource or field level, both Park\@Avenue# and Park@Avenue\# are valid searches.
However, if a resource or tag name contains a hyphen (-), you must escape the hyphen with a forward slash (/). For example, searching for Q1-2016 returns no results, but searching for Q1/-2016 returns both Q1 2016 and Q1-2016.
If you are using a field search, you do not need to use slashes to escape special characters.
Note: When using a Resource search, combine special characters with text.
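The escaping rules above can be sketched as a small Python helper that prepares a resource or tag name for the Search data catalog box. This is a minimal sketch under stated assumptions: `escape_resource_search` is a hypothetical helper, the `SPECIAL_CHARS` set is illustrative only (the docs list "$, @, &, and so forth" without a complete set), and it assumes that escaping every special character, not just one, is also accepted.

```python
# Assumption: illustrative set only; the docs do not give the complete list
# of special characters that need escaping.
SPECIAL_CHARS = set("$@&#")


def escape_resource_search(term: str) -> str:
    """Escape a name for a resource search (hypothetical helper).

    Hyphens are escaped with a forward slash; the other special characters
    in SPECIAL_CHARS are escaped with a backslash.
    """
    escaped = []
    for ch in term:
        if ch == "-":
            escaped.append("/-")
        elif ch in SPECIAL_CHARS:
            escaped.append("\\" + ch)
        else:
            escaped.append(ch)
    return "".join(escaped)


print(escape_resource_search("Q1-2016"))       # Q1/-2016
print(escape_resource_search("Park@Avenue#"))  # Park\@Avenue\#
```

Field searches do not need this step, because they do not require escaping.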
Path name searches
Path name searches can match any part of a path name and are case-sensitive.
Lumada Data Catalog compares the search text with the path names of all the resources in the catalog. The comparison is a string match: a file, table, or folder is returned when the keyword matches any part of the resource's path.
For example, part matches the file /data/transactions/part-r-00000/data, and so does the keyword actions. However, Part does not match the file, because path name searches are case-sensitive.
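As a minimal sketch, the path name comparison described above can be modelled in Python as a case-sensitive substring test. This is an assumption about the observable behavior, not Data Catalog's implementation; `path_matches` is a hypothetical helper.

```python
def path_matches(keyword: str, path: str) -> bool:
    """Path name search sketch: case-sensitive match on any part of the path."""
    return keyword in path


path = "/data/transactions/part-r-00000/data"
print(path_matches("part", path))     # True
print(path_matches("actions", path))  # True
print(path_matches("Part", path))     # False: path searches are case-sensitive
```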
Other metadata searches
Lumada Data Catalog compares the search text to the names of files, fields, tables, tags, and origins, and the content of tag descriptions and origin descriptions.
When building the keyword search indexes for these items, Data Catalog splits metadata values into tokens, and a keyword matches only a complete token in the index. White space and characters such as single and double quotation marks, question marks, parentheses, carets (^), pound signs (#), colons, periods, hyphens, and commas mark the end of a word and are otherwise ignored. Words containing underscores are not split at the underscore.
The search is not case sensitive.
For example, risk matches the field name Risk Band. The keyword RISK has the same match behavior as risk, except in path names, which are case-sensitive.
However, risk does not match the field Risk_Band, because the whole name risk_band is a single token. To find matches that contain the keyword anywhere in a tokenized name, place the asterisk wildcard character before the keyword, after it, or both. For example, the keyword risk* matches the field Risk_Band.
Keyword searches (other than path names):
- Are case-insensitive
- Match complete words
- Accept the asterisk wildcard character to indicate any preceding or following characters
Characters such as the plus sign (+), minus sign (-), ampersand (&), vertical bar (|), exclamation mark (!), caret (^), tilde (~), colon (:), and other special characters are treated as delimiters and are ignored in the search.
If you enter risk-band, the search behaves the same as if you entered risk band.
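The tokenization and wildcard rules in this section can be sketched in Python as follows. This is an illustrative model, not Data Catalog's indexer: the delimiter set is taken from the characters named above and may differ from the real index, and the helpers `tokenize`, `word_matches`, and `keyword_search` are hypothetical. Multiple query words are combined with OR, and each word must match a complete token unless it uses `*`.

```python
import re

# Delimiters named in this section. Assumption: the real index may use a different set.
# Underscores are deliberately not delimiters, so Risk_Band stays one token.
DELIMITERS = re.compile(r"""[\s'"?()^#:.,\-+&|!~]+""")


def tokenize(text):
    """Split text into lowercase tokens at delimiters (searches are case-insensitive)."""
    return [tok for tok in DELIMITERS.split(text.lower()) if tok]


def word_matches(word, token):
    """Match one query word against one complete token; * stands for any characters."""
    word = word.lstrip("^").rstrip("$")  # "^foo*" acts like "foo*", "*bar$" like "*bar"
    pattern = ".*".join(re.escape(part) for part in word.split("*"))
    return re.fullmatch(pattern, token) is not None


def keyword_search(query, value):
    """OR search: True if any query word matches any complete token of the value."""
    tokens = tokenize(value)
    return any(word_matches(w, t) for w in tokenize(query) for t in tokens)


print(keyword_search("risk", "Risk Band"))        # True
print(keyword_search("risk", "Risk_Band"))        # False: the token is risk_band
print(keyword_search("risk*", "Risk_Band"))       # True
print(keyword_search("risk-band", "Band Score"))  # True: same as "risk band"
```

Because the hyphen is a delimiter, risk-band splits into two OR-ed words, which is why it behaves like risk band.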
Refining search results
If you do a keyword search from the toolbar or from the Advanced Search page, it returns results from the entire cluster, including files and fields that directly match the search criteria. You can use the keyword search and facets in the left pane of the search results to further refine these results.
You may notice that global search results return matched files and all of the fields in those files. When you refine the results, only the fields that directly match the refinement remain in the results. Here's an example of how this works:
You enter restaurant in the toolbar search. The search results show:
- The files that have restaurant in their name or a file-level tag or tag description.
- All the fields in the matched files.
- The fields that match restaurant in their name, a field-level tag or tag description, or the sample data in that field.
If you refine the search results by entering cuisine in the keyword search in the left pane, the middle pane changes to show:
- Only the files that have both restaurant and cuisine in the file name or file-level tag name or descriptions.
- Only the fields that have both restaurant and cuisine in the field name, field-level tag or tag description, or sample data.
Unlike the original global search, no fields show simply because they were associated with a matched file. For example, if the global search results on restaurant matched a file inspections.csv with detailed address information, including a field with the tag US State, then all of the fields in the file inspections.csv appear in the global search results. When the results are refined by the keyword cuisine, the files and fields that now show directly match the keyword cuisine. The fields in the inspections.csv file that do not match cuisine directly are not shown.
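The difference between the global search and a refined search can be sketched in Python with hypothetical data structures. This is a simplified model: `FileInfo`, `FieldInfo`, `global_search`, and `refine` are illustrative names, and matching is reduced to exact membership in a set of terms standing in for names, tags, tag descriptions, and sample data.

```python
from dataclasses import dataclass, field


@dataclass
class FieldInfo:
    name: str
    terms: set  # hypothetical: field name, field-level tags and descriptions, sample data


@dataclass
class FileInfo:
    name: str
    terms: set  # hypothetical: file name, file-level tags and tag descriptions
    fields: list = field(default_factory=list)


def global_search(files, keyword):
    """Matched files, all fields of each matched file, and directly matching fields."""
    results = []
    for f in files:
        file_hit = keyword in f.terms
        if file_hit:
            results.append(f.name)
        for fld in f.fields:
            if file_hit or keyword in fld.terms:
                results.append(f"{f.name}:{fld.name}")
    return results


def refine(files, keywords):
    """Refined results: only files and fields that directly match every keyword."""
    results = []
    for f in files:
        if all(k in f.terms for k in keywords):
            results.append(f.name)
        for fld in f.fields:
            if all(k in fld.terms for k in keywords):
                results.append(f"{f.name}:{fld.name}")
    return results


files = [
    FileInfo("inspections.csv", {"inspections", "restaurant"}, [
        FieldInfo("address", {"address"}),
        FieldInfo("state", {"state", "us state"}),
        FieldInfo("cuisine_type", {"cuisine", "restaurant"}),
    ]),
]
print(global_search(files, "restaurant"))
# ['inspections.csv', 'inspections.csv:address', 'inspections.csv:state', 'inspections.csv:cuisine_type']
print(refine(files, ["restaurant", "cuisine"]))
# ['inspections.csv:cuisine_type']
```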
Search details
Lumada Data Catalog search results are organized into self-contained panels to maximize resource insight at a glance. Each result panel contains key information organized for easy viewing, such as path details, description, and type.
Sensitivity
An icon indicates the sensitivity of the returned data.
File metadata
Contains the following file metadata parameters:
- File Size
- Fields
- Records
- Origin(s)
- Last Modified timestamp
State
Shows resources that are available for browsing, resources that are marked for deletion in Solr, and resources that are no longer available for processing.
Description
Plain text describing the resource, if available.
Resource Tags
The number of overflow tags not shown in the panel. Click the numeric link to view them.
Resource type and path
The file type and the path to the file location.
Search result details also indicate resource popularity metrics like average overall rating with total ratings and total posts.
Basic search
A basic search performs a global keyword search that lists the total number of results for the search term or terms entered, and groups those results in tabs for Resources and Fields.
Resources
List of resources that match the search term (including resource name, path, fields, tags or tag associations)
Fields
List of fields that match the search term (including resource path, fields, tags or tag associations)
Lumada Data Catalog provides built-in facets for Resources and for Fields that can further filter the search results.
View search results in the Resource tab

Procedure
Click the Open facet settings (gear icon) in the upper-left corner.
The built-in facets appear as Available Facets in the Facets Settings dialog box. By default, the search results show all the built-in facets.
To limit the search results to a chosen set of facets, select the check boxes next to the facets and use the right arrow button to move the selected facets from the list of Available Facets to the list of Visible Facets.
(Optional) Select a facet and use the up or down arrow buttons to change the order in which the Visible Facets appear on the search results page.
Click OK to show the facets in the search results.
Note: Only the facets that have values appear in the facets pane of the search results. Empty facets do not appear, even if they are in the Visible Facets list.
View search results in the Fields tab

Procedure
Click the Open facet settings (gear icon) in the upper-left corner.
The built-in facets appear as Available Facets in the Facets Settings dialog box. By default, the search results show all the built-in facets.
To limit the search results to a chosen set of facets, select the check boxes next to the facets and use the right arrow button to move the selected facets from the list of Available Facets to the list of Visible Facets.
(Optional) Select a facet and use the up or down arrow buttons to change the order in which the Visible Facets appear on the search results page.
Click OK to show the facets in the search results.
Note: Only the facets that have values appear in the facets pane of the search results. Empty facets do not appear, even if they are in the Visible Facets list.
Advanced search
As with basic search, you can use keywords in an advanced search of Data Catalog. However, instead of only filtering the results after a search, as in basic search, you can apply filters before searching to limit the search itself. Search results are bound by your user access control permissions.
To perform an advanced search, click in the Search data catalog box and then click Go to Advanced Search.
Enter a keyword or keywords, then define the filters that you want to apply for your search. For example, in the Resources tab you can limit your search to the virtual folder BankRetail, or in the Fields tab you can limit your search to the string data type. After selecting the desired Entity type, click Apply filters and search.
Search using facets
Lumada Data Catalog crawls the data cluster to discover information from files, Hive tables, and fields inside of files and tables. It groups that information in facets to make it easy for you to search for files or tables with specific characteristics. The facets are categorized as follows:
File format
Search for a specific file format.
Resource type
Search through a specific resource type.
Data source
Search in a specific data source.
Virtual folder
Search in a specific virtual folder.
Processing status
Search within a specific resource status (for example, only profiled resources).
Selecting more than one value in the same facet includes files that match either value (OR). Selecting values in multiple facets includes files that match both values (AND). If keywords are also specified, the search results match both the keywords and the facet choices.
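The OR-within-a-facet, AND-across-facets rule can be sketched in Python as below. The resource dictionaries and facet keys (`file_format`, `data_source`, `field_tags`) are hypothetical stand-ins for Data Catalog's facets, and `keyword_match` is a placeholder predicate for the keyword search; this illustrates the filtering logic only.

```python
def _facet_matches(value, chosen):
    """OR within one facet: any of the resource's values is among the selected values."""
    values = value if isinstance(value, (set, list, tuple)) else {value}
    return any(v in chosen for v in values)


def facet_filter(resources, selections, keyword_match=None):
    """Keep resources that match the keywords AND every facet that has a selection."""
    return [
        res["name"]
        for res in resources
        if (keyword_match is None or keyword_match(res))
        and all(_facet_matches(res.get(facet), chosen) for facet, chosen in selections.items())
    ]


# Hypothetical resources; facet keys are illustrative, not Data Catalog field names.
resources = [
    {"name": "sales.csv", "file_format": "CSV", "data_source": "HDFS", "field_tags": {"US_State"}},
    {"name": "events.json", "file_format": "JSON", "data_source": "HDFS", "field_tags": {"US_City"}},
    {"name": "archive.csv", "file_format": "CSV", "data_source": "S3", "field_tags": set()},
]
print(facet_filter(resources, {"file_format": {"CSV", "JSON"}, "data_source": {"HDFS"}}))
# ['sales.csv', 'events.json']
print(facet_filter(resources, {"file_format": {"CSV"}}, lambda r: "sales" in r["name"]))
# ['sales.csv']
```

Multi-valued facets, such as tags, match when any of the resource's values is selected, which mirrors the OR behavior within a single facet.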
By default, Data Catalog provides the following facets for resources:
Facet | Description | Notes |
File format | Data format of the file content. | Data Catalog profiles sequence files; however, the file content type is marked as the format in which each record is formatted (Avro, JSON, delimited text, or XML). |
Resource type | Data source type (HDFS/Hive). | If files have not been profiled, their source is identified as UNKNOWN. |
Resource size | Size of the resource. | Size facet ranges include the start value and exclude the end value. For example, the range 1 MB - 1 GB includes files of at least 1 MB and smaller than 1 GB. |
Resource origin | All files marked with the selected origin and any files with lineage relationships that lead to a file marked with the selected origin. | Results include files with confirmed (accepted) lineage relationships and relationships suggested by Data Catalog. |
Data source | The parent data source where the resource is located. | For example, HDFS, Hive, or MySQL. |
Virtual folder | The virtual folder to which the resource belongs. | The virtual folder is assigned to a user by the administrator and can map to any data source. |
Resource tag | The resource tags associated with the resource. | |
Resource tag association state | Lists the resource tag association state of the resources matching the search term. | |
Field tag | The field tags associated with the resource. | |
Field tag association state | Lists the field tag association state of the resources matching the search term, along with the number of Accepted, Rejected, and Suggested associations. | |
Processing status | Outcome of profiling. | Folders appear as processed or unprocessed. Files and tables appear as profiled if most or all of the data profiled successfully. If profiling was attempted but not successful, files and tables are marked as 'profile failed'. Files or tables with data formats that Data Catalog does not support are marked as recognized or unrecognized based on the format. |
Sensitivity | Computed metadata attribute that identifies the sensitivity of the resource. | Sensitivity is based on the highest sensitivity level of any tag (field or resource) associated or suggested on the resource. |
Resource state | State of the resource, such as 'Available'. | |
Data Catalog provides the following facets for fields:
Facet | Description | Notes |
Data type | Data type of the field value as formatted in the file. | Many file formats specify only String data types. This facet does not use the data types discovered by Data Catalog. For example, for a JSON file, you may see strings, integers, decimals, or Boolean values here but not dates, depending on the type information present in the JSON file. |
Field tag | The field tags associated with the resource. | |
Field tag association state | Lists the field tag association state of the resources matching the search term, along with the number of Accepted, Rejected, and Suggested associations. | |
Cardinality | The number of unique values in a column. | Affected by whether the file was fully profiled or sampled. |
Selectivity | Whether the resource is repetitive or unique. | |
Density | The number of non-null values in a column. | Affected by whether the file was fully profiled or sampled. |
Data source | The parent data source where the resource is located. | For example, HDFS, Hive, or MySQL. |
Virtual folder | The virtual folder to which the resource belongs. | The virtual folder is assigned to a user by the administrator and can map to any data source. |
Sensitivity | Computed metadata attribute that identifies the sensitivity of the resource. | Sensitivity is based on the highest sensitivity level of any tag (field or resource) associated or suggested on the resource. |
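Three of the numeric facets described in the tables above can be sketched in Python: the half-open Resource size range, Cardinality, and Density. These are illustrative helpers, not Data Catalog code. They assume binary units for MB and GB (the docs do not say whether 1 MB is 10**6 or 2**20 bytes) and assume that nulls are not counted toward cardinality. Both column statistics reflect only the rows that were profiled or sampled.

```python
# Assumption: binary units. The docs do not say whether 1 MB means 10**6 or 2**20 bytes.
MB = 1024 ** 2
GB = 1024 ** 3


def in_size_range(size_bytes, start, end):
    """Resource size facet: the start value is included, the end value is excluded."""
    return start <= size_bytes < end


def cardinality(values):
    """Number of unique values in a column (assumption: nulls are not counted)."""
    return len({v for v in values if v is not None})


def density(values):
    """Number of non-null values in a column."""
    return sum(v is not None for v in values)


column = ["CA", "NY", None, "CA"]  # a sampled column
print(in_size_range(1 * MB, MB, GB))  # True: 1 MB is the inclusive start
print(in_size_range(1 * GB, MB, GB))  # False: 1 GB is the exclusive end
print(cardinality(column), density(column))  # 2 3
```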
In addition to using keywords and facets, you can also apply tag-based filters to include or exclude tags and tag children to perform conjunctive and disjunctive searches in an advanced search.
Including tag(s)
Enter the tag names you want to include in your search. Only resources and fields with the selected tags are fetched; when Include child tags is selected, resources and fields with their child tags are also returned.
Note: Business entities are blocked from the Include child tags feature. If a business entity tag is selected, the search results do not include its children.
Excluding tag(s)
Enter the tag names you want to exclude from your search. All resources and fields are fetched except those with the excluded tags; when Exclude child tags is selected, resources and fields with their child tags are also excluded.
For example, when you search for the keyword "Personnel Info" and include the tag US_State, the search results are limited to resources that match the keyword and have the tag US_State (suggested or accepted). By including or excluding child tags, you can also filter on the individual states tagged under US_State.
As with a global search, the available facets visible to the user can further filter out the advanced search results.
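The include and exclude tag filters can be sketched in Python as below, with exclusion taking precedence, as the Advanced Search procedure notes. This is a sketch under stated assumptions: the tag hierarchy, the business entity set, and the helpers `descendants` and `tag_filter` are hypothetical. It assumes that child tags are expanded recursively and that a resource needs at least one of the included tags. Keyword matching is left out.

```python
# Hypothetical tag hierarchy (parent tag -> child tags) and business entity tags.
TAG_CHILDREN = {"US_State": ["California", "Texas"], "Customer": ["Customer_Name"]}
BUSINESS_ENTITIES = {"Customer"}


def descendants(tag):
    """All child tags below a tag, at any depth (assumption: expanded recursively)."""
    found, stack = set(), [tag]
    while stack:
        for child in TAG_CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def tag_filter(resources, include=(), include_children=False,
               exclude=(), exclude_children=False):
    """resources: dict mapping a resource name to its accepted or suggested tags."""
    included = set(include)
    if include_children:
        for tag in include:
            if tag not in BUSINESS_ENTITIES:  # business entities are not expanded
                included |= descendants(tag)
    excluded = set(exclude)
    if exclude_children:
        for tag in exclude:
            excluded |= descendants(tag)
    results = []
    for name, tags in resources.items():
        if tags & excluded:
            continue  # exclusion takes precedence over inclusion
        if included and not (tags & included):
            continue  # assumption: at least one included tag is required
        results.append(name)
    return results


resources = {
    "hr_summary.csv": {"US_State"},
    "hr_ca.csv": {"California"},
    "hr_tx.csv": {"Texas"},
    "crm.csv": {"Customer_Name"},
}
print(tag_filter(resources, include=["US_State"], include_children=True, exclude=["Texas"]))
# ['hr_summary.csv', 'hr_ca.csv']
```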
Search using Advanced Search
Perform the following steps to search using Advanced Search.
Procedure
Click in the Search data catalog field, and then click Go to Advanced Search.
The Advanced Search Form page opens.
Enter your search term or terms in the Keywords field.
Select the Entity type that you want to search:
- Resources: to search resources.
- Fields: to search fields.
(Optional) Select one or more tags from the Including Tag(s) dropdown menu, and select the Include child tags check box if you want to include child tags in your search.
Selected included tags appear on the Advanced Search Form page.
(Optional) Select one or more tags from the Excluding Tag(s) dropdown menu, and select the Exclude child tags check box if you want to exclude child tags from your search.
Note: If the Including Tag(s) and Excluding Tag(s) fields contradict each other, Excluding Tag(s) takes precedence.
Selected excluded tags appear on the Advanced Search Form page.
(Optional) Depending on your selected Entity type, apply facets:
Resources
You can search any or all of these resource facets:
- Data source
- Virtual folder
- Resource type
- File format
- Processing status
Fields
You can search any or all of these field facets:
- Data source
- Virtual folder
- Data type
- Field tag association state
Click Apply filters and search.
Using a custom search query
You can write expressions to query the searchable properties of resources or fields using the Lumada Data Catalog (LDC) search language. Each expression must specify the property or properties to compare and the operator to apply. After you enter the filter string, Data Catalog validates it and then runs the search. The search results displayed depend on your access control permissions.
To perform a custom search, click in the Search data catalog field, then click Go to Advanced Search and select the Custom Search Query tab.
The following table lists the custom query operators in the LDC search language.
Operator | Description |
eq | Equal to |
ne | Not equal to |
co | Contains |
sw | Starts with |
ew | Ends with |
gt | Greater than (supports date-in-string formats) |
ge | Greater than or equal to (supports date-in-string formats) |
lt | Less than (supports date-in-string formats) |
le | Less than or equal to (supports date-in-string formats) |
OR | Logical OR between two filters; matches if either filter matches |
AND | Logical AND between two filters; matches if both filters match |
not | Negation of the eq, ne, co, gt, ge, lt, le, OR, and AND operators |
is null | Empty |
is not null | Not empty |
You should observe these rules when using Custom Search Query:
- Only searchable properties can be queried, and a search can mix properties.
- Searches on string properties are case-sensitive and match the exact value of the property. Searches on text_general and text_with_special_chars properties are case-insensitive, and matches happen on the terms that Solr generates for the property value.
- Ranges can be given for numbers and time variables by combining greater-than and less-than operators.
- Multiple statements can be combined with the AND and OR operators.
- Statements can be grouped using parentheses ().
- Provide file sizes in bytes.
- The following date-in-string formats are supported:
  - dd/MM/yyyy hh:mm:ss
  - dd-MM-yyyy hh:mm:ss
  - dd-MM-YYYY
  - dd/MM/YYYY
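To make the operators and rules above concrete, here is a small Python evaluator that applies LDC-style filter strings to in-memory records. It is a sketch, not Data Catalog's parser or Solr's query engine. The record dictionaries, `QueryParser`, and `search` are hypothetical. String comparisons are exact and case-sensitive, as for string properties, and `not` binds tighter than AND, which binds tighter than OR. The date patterns are mapped to `strptime` codes, assuming that "hh" means a 24-hour clock and "YYYY" means the calendar year.

```python
import operator
import re
from datetime import datetime

# Documented date-in-string formats. Assumption: "hh" is a 24-hour clock, "YYYY" the calendar year.
DATE_FORMATS = ("%d/%m/%Y %H:%M:%S", "%d-%m-%Y %H:%M:%S", "%d-%m-%Y", "%d/%m/%Y")


def to_date(value):
    """Parse a string in a supported date format; return anything else unchanged."""
    if isinstance(value, str):
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue
    return value


# Binary operators. String comparisons are exact and case-sensitive in this sketch.
OPS = {"eq": operator.eq, "ne": operator.ne, "co": lambda a, b: b in a,
       "sw": lambda a, b: a.startswith(b), "ew": lambda a, b: a.endswith(b)}
for _name, _cmp in {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt, "le": operator.le}.items():
    OPS[_name] = lambda a, b, cmp=_cmp: cmp(to_date(a), to_date(b))  # dates compare as dates


class QueryParser:
    """Recursive descent: OR binds loosest, then AND, then not and parentheses."""

    def __init__(self, query):
        self.tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', query)  # ( ) "quoted" bare
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos].lower() if self.pos < len(self.tokens) else None

    def take(self):
        self.pos += 1
        return self.tokens[self.pos - 1]  # raises IndexError if the query ends early

    def parse_or(self):
        left = self.parse_and()
        while self.peek() == "or":
            self.take()
            right = self.parse_and()
            left = lambda rec, a=left, b=right: a(rec) or b(rec)
        return left

    def parse_and(self):
        left = self.parse_unary()
        while self.peek() == "and":
            self.take()
            right = self.parse_unary()
            left = lambda rec, a=left, b=right: a(rec) and b(rec)
        return left

    def parse_unary(self):
        if self.peek() == "not":
            self.take()
            inner = self.parse_unary()
            return lambda rec, f=inner: not f(rec)
        if self.peek() == "(":
            self.take()
            inner = self.parse_or()
            if self.take() != ")":
                raise ValueError("Expected ')'")
            return inner
        return self.parse_comparison()

    def parse_comparison(self):
        prop, op = self.take(), self.take().lower()
        if op == "is":  # "is null" or "is not null"
            negate = self.peek() == "not"
            if negate:
                self.take()
            if self.take().lower() != "null":
                raise ValueError("Expected 'null'")
            return lambda rec, p=prop, n=negate: (rec.get(p) is None) != n
        if op not in OPS:
            raise ValueError(f"Unknown operator: {op}")
        raw = self.take()
        value = raw[1:-1] if raw.startswith('"') else float(raw)
        # A missing (None) property never satisfies a binary comparison here.
        return lambda rec, p=prop, f=OPS[op], v=value: rec.get(p) is not None and f(rec.get(p), v)


def search(query, records):
    """Return the names of the records that match a custom query."""
    parser = QueryParser(query)
    predicate = parser.parse_or()
    if parser.pos != len(parser.tokens):
        raise ValueError(f"Unexpected token: {parser.tokens[parser.pos]}")
    return [rec["name"] for rec in records if predicate(rec)]
```

For example, given a list of record dictionaries with name, file_type, file_size, and time_of_creation keys, `search('file_type eq "CSV" and (name co "example" or name co "myfile")', records)` keeps only the CSV records whose name contains example or myfile. Raising an error on unknown operators or trailing tokens mirrors, in a simplified way, the validation step that runs before the search.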
Syntax examples of custom queries and their meanings are provided below.
Syntax example | Meaning |
name eq "data.json" | Find a name that equals data.json. |
name co "json" | Find a name that contains json. |
name sw "s" | Find a name that starts with s. |
time_of_creation gt "12-03-2020" | Find a time_of_creation later than the date 12-03-2020. |
time_of_creation ge "12-03-2020" | Find a time_of_creation on or after the date 12-03-2020. |
file_size lt 1700000 | Find a file_size less than 1.7 MB. |
file_size le 1700000 | Find a file_size less than or equal to 1.7 MB. |
file_size gt 1700000 and file_size lt 2000000 | Fetch records with a file_size greater than 1.7 MB and less than 2.0 MB. |
file_type eq "CSV" and (name co "example" or name co "myfile") | Fetch data with a file type equal to CSV and a name containing example or myfile. |
Search using a custom query
Perform the following steps to search using a custom query:
Procedure
Click in the Search data catalog field, and then click Go to Advanced Search.
The Advanced Search Form page opens.
Select the Custom Search Query tab.
The Custom Query page opens.
Choose the Entity type that you want to search:
- Resources: to search resources.
- Fields: to search fields.
Enter your query in the Custom Query text box.
If necessary, click Reset to clear an incorrect entry and reset the query text box.
Click Run Search.
The validity of the filter string is checked and then the search is performed.
Filtering search results by resource and by field
By default, resources that match the search criteria themselves or contain fields that match the search criteria appear in search results. By clicking the Fields tab, you can change this view to the list of fields that match the search criteria.
Long facets
If there are more than five facets in any category, you can click View More to display all the facets in a separate dialog box. These are referred to as long facets.
If you select multiple facets in the same category, the resulting filtered search list is an OR filter of the selected facets. If you select multiple facets from different categories, the result is AND filtering.
For example, if you select the facets US_State and US_City from the Field Tags category, the resulting list is OR filtered, displaying resources with the field tags US_State or US_City. If you also select the facet Accepted from the Field Tag Association State category, the resulting list displays only resources that have the field tags US_State or US_City in the Accepted field tag association state.
Sorting search results
Search results can also be sorted by Relevance, Name, and Rating. For single field searches only, you can sort by Confidence.
Export your findings to a CSV file
Perform the following steps to export your findings to a CSV file.
Procedure
Click Export as CSV or Export Table as CSV to start generating the export data.
The Export CSV Settings dialog box displays.
Select which properties to export for each resource. Click Select All to include all the properties listed.
Click Export to generate the data.
After the CSV data is successfully generated, a confirmation message with an exports link appears in the header.
If you are ready to download the generated data now, click exports in the header message.
The Exports page opens. This page summarizes your exported reports, including the report name, the report type (where it was generated), the generation interval, and the report size. Any report listed here is automatically deleted within seven days of being generated.
Note: If you want to download the generated CSV data later, you can open the Exports page from the Exports option in your User Profile menu.
From the reports table, click More actions, and then select Download report.
The generated CSV file is downloaded to the location specified for the Path to exports configuration property during your installation of Data Catalog. See Managing configurations if you need to reconfigure the Path to exports property to a different location.
(Optional) To delete a report, click More actions, and then select Delete report.
Customized resource facets tutorial
Custom resource properties along with search dimensions form custom facets. The following tutorial is intended for users who want to use custom facets to search Lumada Data Catalog data. The search dimensions set for a user show up as custom facets in the search results pane.
For example, your administrator has granted you the custom analyst role NorCal_Analyst for processing claims in the northern California region. Your administrator has set up the following conditions:
- A Claims custom property group.
- Custom properties called Claim_Status, Claims_Region, and Claim_Code in the Claims custom property group.
- The custom properties are pre-filtered to the values Open and Pending for Claim_Status and NorCal for Claims_Region.
The following image shows where the pre-filters and custom facets appear on the page.
Procedure
If you search for Customer, the search results list the resources that match the keyword, limited to the pre-filter values for the search dimensions in this example: only resources with the custom property value NorCal for Claims_Region and Open or Pending for Claim_Status appear.
If you choose not to apply the pre-filters to your search results, perform the following steps:
Click the Open facet settings (gear icon) in the upper-left corner of the Resource tab to open the Facet Settings dialog box.
Select No for Apply pre-filter values? in the Facet Settings dialog box.
Click OK.