
Hitachi Vantara Lumada and Pentaho Documentation

Data Catalog assets


Lumada Data Catalog provides data management and data representation with its own logical data entities, including the following assets:

Virtual folders

Use virtual folders to create smaller groups of resources belonging to a data source for easier management. Data resources can be members of multiple folders, so you can create folders with overlapping sets of data resources.
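
The overlapping membership described above can be pictured with a small in-memory model. This is only a sketch: the class `VirtualFolderIndex`, its methods, and the sample paths are hypothetical and are not part of the Data Catalog API.

```python
from collections import defaultdict


class VirtualFolderIndex:
    """Hypothetical model: maps each virtual folder name to a set of resource paths."""

    def __init__(self):
        self._members = defaultdict(set)

    def add(self, folder, resource):
        # A resource can be added to any number of folders.
        self._members[folder].add(resource)

    def resources_in(self, folder):
        # Return a copy so callers cannot mutate the index.
        return set(self._members.get(folder, set()))

    def folders_of(self, resource):
        return sorted(f for f, members in self._members.items() if resource in members)


index = VirtualFolderIndex()
index.add("finance", "/data/sales/2023.csv")
index.add("finance", "/data/sales/2024.csv")
index.add("audit", "/data/sales/2024.csv")
index.add("audit", "/data/hr/payroll.csv")

# Resources that belong to both folders.
overlap = index.resources_in("finance") & index.resources_in("audit")
```

Modeling each folder as a set keeps membership independent per folder, so adding a resource to one folder never removes it from another.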


Collections

When you have a growing set of data that extends across multiple files in Lumada Data Catalog, you can view the data as a single resource known as a collection.

When files are added to one of the directories identified as part of the collection, Lumada Data Catalog must run schema and profile discovery again before the collection reflects the newly added data.

When a folder becomes a collection, the files inside the folder no longer appear individually in search results. Instead, search results show a single representation of all the files. The collection can be made up of files in a single folder or files in many folders all under a single top-level folder.
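
The effect on search results can be sketched as follows. The function `collapse_into_collections` and the sample paths are hypothetical; the sketch only assumes that every file under a collection's top-level folder is represented by that folder in the results.

```python
from pathlib import PurePosixPath


def collapse_into_collections(file_paths, collection_roots):
    """Return search entries in which files under a collection root appear once."""
    roots = [PurePosixPath(r) for r in collection_roots]
    entries = []
    seen = set()
    for path in file_paths:
        p = PurePosixPath(path)
        # A file belongs to a collection if the root is one of its ancestors,
        # at any depth below the top-level folder.
        root = next((r for r in roots if r in p.parents), None)
        entry = str(root) if root is not None else str(p)
        if entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


files = [
    "/logs/web/2024/01/a.json",
    "/logs/web/2024/02/b.json",
    "/reports/q1.csv",
]
results = collapse_into_collections(files, ["/logs/web"])
```

Here the two files in different subfolders of `/logs/web` collapse into one entry, while `/reports/q1.csv` still appears individually.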

Custom properties

Custom properties collect additional metadata about resources specific to a business user's environment or engagement. For example, you could define a custom property to include a business user's name for a resource. Or, you could define a property that includes values that are used by system-level processes.

Additionally, you can group these properties together in custom property groups based on their business value or category. Custom properties can be moved individually or in bulk across custom property groups. You can then use custom property groups as custom facets in the search results with search dimensions.

Search dimensions and custom facets

As the admin, use search dimensions to control the visibility of facets in the search results for an end user. When a search dimension is defined for a specified role, the users with that role can then see the search results categorized by the search dimensions defined for that role.

For example, the admin can limit the search results for the Analyst role to the categories Rating, Resource Term, Virtual Folder, and the custom facet Claims, which is specific to business users. Analyst users then see their search results faceted by only those four search dimensions.
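
As a rough illustration of the Analyst example, the sketch below filters a set of facets by role. The mapping `SEARCH_DIMENSIONS`, the function `visible_facets`, and the sample facet values are hypothetical. Returning no facets for a role without a mapping is an assumption made for the sketch, not documented behavior.

```python
# Hypothetical role-to-dimension mapping, mirroring the Analyst example.
SEARCH_DIMENSIONS = {
    "Analyst": ["Rating", "Resource Term", "Virtual Folder", "Claims"],
}


def visible_facets(role, all_facets):
    """Keep only the facets that the role's search dimensions allow."""
    # Assumption: a role with no search dimensions defined sees no facets.
    allowed = SEARCH_DIMENSIONS.get(role, [])
    return {name: values for name, values in all_facets.items() if name in allowed}


facets = {
    "Rating": {"5 stars": 12},
    "Owner": {"jdoe": 4},
    "Claims": {"open": 7},
}
analyst_view = visible_facets("Analyst", facets)
```

The `Owner` facet is dropped for the Analyst role because it is not among that role's search dimensions.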

Job templates and sequences

Templates are predefined by administrators to run specific job sequences on specific clusters. Job templates pass system or Spark-specific parameters to the job sequences as command line arguments, such as driver memory, executor memory, or the number of threads required for a given cluster size. A template can also override the default Data Catalog parameters. For example, you can set the incremental profile to false, profile a collection as a single resource, or force a full profile instead of the default sampling option.

Contact your system administrator to determine the template that is best suited for your data cluster.

Sequences are Lumada Data Catalog's built-in job sequences, which users with the proper permissions can execute. These jobs always run with the default parameters. You cannot override the defaults with the Sequence option.
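
The difference between the two options comes down to whether overrides are applied on top of the defaults. The sketch below illustrates that idea only. The parameter names and values in `DEFAULTS` and the function `resolve_parameters` are hypothetical and do not correspond to actual Data Catalog or Spark settings.

```python
# Hypothetical default parameters; the keys are illustrative only.
DEFAULTS = {
    "incremental": True,
    "driver_memory": "2g",
    "executor_memory": "2g",
}


def resolve_parameters(defaults, template_overrides=None):
    """Return the effective parameters for a job run."""
    params = dict(defaults)  # copy so the defaults are never mutated
    params.update(template_overrides or {})
    return params


# A sequence runs with the defaults as they are.
sequence_params = resolve_parameters(DEFAULTS)

# A template layers its overrides on top of the defaults.
template_params = resolve_parameters(
    DEFAULTS, {"incremental": False, "driver_memory": "4g"}
)
```

Any parameter the template does not mention, such as `executor_memory` here, keeps its default value.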

Rules engine

With Lumada Data Catalog's rules engine, you can define, execute, and manage term-based rules. These rules can evaluate data and metadata properties to add terms, remove terms, modify custom properties on data assets, and generate reports.

Users define SQL-like rules that take selective actions based on specific data or metadata conditions. A data rule operates on the data of the resources in Data Catalog, and a metadata rule operates on their metadata. Either kind of rule can define conditions and then associate terms or update properties when those conditions are met.
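
The condition-then-action pattern can be sketched in Python. This is not the rules engine's SQL-like syntax, which is not shown here. The classes `Resource` and `Rule`, the function `apply_rules`, and the metadata keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    name: str
    metadata: dict
    terms: set = field(default_factory=set)


@dataclass
class Rule:
    condition: Callable[[Resource], bool]  # evaluated against each resource
    add_terms: frozenset = frozenset()
    remove_terms: frozenset = frozenset()


def apply_rules(resources, rules):
    """Apply each matching rule's actions; return names of matched resources as a report."""
    report = []
    for resource in resources:
        for rule in rules:
            if rule.condition(resource):
                resource.terms |= rule.add_terms
                resource.terms -= rule.remove_terms
                report.append(resource.name)
    return report


resources = [
    Resource("customers.csv", {"format": "csv", "row_count": 1200}, {"unreviewed"}),
    Resource("tmp.json", {"format": "json", "row_count": 3}),
]
rules = [
    Rule(
        condition=lambda r: r.metadata["format"] == "csv" and r.metadata["row_count"] > 1000,
        add_terms=frozenset({"large_csv"}),
        remove_terms=frozenset({"unreviewed"}),
    )
]
matched = apply_rules(resources, rules)
```

Only `customers.csv` satisfies the metadata condition, so it gains the `large_csv` term and loses `unreviewed`, while `tmp.json` is left unchanged.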