Managing datasets
This article discusses how to manage datasets if you are a non-administrative user with permissions to manage properties for datasets and data objects in Lumada Data Catalog.
After administrators create datasets, they can delegate management of dataset properties and features to non-admin roles. Non-admin users whose roles have the required permissions can manage the following dataset properties:
- Name
- Description
- Path specification
- Reported Schema
They can also perform the following tasks:
- Add resource
- Add tag
- View as list
- Run profiling and discovery jobs
Updating dataset properties
You can update the name, description, path specifications, and include and exclude patterns of an existing dataset in Lumada Data Catalog.
Name and description updates do not alter the profiling information. Changes to the path specification and include/exclude patterns can alter the dataset metadata, but they affect only resources added after the update. Path validation is also applied only to those new resources. Existing dataset resources are not affected.
Settings tab
In Lumada Data Catalog, you can update dataset names by selecting the Settings tab and entering the new name as described in the following table.
Field | Description |
Dataset Name | Data Catalog identifies a dataset by the unique name entered in this field. The name must begin with a letter and contain only alphanumeric characters, hyphens, and underscores. |
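The naming rule can be expressed as a single regular expression. The following Python sketch is only a client-side pre-check. `is_valid_dataset_name` is a hypothetical helper and not part of Data Catalog. It assumes that "alphanumeric" means ASCII letters and digits, and it does not check whether the name is unique.

```python
import re

# Assumed reading of the rule: an ASCII letter first, then any mix of
# ASCII letters, digits, hyphens, and underscores.
DATASET_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rule (uniqueness not checked)."""
    return DATASET_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["sales_2024", "web-logs", "2024_sales", "sales data", ""]:
        print(repr(candidate), is_valid_dataset_name(candidate))
```

`fullmatch` is used so that the whole name must match, not just a prefix of it.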
Path specifications tab
In Lumada Data Catalog, you can select the Path Specifications tab to update path specifications as described in the following table.
Field | Description |
Path Specification | Enter the source path with the include/exclude patterns to create the qualifying template for the dataset. |
Source Path | Enter the absolute path of the virtual folder that will be a part of this dataset. This path becomes the template against which all new resources added to the dataset are compared. New dataset resources must belong to this source path or to a subset of the source path. Note: The new source path must conform to the original virtual folder path specification. |
Include Pattern | Specify the regex pattern for a list of resources from the Source Path specification that you want to include in this dataset. |
Exclude Pattern | Specify the regex pattern for a list of resources from the Source Path specification that you want to exclude from this dataset. |
Multiple Path Specifications | Include one or more path specifications for the dataset as long as the paths belong to the same virtual folder specified in the Virtual Folder field. When paths belong to the same virtual folder, resources across the virtual folder source can be added to the dataset. Different combinations of include and exclude patterns make it possible to include or exclude specific types of resources. |
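The following Python sketch shows how a source path and its include and exclude patterns might combine to decide whether a resource qualifies. It is not Data Catalog code. `PathSpec` and `qualifies_for_dataset` are hypothetical names, and the matching details are assumptions:

- patterns are applied with `re.search` to the resource path relative to the source path;
- a missing include pattern includes everything;
- a resource qualifies if it satisfies any one of the dataset's path specifications.

```python
import posixpath
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PathSpec:
    """Hypothetical model of one path specification; not a Data Catalog API."""
    source_path: str                # absolute path within the virtual folder
    include: Optional[str] = None   # regex; None includes everything
    exclude: Optional[str] = None   # regex; None excludes nothing

    def qualifies(self, resource_path: str) -> bool:
        root = posixpath.normpath(self.source_path)
        path = posixpath.normpath(resource_path)
        # The resource must be the source path itself or sit underneath it.
        prefix = root if root.endswith("/") else root + "/"
        if path != root and not path.startswith(prefix):
            return False
        # Assumption: patterns are applied with re.search to the path
        # relative to the source path.
        relative = posixpath.relpath(path, root)
        if self.include is not None and re.search(self.include, relative) is None:
            return False
        if self.exclude is not None and re.search(self.exclude, relative) is not None:
            return False
        return True


def qualifies_for_dataset(resource_path: str, specs: List[PathSpec]) -> bool:
    """Assumption: a resource qualifies if it satisfies any one path specification."""
    return any(spec.qualifies(resource_path) for spec in specs)


if __name__ == "__main__":
    specs = [
        PathSpec("/sales/2024", include=r"\.csv$", exclude=r"(^|/)tmp/"),
        PathSpec("/sales/archive", include=r"\.parquet$"),
    ]
    for candidate in [
        "/sales/2024/q1/orders.csv",     # True
        "/sales/2024/tmp/orders.csv",    # False: excluded
        "/sales/2024/q1/orders.json",    # False: not included
        "/sales/archive/2019.parquet",   # True: matches the second spec
        "/marketing/leads.csv",          # False: outside both source paths
    ]:
        print(candidate, qualifies_for_dataset(candidate, specs))
```

In this sketch the exclude pattern is checked after the include pattern, so an exclusion wins whenever both match.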
Reported schema tab
In Data Catalog, you can update the reported schema on the Reported Schema tab. The reported schema is a user-defined schema that represents the expected schema of the resources in the dataset.
The reported schema is currently used only for display. Data Catalog does not validate member resources against the reported schema. It simply lists the reported schema alongside the discovered schema of the member resources in a single-file view for the dataset.
You can update an existing field or define a new field by using the following features:
Edit/Create Schema Field dialog box
Specify the field name and expected data type.
Custom Data Type field
In addition to standard built-in data types (integer, string, float, Boolean, byte, short, long, and double), you can add custom data type options to support user-defined data types required for specific applications.
For each field of the reported schema on the Reported Schema tab, you can click the Action icon and select from the following menu options:
- Select Edit to redefine the name, label, data type, and description of a schema field.
- Select Insert field above or Insert field below to insert schema fields in an existing schema.
- Select Delete to delete a schema field.
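The following Python sketch models a reported schema as an ordered list of fields. It makes the effect of the Edit, Insert field above, Insert field below, and Delete options concrete. `SchemaField` and `ReportedSchema` are hypothetical names, not a Data Catalog API. Any type name outside the built-in types listed in this article is treated as a custom type. Consistent with the display-only role of the reported schema, nothing here validates resources.

```python
from dataclasses import dataclass, fields as dataclass_fields
from typing import Iterable, List, Optional

# Built-in types named in this article; any other type name is treated
# as a custom (user-defined) type in this sketch.
BUILT_IN_TYPES = {"integer", "string", "float", "boolean", "byte", "short", "long", "double"}


@dataclass
class SchemaField:
    name: str
    data_type: str
    label: str = ""
    description: str = ""

    @property
    def is_custom_type(self) -> bool:
        return self.data_type.lower() not in BUILT_IN_TYPES


class ReportedSchema:
    """Hypothetical, display-only model of a reported schema as an ordered field list."""

    def __init__(self, initial: Optional[Iterable[SchemaField]] = None) -> None:
        self.fields: List[SchemaField] = list(initial or [])

    def _index(self, name: str) -> int:
        for i, f in enumerate(self.fields):
            if f.name == name:
                return i
        raise KeyError(f"no field named {name!r}")

    def edit(self, name: str, **changes: str) -> None:
        """Edit: change the name, label, data type, or description of a field."""
        allowed = {f.name for f in dataclass_fields(SchemaField)}
        target = self.fields[self._index(name)]
        for key, value in changes.items():
            if key not in allowed:
                raise AttributeError(f"unknown schema attribute {key!r}")
            setattr(target, key, value)

    def insert_above(self, name: str, new_field: SchemaField) -> None:
        self.fields.insert(self._index(name), new_field)

    def insert_below(self, name: str, new_field: SchemaField) -> None:
        self.fields.insert(self._index(name) + 1, new_field)

    def delete(self, name: str) -> None:
        del self.fields[self._index(name)]


if __name__ == "__main__":
    schema = ReportedSchema([SchemaField("order_id", "long"), SchemaField("amount", "double")])
    schema.insert_below("order_id", SchemaField("customer_ref", "customer_key"))  # custom type
    schema.edit("amount", label="Order amount")
    print([(f.name, f.data_type, f.is_custom_type) for f in schema.fields])
    # [('order_id', 'long', False), ('customer_ref', 'customer_key', True), ('amount', 'double', False)]
```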
Add member resources to a dataset
Perform the following steps to add a resource to a dataset:
Procedure
Navigate to .
Click the Action menu (icon) and select Add resource from the drop-down menu.
Enter the absolute path of the resource and click Add.
Results
Data Catalog does not verify that the added resource exists. A resource that does not exist in your data lake can still be added to the dataset, provided it satisfies the path specifications.
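The following Python sketch models the add-resource behavior described above. It also models the earlier note that path specification changes affect only resources added afterwards. `Dataset`, `add_resource`, and `update_path_spec` are hypothetical names, not a Data Catalog API. Matching is simplified to an assumed prefix check on the source path plus `re.search` on the relative path.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical model of adding resources to a dataset; not a Data Catalog API."""
    source_path: str
    include: Optional[str] = None   # regex; None includes everything
    exclude: Optional[str] = None   # regex; None excludes nothing
    members: List[str] = field(default_factory=list)

    def _satisfies_spec(self, path: str) -> bool:
        # Assumption: the resource must sit under the source path, and the
        # patterns are applied with re.search to the relative path.
        root = self.source_path.rstrip("/") + "/"
        if not path.startswith(root):
            return False
        relative = path[len(root):]
        if self.include is not None and re.search(self.include, relative) is None:
            return False
        if self.exclude is not None and re.search(self.exclude, relative) is not None:
            return False
        return True

    def add_resource(self, path: str) -> None:
        # Only the path specification is checked. Nothing here verifies
        # that the resource actually exists in the data lake.
        if not self._satisfies_spec(path):
            raise ValueError(f"{path!r} does not satisfy the path specification")
        self.members.append(path)

    def update_path_spec(self, source_path: str,
                         include: Optional[str] = None,
                         exclude: Optional[str] = None) -> None:
        # Existing members are neither revalidated nor removed; the new
        # specification applies only to resources added after this call.
        self.source_path, self.include, self.exclude = source_path, include, exclude


if __name__ == "__main__":
    ds = Dataset("/sales/2024", include=r"\.csv$")
    ds.add_resource("/sales/2024/q1/not_yet_loaded.csv")  # accepted; existence is not checked
    ds.update_path_spec("/sales/2025", include=r"\.csv$")
    print(ds.members)  # the 2024 member is still listed
    try:
        ds.add_resource("/sales/2024/q2/orders.csv")  # rejected by the new specification
    except ValueError as err:
        print(err)
```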
View dataset member resources
You can view the member resources of a dataset from the following locations:
- The Manage Datasets menu
- The Browse Datasets list
- Dataset single resource view (SRV)
Perform the following steps to view member resources in a dataset.
Procedure
Click the Action menu (icon) and select View as list to open a page that lists the member resources of the dataset.
(Optional) Click the member resource for the single resource view.
(Optional) Click Filter to filter this view by member.
Delete dataset member resources
Perform the following steps to delete a member resource from a dataset:
Procedure
Navigate to the single resource view of the dataset.
Click the Action menu (icon) and select Remove from dataset in the drop-down menu.
The resource is removed from the dataset.
Next steps