Use the Lumada Data Catalog data lineage tools to help track the relationships between data resources in your data environment, which is especially helpful when you frequently merge and duplicate data. Knowing where the data has come from can help you track down data quality problems, know whether the data can be trusted, or confirm that data from a particular system or region is included. Knowing where the data is going can help you determine who depends on it, and see how the data flows through systems and business processes.
Data Catalog uses resource data, along with its metadata, to discover cluster resources that are related to each other. It identifies copies of data, merges between resources, and horizontal and vertical subsets of these resources.
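To make the two kinds of subset concrete, the following minimal sketch uses a small in-memory table. It only illustrates the terms (a horizontal subset keeps a subset of the rows, and a vertical subset keeps a subset of the fields); it is not Data Catalog's discovery algorithm, and the sample data and function names are hypothetical.

```python
# Illustrative only: this is not how Data Catalog discovers lineage.
parent = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 75},
    {"id": 3, "region": "EU", "amount": 40},
]

# Horizontal subset: same fields, fewer rows (for example, a regional extract).
horizontal = [row for row in parent if row["region"] == "EU"]

# Vertical subset: same rows, fewer fields (for example, a column projection).
vertical = [{"id": row["id"], "amount": row["amount"]} for row in parent]


def is_horizontal_subset(child, source):
    """True if every child row appears unchanged in the source."""
    return all(row in source for row in child)


def is_vertical_subset(child, source, fields):
    """True if the child equals the source projected onto `fields`."""
    projected = [{name: row[name] for name in fields} for row in source]
    return child == projected
```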
When you are viewing a resource in the Data Canvas, you can view its lineage information by clicking View Lineage in the Data Lineage area of the screen. This opens the Data Lineage page.
The Lineage graph visually traces relationships between resources with overlapping data.
You can use the following tools in the Data Lineage menu bar to help you analyze the graph and decide whether to accept or reject the traced lineages.
|Data Lineage page / Actions
|Find in graph|Type the name of the resource you want to find on the current lineage graph. Suggested best matches are highlighted in the graph.
|View|Select the type of lineages to show within the current scope of the graph, from Suggested, Accepted, and Rejected.
|Resource view/Field view (toggle)|To show the resource lineage, click the resource icon, or to show the field lineage, click the field icon.
|Upstream and Downstream|Click the down arrow to set the hop level in the current graph. Data Catalog can provide up to three (3) lineage hops upstream or downstream. The anchor resource is set at level 0 and is the default lineage graph for any resource for which lineage has not yet been discovered. Use the hop level to validate the authenticity of the data flowing into the anchor resource to determine whether to accept or reject the lineage.
|Graph controls|The graph controls include icons for zoom out, fit and center, and zoom in. Click to resize or reposition the lineage graph.
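The hop level can be thought of as a depth limit on a walk through the lineage graph, with the anchor resource at level 0. The following is a minimal sketch of that idea, assuming a hypothetical in-memory mapping from each resource to its immediate parents. The paths, the `PARENTS` mapping, and the `upstream` function are illustrative and are not part of any Data Catalog API.

```python
from collections import deque

MAX_HOPS = 3  # the documented limit of three hops in either direction

# Hypothetical lineage: each resource maps to its immediate parent resources.
PARENTS = {
    "/data/report.csv": ["/data/joined.csv"],
    "/data/joined.csv": ["/data/orders.csv", "/data/customers.csv"],
    "/data/orders.csv": ["/raw/orders.json"],
    "/raw/orders.json": ["/landing/orders.dump"],
}


def upstream(anchor, hops, parents=PARENTS):
    """Return {resource: hop level} for resources within `hops` of the anchor."""
    hops = max(0, min(hops, MAX_HOPS))  # clamp to the supported range
    levels = {anchor: 0}                # the anchor sits at level 0
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if levels[node] == hops:
            continue  # do not expand beyond the requested hop level
        for parent in parents.get(node, []):
            if parent not in levels:
                levels[parent] = levels[node] + 1
                queue.append(parent)
    return levels
```

With this sample data, `upstream("/data/report.csv", 1)` returns only the anchor and `/data/joined.csv`, and a request for more than three hops is clamped, so `/landing/orders.dump` (four hops away) is never included. A downstream walk would work the same way over a child mapping.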
A data lineage graph can contain the following elements:
|Parent or source resource|Displayed per hop level. A source with a dotted boundary indicates the presence of an upstream source, while a source with a solid boundary indicates the parent resource.
|Anchor or target resource|The resource of interest for which lineage is being examined. It is indicated by a color-filled Resource node with a target icon.
|Suggested lineage|Denoted by a dotted line between the source and target Resource nodes via an Operation node.
|Accepted lineage|Denoted by a solid line between the source and target Resource nodes via an Operation node.
|Rejected lineage|Denoted by a red line and a link icon with a diagonal line through it in the Operation node between the source and target Resource nodes. Note: Only visible when the Rejected checkbox is selected in the View element of the Data Lineage menu bar.
You can gain deeper insight into the lineage discovered with the details displayed on the Details pane, which change depending on whether you select the Operations, Resource, or Field node.
Operations node details
You can click the Operation node (the node with a link icon) to display the lineage information on the Details pane. The lineage actions that display vary depending on whether the lineage is accepted, rejected, or suggested, and which lineage actions are selected for viewing by the View setting in the Data Lineage menu bar.
On the Data Lineage page, you can use the Details pane to accept or reject a suggested lineage or create your user-defined lineage as long as you have permission.
The Steward and Administrator roles have the permission to curate lineage.
|Accept Lineage|Accept a Suggested or Rejected lineage.
|Add Source|Establish user-defined factual lineage for the operation by specifying the absolute path to a parent resource. When you add factual lineage, all suggested edges for the Operation node automatically become accepted. Factual lineage on the Operation node is then validated. Data Catalog performs path validations and actual metadata/data relation checks on your user-defined lineages on the Operation nodes, as displayed in Field Mapping details for the added resource.
|Allow Discovery|Allow discovery of the resource when a lineage discovery job is run by selecting the resource and clicking Allow Discovery. Allow Discovery is a toggle with Forbid Discovery, and both are set at the resource level.
|Delete Lineage|Delete an Accepted or Rejected lineage. Deleted lineages are rediscovered by the next non-incremental Lineage Discovery job and automatically transition into the Suggested lineage state. Note: To rediscover a previously deleted lineage, you must click Allow Discovery for the resource.
|Forbid Discovery|Forbid discovery of a resource when a lineage discovery job is run by selecting the resource and clicking Forbid Discovery. This is useful if you want to ignore backup files, for example. Forbid Discovery is a toggle with Allow Discovery, and both are set at the resource level.
|Reject Lineage|Reject a Suggested or Accepted lineage. Rejected lineages are not rediscovered by the next non-incremental Lineage Discovery job, so they do not automatically transition back into the Suggested lineage state.
|View Mapping|View the field-level overlap relationships between the immediate source and target resources involved in that Operation node.
|Visit Resource|Go to a lineage view with this resource as the target resource.
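The state changes described by the actions above can be summarized as a small transition table. The sketch below models only the behavior stated in this section. The `Deleted` value is a modeling convenience rather than a status shown in the product, and `apply_action` and `rediscover` are hypothetical names, not Data Catalog functions.

```python
SUGGESTED, ACCEPTED, REJECTED = "Suggested", "Accepted", "Rejected"
DELETED = "Deleted"  # modeling placeholder, not a status shown in the UI

# (action, current state) -> new state, as described by the actions above.
TRANSITIONS = {
    ("accept", SUGGESTED): ACCEPTED,
    ("accept", REJECTED): ACCEPTED,
    ("reject", SUGGESTED): REJECTED,
    ("reject", ACCEPTED): REJECTED,
    ("delete", ACCEPTED): DELETED,
    ("delete", REJECTED): DELETED,
}


def apply_action(state, action):
    """Return the new lineage state, or raise if the action does not apply."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"cannot {action} a lineage in state {state}") from None


def rediscover(state, discovery_allowed):
    """Model one non-incremental lineage discovery run for a single lineage."""
    if state == DELETED and discovery_allowed:
        return SUGGESTED  # deleted lineages come back as suggestions
    return state          # rejected lineages are not rediscovered
```

For example, deleting an accepted lineage and then running discovery with Allow Discovery set yields a suggested lineage again, whereas a rejected lineage stays rejected.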
Rejecting or deleting edges on the Operations node
Lumada Data Catalog does not support lineage actions for independent edges. Any action on an edge will be performed on the Operation node. Exercise caution when adding factual sources on an Operation node. These sources cannot be independently rejected or deleted without affecting the other resources associated with the operation.
To remove a factual source from an Operation node:
- Reject the operation.
- Delete the operation.
- Re-run lineage discovery to recover any suggested lineages associated with the Operation node that you deleted in the previous step.
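Why a factual source cannot be removed on its own becomes clearer when the Operation node is modeled as the single unit that owns all of its edges. This is a minimal sketch under that assumption; the `Operation` class, its methods, and the sample paths are hypothetical and do not reflect Data Catalog internals.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Hypothetical model: one Operation node and the sources feeding it."""
    target: str
    sources: dict = field(default_factory=dict)  # source path -> edge status

    def add_factual_source(self, path):
        # Adding factual lineage also accepts every suggested edge.
        self.sources[path] = "Accepted"
        for src in list(self.sources):
            if self.sources[src] == "Suggested":
                self.sources[src] = "Accepted"

    def reject(self):
        # There is no per-edge reject: the action applies to every edge.
        for src in list(self.sources):
            self.sources[src] = "Rejected"

    def delete(self):
        # Deleting the operation removes all of its edges together.
        self.sources.clear()
```

In this model, rejecting or deleting always affects every source on the operation, which is why the procedure above ends by re-running lineage discovery to recover the suggested lineages.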
Additional lineage information
You can find additional information about the lineage on the Details pane. The panels visible depend upon the element selected in the lineage graph.
Depending on the element selected, you can:
- Accept Lineage
- Add Source
- Allow Discovery
- Delete Lineage
- Forbid Discovery
- Reject Lineage
- View Mapping
- Visit Resource
Allow discovery of the resource when a lineage discovery job is run.
Populated by Data Catalog with a default expression used to identify the relationship for the selected Operation node.
Notes related to the lineage.
Description of the lineage node.
Placeholder for a future Data Catalog use.
Forbid discovery of certain resources when a lineage discovery job is run.
Glossaries, if any, related to the lineage element.
Displays the creation and last-modified timestamps for the lineage operation.
Name of the resource.
Terms, if any, related to the resource. You can click Add Term to assign an existing term to the resource.
Resource node details
On the Data Lineage page, click the Resource node to display the resource-related lineage actions and information on the Details pane.
Resource Type & Path
The Code section identifies the type of resource (file, collection, table, and so on) along with its absolute path.
If your user role allows, you can use the following actions depending on whether the resource is a target (anchor) resource, or a source resource:
For a target (anchor) resource, you can add and define factual lineage by specifying the absolute path to a parent resource. For a source resource, you can visit the resource.
Lists any resource business terms associated with that resource. Users with permission can click Add Term to add a term.
Lists the glossaries of any field business terms associated with or suggested on the fields of that resource.
The plain text description of the resource that is obtained from the Summary tab.
Displays the creation and last-modified timestamps for the resource.
Field node details
On the Field node, you can select the target (anchor) resource from the drop-down menu to display the lineage for the field between the source and target resources. The details for the Field node provide the field-related information. If a node is selected, its status is shown (Suggested, Accepted, or Rejected).
You can take the following recommended actions depending on whether the resource is a target (anchor) resource, or a source resource:
- For a target (anchor) resource, you can Add Source, Forbid Discovery, or Allow Discovery.
- For a source resource, you can Visit Resource.
Refers to the flattened text description associated with a field, as set via the REST API or Hive/JDBC comments.
Lists the glossaries of Field Terms associated with the resource field.
Displays the creation and last-modified timestamps for the resource.
Lumada Data Catalog provides a framework for integrating with other applications, from email servers to data cleansing and visualization tools. The integration framework lets you define an action menu option at the resource level that initiates a client-side or server-side operation.
With the help of this framework, you can import the lineages from third-party tools.
Integrating the Atlas third-party tool
In Data Catalog, you can integrate the Apache Atlas third-party tool to import lineages. To import lineages, click Tools on the left navigation menu and click Lineage – Import/Export. Upload a file with lineage information that you want to import. From the drop-down menu, select Import Operations from Atlas.
After you import the operations, you can view the imported lineages by selecting View Lineage from a resource selected in the Data Canvas. The Description displays Atlas Lineage Import. All the imported lineages are in the Accepted status.