Hitachi Vantara Lumada and Pentaho Documentation

Role-based access control (RBAC)

Lumada Data Catalog uses Keycloak as the Identity Management Provider (IDP) for user authentication.

When a user logs in, Keycloak passes Data Catalog the list of roles to which the user is entitled. Data Catalog then authorizes access to its features and resources through role-based access control (RBAC), which secures data while keeping workflows efficient. With RBAC, you can manage who has access to resources, what actions they can perform, and which areas they can access. In addition to defining roles and permissions on a granular level, you can modify the default (predefined) roles, create your own custom roles, and assign permissions as needed. For instance, you can create a custom role that can search metadata and create Hive tables from HDFS files, and then set this role as a default for all new users.
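
As a rough illustration of how a role list received at login might map to permissions, the following Python sketch models roles as named sets of permissions and computes a user's effective permissions as their union. The `Role` class, the `effective_permissions` function, the role name `hive_builder`, and the two permission strings are hypothetical and are not Data Catalog identifiers. Combining roles by union is also an assumption, not documented behavior.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """A named set of permissions (hypothetical model, not the Data Catalog schema)."""
    name: str
    permissions: frozenset = field(default_factory=frozenset)


# Hypothetical custom role from the example above: search metadata and
# create Hive tables from HDFS files. The permission strings are invented.
HIVE_BUILDER = Role(
    name="hive_builder",
    permissions=frozenset({"Search Metadata", "Create Hive Tables"}),
)

ROLE_CATALOG = {HIVE_BUILDER.name: HIVE_BUILDER}
DEFAULT_ROLE_NAMES = ["hive_builder"]  # assumed default for all new users


def effective_permissions(role_names, catalog=ROLE_CATALOG):
    """Union the permissions of every known role in the list passed at login."""
    granted = set()
    for name in role_names:
        role = catalog.get(name)
        if role is not None:  # unknown role names are ignored in this sketch
            granted |= role.permissions
    return granted


# A new user with no explicit roles falls back to the default role list.
new_user_roles = [] or DEFAULT_ROLE_NAMES
print(sorted(effective_permissions(new_user_roles)))
```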

You can access the RBAC feature by navigating to Management and then clicking User Roles.

User groups let you manage a large number of users by organizing them so that you can easily assign them to features and roles. Data Catalog organizes user roles and permissions into three groups: Administrator, Steward, and Business. You can organize your existing users into these groups based on their job type, such as system administrator, data steward, or business analyst, and then further define their permissions on a granular level. However, most roles include a combination of permissions from different groups because of dependencies across the groups.

Using RBAC, you can distribute all the Data Catalog features across users in your organization by dividing permissions between roles, while restricting critical or sensitive activities to selected users. RBAC also provides control over search dimensions and the visibility of facets in the search results. See Search dimensions and custom facets for more information.

Data Catalog offers the following user groups:

  • Administrator

    Permissions include Data Catalog settings and management functions often performed by the Data Catalog administrator.

  • Steward

    Permissions include those tasks performed to create and build the catalog, including permissions that control updates for resource metadata properties.

  • Business

    Permissions include the daily activities of a data analyst. This group also includes generic guest privileges.

    • Analyst: The Analyst subset includes managing glossaries, running jobs, and performing Data Catalog maintenance tasks.
    • Guest: The Guest subset includes basic permissions for browsing the catalog.

Data Catalog installs with one predefined role: global_administrator. After installation, you should create the additional roles that your organization needs so that all the activities in Data Catalog can be performed.

Caution: All the RBAC permissions must be assigned to ensure that Data Catalog operates properly.
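
One way to act on this caution is to check, outside the product, that every permission is granted by at least one role. The sketch below is a minimal illustration. The permission names are an abbreviated subset of those listed in this article, while the `data_steward` and `guest` role names, the assignments, and the `unassigned_permissions` helper are hypothetical.

```python
# Abbreviated subset of the permissions named in this article.
ALL_PERMISSIONS = {
    "Manage User Roles",
    "Manage Agents",
    "Manage Data Sources",
    "Manage Virtual Folders",
    "Run Jobs",
    "View Business Terms",
    "View Business Glossaries",
}

# Hypothetical role-to-permission assignments for illustration only.
ROLES = {
    "global_administrator": {"Manage User Roles", "Manage Agents"},
    "data_steward": {"Manage Data Sources", "Manage Virtual Folders"},
    "guest": {"View Business Terms", "View Business Glossaries"},
}


def unassigned_permissions(all_permissions, roles):
    """Return the permissions that no role grants, sorted for stable output."""
    assigned = set()
    for permissions in roles.values():
        assigned |= set(permissions)
    return sorted(set(all_permissions) - assigned)


# Any name reported here is not yet covered by a role.
print(unassigned_permissions(ALL_PERMISSIONS, ROLES))
```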

Administrator role permissions

The Administrator group includes permissions for managing Data Catalog, including roles and users as well as system settings. Additionally, when you assign a user to the Administrator group, you can also choose to include the permissions from the Steward and Business groups for that user. The following table lists the permissions contained within the Administrator group.

Administrator group

| Permission | Actions | Notes |
|---|---|---|
| Manage Business Glossary | Create, read, update, and delete glossaries. | |
| Manage Agents | Create, read, update, and delete agents. | |
| Manage Configuration Settings | Create, read, update, and delete system configurations. | |
| Manage Custom Properties | Create, read, update, and delete custom properties. | Requires the Manage External Sources permission to update custom properties. |
| Manage Job Templates | Create, read, update, and delete job templates. | Requires the Manage Virtual Folders and the Run Jobs permissions to select assets. |
| View Job Activity | View system-wide job activity. | Requires the View Job Logs permission to view and download logs. |
| Manage User Roles | Create, read, update, and delete user roles. | |
| Manage External Sources | Create, read, update, and delete external sources through the command line and API. | Provides the ability to integrate Apache Atlas functions. |
| Associate Roles with VFs | Create role associations to virtual folders. | Requires the Manage User Roles permission. Only the virtual folders visible to the current Administrator role can be assigned. |
| Associate Roles with Business Glossaries | Create role associations to glossaries. | Requires the Manage User Roles permission. Only the glossaries visible to the current Administrator role can be assigned. |
| Configure Workflows | Create, read, update, and delete workflows. | |
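
The dependencies noted for the Administrator group can be checked before a role is saved. The following Python sketch encodes those notes as a lookup table and reports which required permissions a proposed role lacks. The `REQUIRES` mapping and the `missing_dependencies` helper are hypothetical. The sketch also treats every note as a hard requirement, whereas this article describes some of them as needed only for specific actions, such as updating custom properties.

```python
# Dependencies transcribed from the notes for the Administrator group.
REQUIRES = {
    "Manage Custom Properties": {"Manage External Sources"},
    "Manage Job Templates": {"Manage Virtual Folders", "Run Jobs"},
    "View Job Activity": {"View Job Logs"},
    "Associate Roles with VFs": {"Manage User Roles"},
    "Associate Roles with Business Glossaries": {"Manage User Roles"},
}


def missing_dependencies(granted):
    """Map each granted permission to the required permissions it lacks."""
    granted = set(granted)
    report = {}
    for permission in sorted(granted):
        missing = REQUIRES.get(permission, set()) - granted
        if missing:
            report[permission] = sorted(missing)
    return report


# Hypothetical role that can manage job templates but cannot select assets.
proposed = {"Manage Job Templates", "Run Jobs"}
print(missing_dependencies(proposed))
```

An empty result means that every dependency encoded in the table is satisfied by the proposed permission set.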

Steward role permissions

The Steward group includes permissions for creating, building, and curating the catalog, including permissions that control updates for resource metadata properties. Additionally, when you assign a user to the Steward group, you can also choose to include the permissions from the Business group for that user. The following table lists the permissions contained within the Steward group.

Steward group

| Permission | Actions | Notes |
|---|---|---|
| Run Business Rules | Execute rules such as for labeling, data quality, and associating glossary terms. | |
| Lineage Curation | Curate data lineages. Includes suggested inferred lineages on nodes and edges and importing factual lineages. | |
| Manage Data Resource Fields | Create, read, update, and delete fields on custom field comments and custom field labels. | |
| Resource: Metadata | Create resource metadata. Controls the update feature on resource properties. | |
| Resource: Content | Create resource content. Controls the update feature on data resource properties like resource_field_tags and text. | |
| Manage Business Rules | Create, read, update, and delete rules such as for labeling, data quality, and associating glossary terms. | |
| Manage Virtual Folders | Create, read, update, and delete virtual folders. | |
| Manage Data Sources | Create, read, update, and delete data sources. | If you delete a data source, the corresponding (root/child) virtual folders are also deleted. |
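
The optional inclusion of other groups, described for both the Administrator and Steward groups, can be sketched as a small lookup. In the Python example below, the `OPTIONAL_INCLUSIONS` table and the `resolve_groups` helper are hypothetical names. Only the inclusion rules themselves come from this article.

```python
# Groups that may optionally be included when a user is assigned to a group.
# The Business group has no optional inclusions.
OPTIONAL_INCLUSIONS = {
    "Administrator": ("Steward", "Business"),
    "Steward": ("Business",),
    "Business": (),
}


def resolve_groups(primary, include=()):
    """Return the primary group followed by the requested, allowed inclusions."""
    allowed = OPTIONAL_INCLUSIONS[primary]
    rejected = [group for group in include if group not in allowed]
    if rejected:
        raise ValueError(f"{primary} cannot include: {', '.join(rejected)}")
    # Keep a stable order: primary first, then inclusions in table order.
    return [primary] + [group for group in allowed if group in include]


print(resolve_groups("Administrator", ["Business", "Steward"]))
print(resolve_groups("Steward", ["Business"]))
```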

Business role permissions

The Business group includes permissions for curating terms, running jobs, and browsing the catalog. This group is divided into two subsets:

  • Analyst
  • Guest

Business group

The Analyst subset of the group includes the daily tasks for a data analyst, including permissions for term curation and catalog maintenance.

| Permission | Actions | Notes |
|---|---|---|
| View Business Rules | View rules such as for labeling, data quality, and associating glossary terms. | |
| Manage Business Terms | Create, read, update, and delete glossary terms. | Requires the View Business Terms permission. |
| Associate Business Terms | Curate terms. | You can only curate tags from assigned glossaries. Allows you to accept and reject business term associations. Requires the View Business Terms permission to see the terms from assigned glossaries. |
| Run Jobs | Job execution (sequence and template) for available resources. | |
| View Rationalization Dashboard | View the Rationalization dashboard. | |
| Run Term Discovery | Perform term discovery. | |
| Review Workflows | Review business terms. | |
| Approve Workflows | Approve business terms. | |

The Guest subset of the group includes the minimum permissions for accessing and browsing the catalog.

| Permission | Actions | Notes |
|---|---|---|
| View Business Terms | Browse glossary terms. | Requires the View Business Glossaries permission to see terms in the user interface. Only terms from assigned glossaries can be viewed. |
| View Business Glossaries | Browse business glossaries. | |
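
Because the Business group dependencies chain together (managing or associating terms requires View Business Terms, which in turn requires View Business Glossaries), it can help to expand a requested permission set with everything it transitively needs. The Python sketch below shows one way to do that. The `REQUIRES` table and the `with_prerequisites` helper are hypothetical, and the sketch assumes each noted requirement should always be granted together with the permission that needs it.

```python
# Dependencies transcribed from the notes for the Business group.
REQUIRES = {
    "Manage Business Terms": {"View Business Terms"},
    "Associate Business Terms": {"View Business Terms"},
    "View Business Terms": {"View Business Glossaries"},
}


def with_prerequisites(requested):
    """Expand a permission set with everything it transitively requires."""
    result = set()
    stack = list(requested)
    while stack:
        permission = stack.pop()
        if permission in result:
            continue  # already expanded
        result.add(permission)
        stack.extend(REQUIRES.get(permission, ()))
    return result


# A hypothetical analyst role that only asks for term management.
print(sorted(with_prerequisites({"Manage Business Terms"})))
```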

Predefined roles

Data Catalog installs with a predefined global_administrator role. This admin role has benefits and limitations and is included as a starting point. You are encouraged to create the roles your organization needs following installation, including administrators, data stewards, business analysts, and basic guests.

The predefined global_administrator role ships with preselected Administrator permissions. You can use this role to create additional roles with the required permissions for building your instance of Data Catalog. Additionally, you can use this role to perform the following post-installation tasks for setting up Data Catalog:

  • Create and register data source agents that will serve the data sources in different local and remote clusters, such as onPremAgent, azureAgent, AWS-CloudAgent, and EMEA-Agent. See Manage agents.
  • Create the secondary administrative roles, such as FinanceAdmin, MarketingAdmin, and SalesAdmin, with the applicable Administrator and Steward permissions. To be effective, these secondary admin roles must be given proper permissions while respecting all permission dependencies. See Managing roles.
  • Add users to Data Catalog (or delegate this activity to the secondary administrators). See Add a user.
  • Assign roles to users (or delegate this activity to the secondary administrators). See Assign a user to a role.
  • Create custom properties (or delegate this activity to the secondary administrators). See Add a custom property.

Resource read access control

As a Data Catalog administrator, you can enable Sample Data Access for a role:

  • If Yes is selected, then the user can see sample data from the Data Canvas and Search.
  • If No is selected, then the user cannot see sample data from the Data Canvas or Search.
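
The effect of the Sample Data Access setting can be pictured as a simple gate on sample rows. The Python sketch below is only an illustration: the `RoleSettings` class, the `sample_rows_for` helper, and the example rows are hypothetical, and the sketch assumes a user is evaluated against a single role.

```python
from dataclasses import dataclass


@dataclass
class RoleSettings:
    """Hypothetical per-role settings; not the Data Catalog data model."""
    name: str
    sample_data_access: bool  # True corresponds to Yes, False to No


def sample_rows_for(role, rows):
    """Return sample rows only when Sample Data Access is Yes for the role."""
    return list(rows) if role.sample_data_access else []


rows = [{"customer_id": 1}, {"customer_id": 2}]  # invented sample data
analyst = RoleSettings("analyst", sample_data_access=True)
guest = RoleSettings("guest", sample_data_access=False)

print(len(sample_rows_for(analyst, rows)))
print(len(sample_rows_for(guest, rows)))
```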

RBAC and security

Data Catalog leverages Keycloak and RBAC to integrate with user authentication methods. The following process illustrates the flow of Data Catalog user authentication.
  1. Login. User A logs in through the browser, and the browser sends a request to Keycloak over HTTPS.
  2. Authentication. Keycloak sends the username and password to the authentication server. After the server returns a unique response, User A can log in to Data Catalog.
  3. Authorization. Data Catalog honors the defined access policies.
    • The Data Catalog service user is used to impersonate the logged-in user while browsing HDFS resources. Also, the Data Catalog service user is used as a proxy user to browse Hive resources.
    • Data Catalog roles and RBAC models apply, and users are only allowed to access virtual folders and glossaries according to the role assigned to them.
    • Each user is required to enter a user name and password to access the Data Catalog portal.
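
The authorization step above can be sketched in Python under a few stated assumptions. Keycloak access tokens carry realm roles in the `realm_access.roles` claim by default, but whether Data Catalog reads that claim is an assumption here. The folder paths and the `FOLDER_ROLES` mapping are hypothetical, and the helper functions are illustrative names. The sketch decodes the token payload without verifying its signature, which is acceptable only for illustration; a real service should validate the token against Keycloak's published keys.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    padding = "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64 + padding))


def realm_roles(claims):
    """Read realm roles from the claim Keycloak populates by default."""
    return set(claims.get("realm_access", {}).get("roles", []))


def visible_folders(roles, folder_roles):
    """Keep virtual folders associated with at least one of the user's roles."""
    return sorted(
        folder for folder, allowed in folder_roles.items() if roles & set(allowed)
    )


def _b64(obj):
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Build an unsigned demo token so the example is self-contained.
demo_claims = {
    "preferred_username": "user_a",
    "realm_access": {"roles": ["FinanceAdmin"]},
}
demo_token = ".".join([_b64({"alg": "none"}), _b64(demo_claims), ""])

# Hypothetical role associations for virtual folders.
FOLDER_ROLES = {
    "/finance": ["FinanceAdmin"],
    "/marketing": ["MarketingAdmin"],
    "/shared": ["FinanceAdmin", "MarketingAdmin"],
}

roles = realm_roles(decode_jwt_payload(demo_token))
print(visible_folders(roles, FOLDER_ROLES))
```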