Apache Ranger and Hive column-level security
Lumada Data Catalog honors access policies set up with the Apache Ranger security framework. Ranger provides fine-grained access control to resources in Data Catalog, specifically Hadoop and related components such as Apache Hive and HBase, and provides a way of delegating access to Hive data to various levels of Data Catalog administrators, stewards, and users.
Security policy specification and processing order
An Apache Ranger security policy consists of two major components:
- Specification of the resources to which a policy is applied, namely Hive databases, tables, and columns.
- Specification of access conditions for specific users and groups (Include/Exclude).
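The two components above can be pictured as a single policy document. The following is a minimal sketch, not the definitive Ranger format: the field names (resources, isExcludes, policyItems, accesses) follow a common reading of Ranger's policy JSON model and should be checked against your Ranger version. The helper build_column_policy, the service name hive_service, and the user analyst are hypothetical.

```python
import json


def build_column_policy(service, database, table, columns, exclude, users, groups):
    """Build a dict shaped like a Ranger Hive policy (assumed field names).

    `exclude` selects Exclude (True) or Include (False) for the column list.
    """
    return {
        "service": service,  # hypothetical Ranger service name
        "name": f"{database}.{table} column policy",
        # Component 1: the resources the policy applies to.
        "resources": {
            "database": {"values": [database], "isExcludes": False},
            "table": {"values": [table], "isExcludes": False},
            "column": {"values": list(columns), "isExcludes": exclude},
        },
        # Component 2: the access conditions for users and groups.
        "policyItems": [
            {
                "users": list(users),
                "groups": list(groups),
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


if __name__ == "__main__":
    policy = build_column_policy(
        "hive_service", "DB1", "T1", ["c1"],
        exclude=False, users=["analyst"], groups=[],
    )
    print(json.dumps(policy, indent=2))
```

The sketch only builds the document. How it is submitted to Ranger (UI or REST API) depends on your deployment.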
When used with Ranger, Data Catalog controls the visibility of database and table data in this order of precedence:
- The policies set forth in Ranger (for user group/user policies and Include/Exclude access type).
- The user's Metadata Access level and Data Access level set in Data Catalog's RBAC. See Role-based access control (RBAC) for details.
Because Data Catalog respects Ranger policies for all roles, make sure the Data Catalog service user has sufficient privileges to profile the data and gather data fingerprints for analysis in Lumada Data Catalog.
Apache Ranger use cases
To illustrate how Data Catalog's RBAC policies interact with Ranger security policies, the following scenarios use a hypothetical DB1 database containing table T1 with columns c1, c2, c3, c4.
Use case one
In this use case, the user only sees column c1 and the data c1 contains.
Criteria:
- Ranger policy (Include/Exclude) on column: Include c1
- Data Catalog RBAC Metadata Access: Native
- Data Catalog RBAC Data Access: Native
User sees:
- c1 only (with data)

Use case two
In this use case, the user's Metadata:Native RBAC access to the resource T1 in Lumada Data Catalog overrides the Ranger policy for metadata visibility. Per the Ranger policy, the user can see data only for column c1, but Data Catalog also lists the other columns, showing only their metadata, because Data Catalog RBAC gives the user Metadata Access.
Criteria:
- Ranger policy (Include/Exclude) on column: Include c1
- Data Catalog RBAC Metadata Access: Metadata
- Data Catalog RBAC Data Access: Native
User sees:
- All columns, but data only for c1
Use case three
In this use case, the user sees all the columns except c1 and cannot see any of the data c1 contains.
Criteria:
- Ranger policy (Include/Exclude) on column: Exclude c1
- Data Catalog RBAC Metadata Access: Native
- Data Catalog RBAC Data Access: Native
User sees:
- All columns except c1
Use case four
In this use case, the user's Metadata:Native RBAC access to the resource T1 in Lumada Data Catalog overrides the Ranger policy for metadata visibility. Per the Ranger policy, the user cannot see the data in column c1, but Data Catalog still lists c1, showing only its metadata, because Data Catalog RBAC gives the user Metadata Access. The user sees the data for the other columns.
Criteria:
- Ranger policy (Include/Exclude) on column: Exclude c1
- Data Catalog RBAC Metadata Access: Metadata
- Data Catalog RBAC Data Access: Native
User sees:
- All columns, but metadata only for c1
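The four use cases can be summarized as a small decision model. This is a simplified sketch for illustration only, not Data Catalog's actual implementation. It assumes Data Access is Native in every case, as in the scenarios above, and the function visible_columns and its string values are hypothetical.

```python
ALL_COLUMNS = ["c1", "c2", "c3", "c4"]  # columns of table T1 in database DB1


def visible_columns(ranger_mode, ranger_columns, metadata_access,
                    all_columns=ALL_COLUMNS):
    """Return {column: "data" or "metadata"} for the columns a user sees.

    ranger_mode:     "include" or "exclude" (Ranger column policy)
    ranger_columns:  columns named in the Ranger policy
    metadata_access: "native" or "metadata" (Data Catalog RBAC)
    Assumes the RBAC Data Access level is Native.
    """
    if ranger_mode not in ("include", "exclude"):
        raise ValueError(f"unknown Ranger mode: {ranger_mode!r}")
    if metadata_access not in ("native", "metadata"):
        raise ValueError(f"unknown Metadata Access level: {metadata_access!r}")

    named = set(ranger_columns)
    result = {}
    for column in all_columns:
        permitted = (column in named) == (ranger_mode == "include")
        if permitted:
            # Ranger allows the column, so the user sees its data.
            result[column] = "data"
        elif metadata_access == "metadata":
            # Ranger hides the data, but RBAC still lists the metadata.
            result[column] = "metadata"
        # With Metadata Access "native", the column is hidden entirely.
    return result


if __name__ == "__main__":
    print(visible_columns("include", ["c1"], "native"))    # use case one
    print(visible_columns("include", ["c1"], "metadata"))  # use case two
    print(visible_columns("exclude", ["c1"], "native"))    # use case three
    print(visible_columns("exclude", ["c1"], "metadata"))  # use case four
```

In this model, Ranger alone decides which columns expose data. The RBAC Metadata Access level only determines whether the remaining columns are hidden or listed as metadata.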
Use Ranger Hive column filtering
Follow these steps to enable Apache Ranger Hive column filtering in Data Catalog.
Procedure
With any text editor, open the app-server/conf/configuration.json file.
Locate the ldc.metadata.hive.columnLevelFiltering property and set its value to true:
    "ldc.metadata.hive.columnLevelFiltering" : {
      "value" : true,
      "type" : "BOOLEAN",
      "restartRequired" : true,
      "readOnly" : false,
      "description" : "Controls the columns level filtering of hive for user",
      "label" : "Hive Column Level Filtering",
      "category" : "MISC",
      "defaultValue" : false,
      "visible" : true,
      "uiConfig" : false
    },
Save and close the file.
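If you prefer to script the change rather than edit the file by hand, the steps above might be sketched as follows. This is a hedged example, not a supported tool. It assumes configuration.json is valid JSON and that the property appears as a key whose entry has a "value" field, as in the fragment shown in the procedure. Because the surrounding layout of the file is not documented here, the sketch searches the whole structure. The helper names are hypothetical.

```python
import json

PROPERTY = "ldc.metadata.hive.columnLevelFiltering"


def enable_property(node, name=PROPERTY):
    """Set "value" to True on every entry keyed `name`; return the count."""
    updated = 0
    if isinstance(node, dict):
        for key, child in node.items():
            if key == name and isinstance(child, dict) and "value" in child:
                child["value"] = True
                updated += 1
            else:
                updated += enable_property(child, name)
    elif isinstance(node, list):
        for child in node:
            updated += enable_property(child, name)
    return updated


def enable_in_file(path):
    """Load the JSON file at `path`, enable the property, and write it back."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    count = enable_property(config)
    if count == 0:
        raise KeyError(f"{PROPERTY} not found in {path}")
    # Note: json.dump rewrites the file's whitespace; key order is kept.
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
        handle.write("\n")
    return count
```

Back up the file before running a script like this, because rewriting it changes the formatting.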