Hitachi Vantara Lumada and Pentaho Documentation

Advanced configuration

After installing Data Catalog, there may be other components you need to set up, depending on your environment. Use the following topics as needed, to finish setting up your environment.

  • External Keycloak instance

    In Data Catalog, an external Keycloak instance provides a secure and centralized authentication mechanism for user access. It also supports a wide range of authentication protocols and standards. Using an external Keycloak instance also provides additional benefits such as centralized management of user identities and access control policies, which simplifies user administration and reduces the risk of security vulnerabilities. For more information, see Configuring external Keycloak.

  • SSL certificate

    In Data Catalog, you can use an SSL (Secure Sockets Layer) certificate to ensure secure network communications. In SSL communications, communication routes are encrypted to prevent information leakage and detect data manipulation during transfer. For more information, see Configuring Data Catalog to use an SSL certificate.

  • External MongoDB

    MongoDB is a document database that manages document-oriented information and stores and retrieves data. Data Catalog uses MongoDB as a repository to store the metadata collected from processing functions. For more information, see Configuring an external MongoDB.

Configuring external Keycloak

Keycloak is an open-source identity and access management (IAM) solution. Lumada Data Catalog uses Keycloak to provide authentication and authorization services that authenticate users, manage user roles and permissions, and control access to the Data Catalog resources.

This article describes how to set up and configure an external Keycloak instance for Data Catalog.

Before you begin, you should have the following items ready to set up the external Keycloak instance:

  • An instance of Keycloak, set up by an experienced user

    If you don’t have a Keycloak instance, go to Set up a Keycloak instance.

    Note: Setting up a new Keycloak instance requires Docker to be installed.
  • A host capable of running Data Catalog
  • (Optional) SSL certificates for securing Keycloak

Set up a Keycloak instance

This topic describes how to set up a Keycloak instance.

Make sure you have the ldc-realm.json file ready. You can extract the ldc-realm.json file from the Data Catalog Helm chart.

tar -axf ./ldc-7.3.0.tgz ldc/charts/keycloak/files/ldc-realm.json -O > ldc-realm.json 
Note: Alternatively, after deploying Keycloak, you can sign in to Keycloak as an administrator and import the ldc-realm.json file.

Keycloak deployment methods

You can deploy Keycloak using one of two methods: basic Keycloak over HTTP, or secure Keycloak over HTTPS.

Note: Keycloak over basic HTTP is only suitable for development environments, not for production environments. For production environments, see Set up secure Keycloak over HTTPS.

Set up basic Keycloak over HTTP

For basic Keycloak over HTTP, set up Keycloak using Docker (the method recommended by Keycloak).

To set up basic Keycloak over HTTP, run the following command.

docker run -d --name keycloak -p 8080:8080 -p 8443:8443 \
    -e KEYCLOAK_ADMIN=admin -e KEYCLOAK_ADMIN_PASSWORD=admin \
    -v "$PWD/ldc-realm.json":/opt/keycloak/data/import/ldc-realm.json \
    -v keycloakData:/opt/keycloak/data \
    quay.io/keycloak/keycloak:20.0.1 start-dev --import-realm

Set up secure Keycloak over HTTPS

You may use certificates from a trusted certificate authority or generate your own self-signed certificates (see Configuring Data Catalog to use an SSL certificate for more information). Either way, you should have a certificate and key files.

Note: Setting up an external Keycloak instance over HTTPS is a best practice.

To deploy a secure Keycloak instance over HTTPS, set up Keycloak using Docker.

Procedure

  1. Determine the hostname that you use to access your Keycloak instance. It must match the hostname in your certificate.

  2. Run the following command:

    KC_HOSTNAME="<keycloak-hostname>" # <-- replace with your Keycloak hostname !!!
    docker run -d --name keycloak -p 8080:8080 -p 8443:8443 \
        -e KEYCLOAK_ADMIN=admin -e KEYCLOAK_ADMIN_PASSWORD=admin \
        -e KC_HTTPS_CERTIFICATE_FILE=/opt/keycloak/conf/server.crt.pem \
        -e KC_HTTPS_CERTIFICATE_KEY_FILE=/opt/keycloak/conf/server.key.pem \
        -v "$PWD/keycloak_cert.pem":/opt/keycloak/conf/server.crt.pem \
        -v "$PWD/keycloak_key.pem":/opt/keycloak/conf/server.key.pem \
        -v "$PWD/ldc-realm.json":/opt/keycloak/data/import/ldc-realm.json \
        -v keycloakData:/opt/keycloak/data \
        quay.io/keycloak/keycloak:20.0.1 start --import-realm --hostname "$KC_HOSTNAME"

Results

This creates an instance similar to the default built-in Keycloak instance. When you run the command, it performs the following actions:
  • It creates a Docker container that exposes port 8080 for HTTP and port 8443 for HTTPS.
  • It sets default admin credentials.
  • It mounts the ldc-realm.json file into the Keycloak container. Keycloak then reads the file and automatically creates the realm and the roles it defines.
  • It creates a Docker volume named keycloakData to persist the data used by the Keycloak container.
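
One way to confirm that the container started and that the realm import succeeded is to request the realm's OpenID Connect discovery document. The following is a minimal shell sketch and not part of the product. The function names keycloak_discovery_url and wait_for_realm are hypothetical helpers, and the realm name ldc-realm is an assumption taken from the authServerUrl examples in this article.

```shell
# Hypothetical helpers for checking an external Keycloak instance.

# Print the OpenID Connect discovery URL for a realm.
# Assumption: the imported realm is named "ldc-realm".
keycloak_discovery_url() (
  scheme=$1 host=$2 port=$3 realm=${4:-ldc-realm}
  printf '%s://%s:%s/realms/%s/.well-known/openid-configuration\n' \
    "$scheme" "$host" "$port" "$realm"
)

# Poll a URL every 2 seconds until it answers successfully or the
# attempts run out (default: 30 attempts).
# -k skips certificate verification, which is only appropriate while
# testing with a self-signed certificate.
wait_for_realm() (
  url=$1 attempts=${2:-30} i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsk -o /dev/null "$url"; then
      echo "realm reachable at $url"
      exit 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "realm not reachable at $url" >&2
  exit 1
)
```

For the HTTPS deployment, a call might look like wait_for_realm "$(keycloak_discovery_url https "$KC_HOSTNAME" 8443)". For the HTTP development deployment, use http, localhost, and 8080 instead.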

Update custom values

Once you configure Keycloak, you must update custom values in the custom-values.yaml file. Perform the following steps to update custom values for Keycloak:

Procedure

  1. Disable the built-in Keycloak instance by updating the enabled parameter to false.

    ... 
    keycloak: 
      enabled: false 
    ... 
  2. Specify the authServerUrl for Keycloak in the app-server and rest-server sections.

    Note: Use http for basic Keycloak over HTTP and https for secure Keycloak over HTTPS.
    ... 
    app-server: 
      keycloak: 
        authServerUrl: "https://<KEYCLOAK_URL>:<PORT>" 
    ... 
    rest-server: 
      keycloak: 
        authServerUrl: "https://<KEYCLOAK_URL>:<PORT>/realms/ldc-realm" 
    ... 
    The following table describes the placeholders used in the code snippet.

    Placeholder       Description
    <KEYCLOAK_URL>    Data Catalog app server URL
    <PORT>            The defined port number
  3. If you encounter any issues due to using a self-signed SSL certificate with the Keycloak deployment, update the app-server and rest-server parameters in the custom-values.yaml file to accept untrusted certificates as shown in the following code sample.

    ... 
    app-server: 
      untrustedCertsPolicy: ALLOW 
    ... 
    rest-server: 
      oidc: 
        tls: 
          verification: none 
      okhttp: 
        trustUnknownCerts: true 
    ... 
  4. Update the Helm chart to apply changes.

    helm upgrade -i <app name> <chart> -n <namespace> -f <custom values file>

Results

The external Keycloak instance is now configured for Data Catalog. To verify, log in to Data Catalog and confirm that the login URL points to the external Keycloak instance.

Configuring Data Catalog to use an SSL certificate

Configure an SSL certificate to initiate secure browser sessions. You can configure either a Certificate Authority (CA) signed certificate or a self-signed SSL certificate.

Before you begin, you need the following to configure Data Catalog to use an SSL certificate:

  • A .pfx (Personal Information Exchange) file from a trusted Certificate Authority. The .pfx file contains a public key certificate and the corresponding private key.
  • A host with Kubernetes installed and Data Catalog prerequisites fulfilled, and a Kubernetes namespace for installing Data Catalog and the SSL certificate.
    Note: For an existing Data Catalog installation, run the following command to find its namespace: helm list -A | grep ldc | awk '{print $2}'

To create your own self-signed certificate, see Create a Self-Signed SSL Certificate.

Create a self-signed SSL certificate

Before you begin

OpenSSL must be installed to create a self-signed SSL certificate.

Perform the following steps to create a self-signed SSL certificate:

Procedure

  1. Create an OpenSSL configuration file:

    In the following example, replace my-ldc-instance.com with your domain name.
    [req]
    distinguished_name = req_distinguished_name
    x509_extensions = v3_req
    prompt = no
    [req_distinguished_name]
    CN = my-ldc-instance.com                          
    [v3_req]
    keyUsage = digitalSignature, keyEncipherment
    extendedKeyUsage = serverAuth
    subjectAltName = @alt_names
    [alt_names]
    DNS.1 = my-ldc-instance.com                       
    DNS.2 = www.my-ldc-instance.com
    The configuration file is created for an instance hosted at my-ldc-instance.com.
  2. Save the configuration file as ldc_cert.conf.

  3. Create ldc_key.pem and ldc_cert.pem files, which are required to apply the self-signed certificate to Data Catalog.

    openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
        -keyout ldc_key.pem -out ldc_cert.pem \
        -config ldc_cert.conf -extensions 'v3_req'

    In the example above:
      • -days 365 means the self-signed certificate is valid for 365 days.
      • ldc_cert.pem is the self-signed certificate file in PEM format.
      • ldc_key.pem is the private key file in PEM format.
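
Before applying the files, it can help to confirm that the certificate and the private key belong together. The following is a minimal sketch that compares the public key embedded in the certificate with the public key derived from the private key. The function name cert_key_match is a hypothetical helper and not part of Data Catalog; the sketch assumes the openssl command is on the PATH.

```shell
# Hypothetical helper: succeed only if the certificate ($1) and the
# private key ($2) contain the same public key.
# Exit status: 0 = match, 1 = mismatch, 2 = a file could not be read.
cert_key_match() (
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || exit 2
  key_pub=$(openssl pkey -in "$2" -pubout) || exit 2
  [ "$cert_pub" = "$key_pub" ]
)
```

For the files created above, the check could be run as cert_key_match ldc_cert.pem ldc_key.pem && echo "certificate and key match". To inspect the subject and the expiry date, openssl x509 -in ldc_cert.pem -noout -subject -enddate can be used.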

Apply an SSL certificate to Data Catalog

To apply an SSL certificate to Data Catalog, you need to create a Kubernetes secret and apply it to the Data Catalog configuration.

Create a Kubernetes Secret

Secrets are the Kubernetes mechanism for managing sensitive data, such as SSL certificates, within a cluster. Instead of hard-coding this information in the application manifest, which is a security risk, you store the data separately in a Secret object and reference it from the application manifest.

Before you begin

If you are using                                 Then you must have
Certificate Authority (CA) signed certificate    A certificate .crt file and a private key .key file
Self-signed SSL certificate                      A certificate ldc_cert.pem file and a private key ldc_key.pem file

Perform the following steps to create a Kubernetes secret:

Procedure

  1. Verify that you have the correct files, as listed in Before you begin.

  2. Create a Kubernetes secret:

    kubectl create secret tls ldc-tls-secret \
        --namespace <namespace> \
        --key <private key file> \
        --cert <certificate file>

    In the example above:
      • <namespace> is the namespace in which the secret is created. It must be the namespace used to install Data Catalog.
      • <private key file> is the path to the private key file.
      • <certificate file> is the path to the certificate file.

Results

The Kubernetes secret ldc-tls-secret is created. It stores the Base64-encoded contents of the certificate and private key files.

Add Kubernetes secret to the Data Catalog configuration

Use the secret in Data Catalog by specifying it in the Ingress configuration in the custom_values.yaml file. For more information on configuring Ingress, see Exposing services using an Ingress controller - For production environments.

Perform the following steps to add the Kubernetes secret to the Data Catalog configuration:

Procedure

  1. Navigate to the Ingress subsection of the YAML file and add the secret, as shown in the following example.

    Replace my-ldc-instance.com with your domain name.
    ...
    app-server:
      ingress:
        enabled: true
        hosts:
        - host: my-ldc-instance.com
          paths:
            - path: /
              pathType: Prefix
        tls:
        - hosts:
          - "my-ldc-instance.com"
          secretName: ldc-tls-secret
    ...
    
  2. For a new Data Catalog installation, or to upgrade an existing Data Catalog instance, run the command `helm upgrade -i <options>`.

    For example,
    helm upgrade -i --wait ldc7 ldc-7.0.1.tgz -f custom-values.yml -n ldc
    For more information on parameters required to upgrade Data Catalog, see Data Catalog upgrade paths.
  3. Open Data Catalog in a web browser and verify that it returns the certificate.

    Note: Data Catalog returns the certificate only when you access it via the web browser, not via the NodePort port number.

Configure Keycloak and the REST Server to use an SSL certificate

When configuring Keycloak and the REST Server to use an SSL certificate, there are two possible scenarios: a single DNS record, where Keycloak and the REST Server are on different paths, or multiple DNS records, where Keycloak and the REST Server have unique subdomains. An example of each is shown below.

Single DNS Record

The following is an example for a single DNS record:

Specify the following in the Ingress configuration in the custom_values.yaml file. For more information on configuring Ingress, see Exposing services using an Ingress controller - For production environments.

Replace my-ldc-instance.com with your domain name.

keycloak:
  ingress:
    enabled: true
    hosts:
    # www.keycloak.org/server/reverseproxy#_exposing_the_administration_console:
    - host: my-ldc-instance.com
      paths:
      - path: /realms
        pathType: Prefix
      - path: /resources
        pathType: Prefix
      - path: /js
        pathType: Prefix
    tls:
    - hosts:
      - "my-ldc-instance.com"
      secretName: ldc-tls-secret
...
app-server:
  keycloak:
    authServerUrl: "https://my-ldc-instance.com"
  untrustedCertsPolicy: ALLOW
...
rest-server:
  keycloak:
    authServerUrl: "https://my-ldc-instance.com/realms/ldc-realm"
  oidc:
    tls:
      verification: none
  okhttp:
    trustUnknownCerts: true
  ingress:
    enabled: true
    hosts:
    - host: my-ldc-instance.com
      paths:
        - path: /api/v1
          pathType: Prefix
        - path: /swagger-ui
          pathType: Prefix
        - path: /api-docs
          pathType: Prefix
    tls:
    - hosts:
      - "my-ldc-instance.com"
      secretName: ldc-tls-secret

Multiple DNS Records

The following is an example for multiple DNS records.

Specify the following in the Ingress configuration in the custom_values.yaml file. For more information on configuring Ingress, see Exposing services using an Ingress controller - For production environments.

Note: To configure the SSL certificate on multiple DNS records, you must have an SSL certificate with a wildcard for subdomains.

Replace my-ldc-instance.com with your domain name.

keycloak:
  ingress:
    enabled: true
    hosts:
    # www.keycloak.org/server/reverseproxy#_exposing_the_administration_console:
    - host: idp.my-ldc-instance.com
      paths:
      - path: /
        pathType: Prefix
    tls:
    - hosts:
      - "idp.my-ldc-instance.com"
      secretName: ldc-tls-secret
...
app-server:
  keycloak:
    authServerUrl: "https://idp.my-ldc-instance.com"
  untrustedCertsPolicy: ALLOW
...
rest-server:
  keycloak:
    authServerUrl: "https://idp.my-ldc-instance.com/realms/ldc-realm"
  oidc:
    tls:
      verification: none
  okhttp:
    trustUnknownCerts: true
  ingress:
    enabled: true
    hosts:
    - host: api.my-ldc-instance.com
      paths:
        - path: /
          pathType: Prefix
    tls:
    - hosts:
      - "api.my-ldc-instance.com"
      secretName: ldc-tls-secret
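
A wildcard entry such as *.my-ldc-instance.com covers exactly one subdomain label, so it matches idp.my-ldc-instance.com and api.my-ldc-instance.com but not the bare domain or deeper subdomains. The following is a minimal sketch of that matching rule, useful for checking host names against the subject alternative names of a certificate before deploying. The function name san_matches is a hypothetical helper, not part of Data Catalog or Kubernetes.

```shell
# Hypothetical helper: succeed if a host name ($1) is covered by one
# certificate name entry ($2), which may start with "*." as a wildcard.
# The comparison is case-insensitive, and the wildcard stands for
# exactly one leading label.
san_matches() (
  host=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  pat=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
  case "$pat" in
    '*.'*)
      suffix=${pat#\*}      # for example ".my-ldc-instance.com"
      label=${host%%.*}     # first label of the host name
      [ -n "$label" ] && [ "$host" = "$label$suffix" ]
      ;;
    *)
      [ "$host" = "$pat" ]
      ;;
  esac
)
```

For example, san_matches idp.my-ldc-instance.com '*.my-ldc-instance.com' succeeds, while san_matches my-ldc-instance.com '*.my-ldc-instance.com' fails. With OpenSSL 1.1.1 or later, the entries of a certificate can be listed with openssl x509 -in ldc_cert.pem -noout -ext subjectAltName.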

Update an existing SSL certificate

To keep your SSL certificates valid and up to date, renew them regularly and update the corresponding Kubernetes secret.

Procedure

  1. Delete the existing Kubernetes secret.

    kubectl delete secret <secret-name> --namespace <namespace>
    The <namespace> is the namespace used to create the secret and install Data Catalog. If you have forgotten the name of your secret, use the following command to search for the Kubernetes secret:
    kubectl get secret -n <namespace> --field-selector type=kubernetes.io/tls --selector='app.kubernetes.io/instance!=ldc'
    
  2. Create the secret. Creating the new secret will update the SSL certificate used by Data Catalog.

    kubectl create secret tls ldc-tls-secret \
        --namespace <namespace> \
        --key <private key file> \
        --cert <certificate file>

    In the example above:
      • <namespace> is the namespace in which the secret is created. It must be the namespace used to install Data Catalog.
      • <private key file> is the path to the private key file.
      • <certificate file> is the path to the certificate file.
  3. To confirm the changes, open Data Catalog in the web browser and verify that it returns the new certificate.
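
To decide when a renewal is due, the expiry date of the certificate can be checked from the command line. The following is a minimal sketch built on the -checkend option of openssl x509. The function name cert_expires_within is a hypothetical helper, and the 30-day threshold in the usage note is only an example value.

```shell
# Hypothetical helper: succeed if the certificate file ($1) expires
# within the given number of days ($2) or has already expired.
# Exit status: 0 = expires within that period, 1 = still valid after
# that period, 2 = the file is not a readable certificate.
cert_expires_within() (
  cert=$1 days=$2
  openssl x509 -in "$cert" -noout >/dev/null 2>&1 || exit 2
  # -checkend exits 0 when the certificate is still valid after the
  # given number of seconds.
  if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null; then
    exit 1
  fi
  exit 0
)
```

The certificate currently stored in the secret can be exported with kubectl get secret ldc-tls-secret -n <namespace> -o jsonpath='{.data.tls\.crt}' | base64 -d > current_cert.pem, after which cert_expires_within current_cert.pem 30 && echo "renew soon" reports whether a renewal is due in the next 30 days.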

Configuring an external MongoDB

MongoDB is a document database that manages document-oriented information and stores and retrieves data. Lumada Data Catalog uses MongoDB as a repository to store the metadata collected from processing functions. Typically, it resides with the application server in a centralized location. See the Architecture section in the Product overview for more details.

This section describes the steps to connect an external MongoDB instance to Data Catalog. Before you begin, make sure you have the following items ready:

  • A host server capable of running Data Catalog.
  • Two or more servers capable of running MongoDB.

Set up a MongoDB replica set

Data Catalog requires a replica set, not a simple MongoDB server. This section helps you set up a MongoDB replica set on CentOS. You can move to the Create a user topic if you have already set up a MongoDB replica set.

Note: You may skip or modify some steps based on the environment used.

This example assumes that you have a set of three CentOS instances with URLs similar to the following:
  • mongo-1.hitachivantara.com
  • mongo-2.hitachivantara.com
  • mongo-3.hitachivantara.com

Note: Substitute your own URLs for the URLs above when following the instructions.

Install MongoDB on all instances

Follow the steps below and install MongoDB 5.x on each CentOS 7 instance:

Procedure

  1. Create a yum repository file for MongoDB.

    sudo vi /etc/yum.repos.d/mongodb-org.repo
  2. Add the following content and save the file.

    [mongodb-org-5.0]
    name=MongoDB Repository
    baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/5.0/x86_64/
    gpgcheck=1
    enabled=1
    gpgkey=https://www.mongodb.org/static/pgp/server-5.0.asc
    
  3. Install MongoDB.

    sudo yum install mongodb-org
  4. Start the MongoDB service.

    sudo systemctl start mongod
  5. Repeat the above steps on each instance.

Results

Now you have installed MongoDB on all instances.

Configure the replica set

Follow the steps below to configure the replica set on each CentOS 7 instance:

Important: Repeat the following steps on every server you want to add to the replica set.

Procedure

  1. Open the MongoDB configuration file.

    sudo nano /etc/mongod.conf
  2. Modify the network interfaces section to bind MongoDB to your URL.

    With this setting, you can access MongoDB internally using the loopback IP address and externally using the URL.
    # network interfaces
    net:
      port: 27017
      bindIp: 127.0.0.1,mongo-1.hitachivantara.com # Use the correct URL for each server!
    
  3. In the replication section, add the name for your replica set.

    The name you choose here will be the name for your Replica Set. In the example below, it is rs0.
    replication:
      replSetName: "rs0"
    
  4. Save and close the file.

  5. Restart the MongoDB service.

    sudo systemctl restart mongod
  6. Repeat Step 1 through Step 5 on the other two instances.

  7. Create the replica set. Log in to any instance with MongoDB installed and start the MongoDB CLI tool, mongosh (recommended) or mongo.

  8. Run the following command, where _id is the name of your replica set, and members contains a list of the hosts you have set up in the above steps.

    rs.initiate({
      _id: "rs0",
      members: [
        { _id: 0, host: "mongo-1.hitachivantara.com" },
        { _id: 1, host: "mongo-2.hitachivantara.com" },
        { _id: 2, host: "mongo-3.hitachivantara.com" }
      ]
    })
    

Results

The MongoDB replica set is ready to use.
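
If you prefer to script the initiation instead of typing it into the MongoDB CLI, the rs.initiate() call can be generated from a list of hosts. The following is a minimal sketch. The function name build_rs_initiate is a hypothetical helper, and the host names are the example hosts used in this section.

```shell
# Hypothetical helper: print an rs.initiate() call for the replica set
# name ($1) and the hosts given as the remaining arguments.
# Member _id values are assigned in argument order, starting at 0.
build_rs_initiate() (
  rs_name=$1
  shift
  members=
  id=0
  for host in "$@"; do
    [ -n "$members" ] && members="$members, "
    members="$members{ _id: $id, host: \"$host\" }"
    id=$((id + 1))
  done
  printf 'rs.initiate({ _id: "%s", members: [%s] })\n' "$rs_name" "$members"
)
```

The generated call could then be passed to the CLI, for example mongosh --quiet --eval "$(build_rs_initiate rs0 mongo-1.hitachivantara.com mongo-2.hitachivantara.com mongo-3.hitachivantara.com)". Afterwards, mongosh --quiet --eval 'rs.status().members.forEach(m => print(m.name, m.stateStr))' lists each member with its current state.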

Create a user for external MongoDB

To create a Data Catalog user, perform the following steps:

Procedure

  1. Log in to MongoDB by running mongosh on the virtual machines you are using to host MongoDB.

  2. To create a new user in the admin database, run the following command.

    use admin;
    db.createUser({ user: "ldcuser", pwd: passwordPrompt(), roles: [{ role: "dbAdmin", db: "ldcdb" }] });
    

Results

This command creates a new user with the username ldcuser, a user-provided password, and the dbAdmin role for the ldcdb database that Data Catalog uses.
Note: Make a note of the username and password, because you need them in the connection string to connect to MongoDB.

Create a MongoDB connection string

Perform the following steps to create a connection string for Data Catalog and MongoDB.

Procedure

  1. Create a connection string.

    The following are sample connection strings.
    mongodb://<USER>:<PASS>@<HOST(S)>:<PORT>/ldcdb?authSource=admin&replicaSet=<REPLICA_SET>
    - or -
    mongodb+srv://<USER>:<PASS>@<HOST>/ldcdb

    Parameter      Description
    USER           The username for the MongoDB user that is created for Data Catalog.
    PASS           The password for the MongoDB user that is created for Data Catalog.
    HOST(S)        A list of the hosts that make up the replica set.
    REPLICA_SET    The ID of the replica set that the hosts belong to. It is passed as _id to rs.initiate() and stored in mongod.conf as replSetName.
    +srv           A DNS seed list connection string, which takes a single host name and no port. See Connection String URI Format for more information.
  2. Create a Secret using the following command.

    Note: It is a best practice to store the MongoDB connection string as a Kubernetes Secret.
    kubectl create secret -n <NAMESPACE> generic <SECRET_NAME> --from-literal=mongodbURI="<YOUR_MONGODB_CONNECTION_STRING>"

Results

You have successfully created a Kubernetes Secret that contains the connection string Data Catalog uses to connect to MongoDB.
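
Reserved characters such as @, :, and / in the username or password must be percent-encoded in a MongoDB connection string, which is easy to get wrong by hand. The following is a minimal sketch that encodes the credentials and assembles the standard (non-SRV) form of the connection string shown above. The function names urlencode and build_mongodb_uri are hypothetical helpers, and the sketch has only been exercised with ASCII input.

```shell
# Hypothetical helper: percent-encode every character of $1 except
# letters, digits, and the unreserved characters . _ ~ -
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Hypothetical helper: build the connection string from a user ($1),
# a password ($2), a replica set name ($3), and a comma-separated
# host:port list ($4). The database name ldcdb and authSource=admin
# follow the sample connection string in this section.
build_mongodb_uri() (
  printf 'mongodb://%s:%s@%s/ldcdb?authSource=admin&replicaSet=%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$4" "$3"
)
```

The result can be passed straight to the command from step 2, for example kubectl create secret -n <NAMESPACE> generic <SECRET_NAME> --from-literal=mongodbURI="$(build_mongodb_uri ldcuser "$MONGO_PASSWORD" rs0 mongo-1.hitachivantara.com:27017,mongo-2.hitachivantara.com:27017,mongo-3.hitachivantara.com:27017)", where MONGO_PASSWORD is an assumed environment variable holding the password.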

Update custom values for an external MongoDB

Perform the following steps to update the custom_values.yaml file in Data Catalog.

Procedure

  1. Open the custom_values.yaml file.

  2. Disable the built-in MongoDB database since it is not required.

    ...
    mongodb:
      enabled: false
    ...
    
  3. Update the app-server and rest-server with the Kubernetes Secret.

    ...
    app-server:
      mongodbURISecret: <SECRET_NAME>
    ...
    rest-server:
      mongodbURISecret: <SECRET_NAME>
    ...
    
  4. Continue with Data Catalog setup.

Results

You have successfully set up an external MongoDB with Data Catalog.