Advanced configuration
After installing Data Catalog, you might need to set up other components, depending on your environment. Use the following topics as needed to finish setting up your environment.
External Keycloak instance
In Data Catalog, an external Keycloak instance provides a secure and centralized authentication mechanism for user access. It also supports a wide range of authentication protocols and standards. Using an external Keycloak instance also provides additional benefits such as centralized management of user identities and access control policies, which simplifies user administration and reduces the risk of security vulnerabilities. For more information, see Configuring external Keycloak.
SSL certificate
In Data Catalog, you can use an SSL (Secure Sockets Layer) certificate to ensure secure network communications. In SSL communications, communication routes are encrypted to prevent information leakage and detect data manipulation during transfer. For more information, see Configuring Data Catalog to use an SSL certificate.
External MongoDB
MongoDB is a document database that manages document-oriented information and stores and retrieves data. Data Catalog uses MongoDB as a repository to store the metadata collected from processing functions. For more information, see Configuring an external MongoDB.
Configuring external Keycloak
Keycloak is an open-source identity and access management (IAM) solution. Lumada Data Catalog uses Keycloak to provide authentication and authorization services that authenticate users, manage user roles and permissions, and control access to the Data Catalog resources.
This article describes how to set up and configure an external Keycloak instance for Data Catalog.
Before you begin, you should have the following items ready to set up the external Keycloak instance:
- An instance of Keycloak, set up by an experienced user
  If you don't have a Keycloak instance, go to Set up a Keycloak instance.
  Note: Setting up a new Keycloak instance requires that Docker is installed.
- A host capable of running Data Catalog
- (Optional) SSL certificates for securing Keycloak
Set up a Keycloak instance
This topic describes how to set up a Keycloak instance.
Make sure you have the ldc-realm.json file ready. You can extract the ldc-realm.json file from the Data Catalog Helm chart:
tar -axf ./ldc-7.3.0.tgz ldc/charts/keycloak/files/ldc-realm.json -O > ldc-realm.json
You can deploy Keycloak with two methods:
- Set up basic Keycloak over HTTP
- Set up secure Keycloak over HTTPS
Set up basic Keycloak over HTTP
For basic Keycloak over HTTP, set up Keycloak using Docker (the method recommended by Keycloak).
To set up basic Keycloak over HTTP, run the following command.
docker run -d --name keycloak -p 8080:8080 -p 8443:8443 \
  -e KEYCLOAK_ADMIN=admin -e KEYCLOAK_ADMIN_PASSWORD=admin \
  -v $PWD/ldc-realm.json:/opt/keycloak/data/import/ldc-realm.json \
  -v keycloakData:/opt/keycloak/data \
  quay.io/keycloak/keycloak:20.0.1 start-dev --import-realm
Set up secure Keycloak over HTTPS
You may use certificates from a trusted certificate authority or generate your own self-signed certificates (see Configuring Data Catalog to use an SSL certificate for more information). Either way, you should have a certificate and key files.
For deploying a secure Keycloak instance over HTTPS, you need to set up Keycloak using Docker.
Procedure
Get the hostname through which you access your Keycloak instance. The hostname must match your certificate.
Run the following command:
KC_HOSTNAME="<keycloak-hostname>" # <-- replace with your Keycloak hostname
docker run -d --name keycloak -p 8080:8080 -p 8443:8443 \
  -e KEYCLOAK_ADMIN=admin -e KEYCLOAK_ADMIN_PASSWORD=admin \
  -e KC_HTTPS_CERTIFICATE_FILE=/opt/keycloak/conf/server.crt.pem \
  -e KC_HTTPS_CERTIFICATE_KEY_FILE=/opt/keycloak/conf/server.key.pem \
  -v $PWD/keycloak_cert.pem:/opt/keycloak/conf/server.crt.pem \
  -v $PWD/keycloak_key.pem:/opt/keycloak/conf/server.key.pem \
  -v $PWD/ldc-realm.json:/opt/keycloak/data/import/ldc-realm.json \
  -v keycloakData:/opt/keycloak/data \
  quay.io/keycloak/keycloak:20.0.1 start --import-realm --hostname "$KC_HOSTNAME"
Results
- It creates a Docker container that publishes port 8080 for HTTP and port 8443 for HTTPS.
- It sets default admin credentials.
- It imports the ldc-realm.json file contents into the Keycloak container. Keycloak then reads the content and automatically generates the roles defined in it.
- It creates a Docker volume, keycloakData, to store the data linked to the Keycloak container.
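After the container starts, one way to check that the realm was imported is to request the realm's OpenID Connect discovery document. The following is a minimal sketch, assuming Keycloak is reachable at http://localhost:8080 and that the imported realm is named ldc-realm (the name used in the authServerUrl examples in this article). The function name keycloak_discovery_url is hypothetical.

```shell
# Build the OpenID Connect discovery URL for a Keycloak realm.
# $1 = Keycloak base URL, $2 = realm name (assumed here: ldc-realm)
keycloak_discovery_url() {
  base=${1%/}   # drop one trailing slash, if present
  printf '%s/realms/%s/.well-known/openid-configuration\n' "$base" "$2"
}

# Example (requires curl and a running Keycloak instance):
#   curl -fsS "$(keycloak_discovery_url http://localhost:8080 ldc-realm)"
```

If the request fails, the realm might not have been imported, or Keycloak might still be starting.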
Update custom values
Perform the following steps to update the custom values for Keycloak in the custom-values.yaml file:
Procedure
Disable the built-in Keycloak instance by updating the enabled parameter to false.
...
keycloak:
  enabled: false
...
Specify the authServerUrl for Keycloak in the app-server and rest-server sections.
Note: Use http for basic Keycloak over HTTP and https for secure Keycloak over HTTPS.
...
app-server:
  keycloak:
    authServerUrl: "https://<KEYCLOAK_URL>:<PORT>"
...
rest-server:
  keycloak:
    authServerUrl: "https://<KEYCLOAK_URL>:<PORT>/realms/ldc-realm"
...
The following table provides the details about the parameters or placeholders used in the code snippet.
Parameter (Placeholder) | Description |
<KEYCLOAK_URL> | Data Catalog app server URL |
<PORT> | The defined port number |
If you encounter any issues due to using a self-signed SSL certificate with the Keycloak deployment, update the app-server and rest-server parameters in the custom-values.yaml file to accept untrusted certificates, as shown in the following code sample.
...
app-server:
  untrustedCertsPolicy: ALLOW
...
rest-server:
  oidc:
    tls:
      verification: none
  okhttp:
    trustUnknownCerts: true
...
Update the Helm chart to apply changes.
helm upgrade -i <app name> <chart> -n <namespace> -f <custom values file>
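The two authServerUrl values differ only in the realm path: app-server takes the Keycloak base URL, while rest-server takes the base URL followed by /realms/ldc-realm. As a minimal sketch, the following shell function prints the Keycloak-related fragment of the custom values from a single base URL. The function name keycloak_values_fragment is hypothetical, and the key nesting is assumed from the snippets in the preceding steps.

```shell
# Print the Keycloak-related custom values for an external Keycloak instance.
# $1 = Keycloak base URL, for example https://keycloak.example.com:8443
keycloak_values_fragment() {
  base=${1%/}   # drop one trailing slash, if present
  cat <<EOF
keycloak:
  enabled: false
app-server:
  keycloak:
    authServerUrl: "$base"
rest-server:
  keycloak:
    authServerUrl: "$base/realms/ldc-realm"
EOF
}

# Example:
#   keycloak_values_fragment https://keycloak.example.com:8443
```

Compare the output with the corresponding sections of your custom-values.yaml file before you run the Helm upgrade.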
Configuring Data Catalog to use an SSL certificate
Configure an SSL certificate to initiate secure browser sessions. You can configure either a Certificate Authority (CA) signed certificate or a self-signed SSL certificate.
Before you begin, you need the following to configure Data Catalog to use an SSL certificate:
- A .pfx (Personal Information Exchange) file from a trusted Certificate Authority. The .pfx file contains a public key certificate and the corresponding private key.
- A host with Kubernetes installed and Data Catalog prerequisites fulfilled, and a Kubernetes namespace for installing Data Catalog and the SSL certificate.
  Note: For an existing Data Catalog installation, run the following command to find its namespace:
  helm list -A | grep ldc | awk '{print $2}'
To create your own self-signed certificate, see Create a self-signed SSL certificate.
Create a self-signed SSL certificate
Perform the following steps to create a self-signed SSL certificate:
Procedure
Create an OpenSSL configuration file:
In the following example, replace www.my-ldc-instance.com with your domain name.
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no

[req_distinguished_name]
CN = my-ldc-instance.com

[v3_req]
keyUsage = digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = my-ldc-instance.com
DNS.2 = www.my-ldc-instance.com
The configuration file is created for an instance hosted at my-ldc-instance.com.
Save the configuration file as ldc_cert.conf.
Create ldc_key.pem and ldc_cert.pem files, which are required to apply the self-signed certificate to Data Catalog.
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout ldc_key.pem -out ldc_cert.pem \
  -config ldc_cert.conf -extensions 'v3_req'
In the example above:
- -days 365 means the self-signed certificate is valid for 365 days.
- ldc_cert.pem is the self-signed certificate file in the PEM format.
- ldc_key.pem is a private key file in the PEM format.
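If you create certificates for more than one host, the configuration file from the first step can be generated from the domain name instead of being edited by hand. The following is a minimal sketch that reproduces the configuration shown above. The function name write_ldc_cert_conf is hypothetical, and the sketch assumes the same two subject alternative names as the example (the domain and its www. variant).

```shell
# Write an OpenSSL configuration like the example above for a given domain.
# $1 = domain name (for example my-ldc-instance.com), $2 = output file
write_ldc_cert_conf() {
  cat > "$2" <<EOF
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no

[req_distinguished_name]
CN = $1

[v3_req]
keyUsage = digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = $1
DNS.2 = www.$1
EOF
}

# Example (the openssl commands require OpenSSL to be installed):
#   write_ldc_cert_conf my-ldc-instance.com ldc_cert.conf
#   openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
#     -keyout ldc_key.pem -out ldc_cert.pem \
#     -config ldc_cert.conf -extensions 'v3_req'
#   openssl x509 -in ldc_cert.pem -noout -subject -enddate
```

The last command in the example prints the subject and expiry date of the generated certificate, which is a quick way to confirm that the domain name and validity period are what you expect.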
Apply an SSL certificate to Data Catalog
To apply an SSL certificate to Data Catalog, you need to create a Kubernetes secret and apply it to the Data Catalog configuration.
Create a Kubernetes Secret
Before you begin
If you are using | Then you must have |
Certificate Authority (CA) signed certificate | A certificate .crt file and a private key .key file |
Self-signed SSL certificate | A certificate ldc_cert.pem file and a private key ldc_key.pem file |
Procedure
Verify that you have the correct files, as listed in Before you begin.
Create a Kubernetes secret:
kubectl create secret tls ldc-tls-secret \
  --namespace <namespace> \
  --key <private key file> \
  --cert <certificate file>
In the example above:
- <namespace> is the namespace in which the secret is created. The same namespace is used to install Data Catalog.
- <private key file> is the path to the private key file.
- <certificate file> is the path to the certificate file.
Results
ldc-tls-secret is created with the certificate and private key, containing the Base64-encoded contents of the original files.
Add Kubernetes secret to the Data Catalog configuration
Perform the following steps to add the Kubernetes secret to the Data Catalog configuration:
Procedure
Navigate to the Ingress subsection of the YAML file and add the secret, as shown in the following example.
Replace www.my-ldc-instance.com with your domain name.
...
app-server:
  ingress:
    enabled: true
    hosts:
      - host: my-ldc-instance.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - "my-ldc-instance.com"
        secretName: ldc-tls-secret
...
For a new Data Catalog installation, or to upgrade an existing Data Catalog instance, use the helm upgrade -i <options> command. For example:
helm upgrade -i --wait ldc7 ldc-7.0.1.tgz -f custom-values.yml -n ldc
For more information on parameters required to upgrade Data Catalog, see Data Catalog upgrade paths.
Open Data Catalog in the web browser. It should return the certificate.
Note: Data Catalog returns the certificate only when you access it through the web browser, not through the NodePort port number.
Configure Keycloak and the REST Server to use an SSL certificate
When configuring Keycloak and the REST Server to use an SSL certificate, there can be two scenarios: A single DNS where Keycloak and REST Server are on different paths or multiple DNS records where you have unique subdomains for Keycloak and the REST Server. An example of each is shown below.
The following is an example for a single DNS record:
Specify the following in the Ingress configuration of the custom_values.yaml file. For more information on configuring Ingress, see Exposing services using an Ingress controller - For production environments.
Replace www.my-ldc-instance.com with your domain name.
keycloak:
  ingress:
    enabled: true
    hosts:
      # www.keycloak.org/server/reverseproxy#_exposing_the_administration_console:
      - host: my-ldc-instance.com
        paths:
          - path: /realms
            pathType: Prefix
          - path: /resources
            pathType: Prefix
          - path: /js
            pathType: Prefix
    tls:
      - hosts:
          - "my-ldc-instance.com"
        secretName: ldc-tls-secret
...
app-server:
  keycloak:
    authServerUrl: "https://my-ldc-instance.com"
  untrustedCertsPolicy: ALLOW
...
rest-server:
  keycloak:
    authServerUrl: "https://my-ldc-instance.com/realms/ldc-realm"
  oidc:
    tls:
      verification: none
  okhttp:
    trustUnknownCerts: true
  ingress:
    enabled: true
    hosts:
      - host: my-ldc-instance.com
        paths:
          - path: /api/v1
            pathType: Prefix
          - path: /swagger-ui
            pathType: Prefix
          - path: /api-docs
            pathType: Prefix
    tls:
      - hosts:
          - "my-ldc-instance.com"
        secretName: ldc-tls-secret
The following is an example for multiple DNS records.
Specify the following in the Ingress configuration of the custom_values.yaml file. For more information on configuring Ingress, see Exposing services using an Ingress controller - For production environments.
Replace www.my-ldc-instance.com with your domain name.
keycloak:
  ingress:
    enabled: true
    hosts:
      # www.keycloak.org/server/reverseproxy#_exposing_the_administration_console:
      - host: idp.my-ldc-instance.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - "idp.my-ldc-instance.com"
        secretName: ldc-tls-secret
...
app-server:
  keycloak:
    authServerUrl: "https://idp.my-ldc-instance.com"
  untrustedCertsPolicy: ALLOW
...
rest-server:
  keycloak:
    authServerUrl: "https://idp.my-ldc-instance.com/realms/ldc-realm"
  oidc:
    tls:
      verification: none
  okhttp:
    trustUnknownCerts: true
  ingress:
    enabled: true
    hosts:
      - host: api.my-ldc-instance.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - "api.my-ldc-instance.com"
        secretName: ldc-tls-secret
Update an existing SSL certificate
To ensure that your SSL certificates are up to date and valid, you must regularly renew them and update the corresponding secret.
Procedure
Delete the existing Kubernetes secret.
kubectl delete secret <secret-name> --namespace <namespace>
The <namespace> is the namespace used to create the secret and install Data Catalog. If you have forgotten the name of your secret, use the following command to search for the Kubernetes secret:
kubectl get secret -n <namespace> --field-selector type=kubernetes.io/tls --selector='app.kubernetes.io/instance!=ldc'
Create the secret. Creating the new secret updates the SSL certificate used by Data Catalog.
kubectl create secret tls ldc-tls-secret \
  --namespace <namespace> \
  --key <private key file> \
  --cert <certificate file>
In the example above:
- <namespace> is the namespace in which the secret is created. The same namespace is used to install Data Catalog.
- <private key file> is the path to the private key file.
- <certificate file> is the path to the certificate file.
To confirm the changes, open Data Catalog in the web browser and verify that it returns the new certificate.
Configuring an external MongoDB
MongoDB is a document database that manages document-oriented information and stores and retrieves data. Lumada Data Catalog uses MongoDB as a repository to store the metadata collected from processing functions. Typically, it resides with the application server in a centralized location. See the Architecture section in the Product overview for more details.
This section describes the steps to connect an external MongoDB instance to Data Catalog. Before you begin, make sure you have the following items ready:
- A host server capable of running Data Catalog.
- Two or more servers capable of running MongoDB.
Set up a MongoDB replica set
Data Catalog requires a replica set, not a standalone MongoDB server. This section helps you set up a MongoDB replica set on CentOS. If you have already set up a MongoDB replica set, you can skip to the Create a user for external MongoDB topic.
The examples in this section use the following three hosts:
- mongo-1.hitachivantara.com
- mongo-2.hitachivantara.com
- mongo-3.hitachivantara.com
Install MongoDB on all instances
Procedure
Create a repository file for MongoDB in the yum configuration directory.
sudo vi /etc/yum.repos.d/mongodb-org.repo
Add the following content and save the file.
[mongodb-org-5.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/5.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-5.0.asc
Install MongoDB.
sudo yum install mongodb-org
Start the MongoDB service.
sudo systemctl start mongod
Repeat the above steps on each instance.
Configure the replica set
Procedure
1. Open the MongoDB configuration file.
sudo nano /etc/mongod.conf
2. Modify the network interfaces section to bind MongoDB to your URL.
You can access MongoDB internally with the IP address and externally using the URL.
# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,mongo-1.hitachivantara.com # Use the correct URL for each server!
3. In the replication section, add the name for your replica set.
The name you choose here will be the name of your replica set. In the example below, it is rs0.
replication:
  replSetName: "rs0"
4. Save and close the file.
5. Restart the MongoDB service.
sudo systemctl restart mongod
6. Repeat Step 1 through Step 5 for the other two instances.
7. Now, you can create the replica set itself. Log in to any instance with MongoDB installed and start the MongoDB CLI tool, mongosh (recommended) or mongo.
8. Run the following command, where _id is the name of your replica set and members contains a list of the hosts you have set up in the above steps.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-1.hitachivantara.com" },
    { _id: 1, host: "mongo-2.hitachivantara.com" },
    { _id: 2, host: "mongo-3.hitachivantara.com" }
  ]
})
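If you create the replica set from a script rather than from an interactive shell, the rs.initiate() call can be generated from the list of hosts. The following is a minimal sketch. The function name rs_initiate_js is hypothetical, and the sketch assumes that member IDs are assigned in the order the hosts are given, starting at 0, as in the command above.

```shell
# Print an rs.initiate() call for a replica set name and a list of hosts.
# $1 = replica set name, remaining arguments = member host names
rs_initiate_js() {
  rs=$1
  shift
  i=0
  members=""
  for h in "$@"; do
    # Add ", " before every member except the first.
    members="${members}${members:+, }{ _id: $i, host: \"$h\" }"
    i=$((i + 1))
  done
  printf 'rs.initiate({ _id: "%s", members: [ %s ] })\n' "$rs" "$members"
}

# Example (requires mongosh on one of the MongoDB hosts):
#   mongosh --quiet --eval "$(rs_initiate_js rs0 \
#     mongo-1.hitachivantara.com mongo-2.hitachivantara.com mongo-3.hitachivantara.com)"
```

Review the generated call before running it, because a replica set is initiated only once.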
Create a user for external MongoDB
Procedure
Log in to MongoDB by running mongosh on the virtual machines you are using to host MongoDB.
To create a new user in the admin database, run the following command.
use admin;
db.createUser({
  user: "ldcuser",
  pwd: passwordPrompt(),
  roles: [{ role: "dbAdmin", db: "ldcdb" }]
});
Results
The user is created with the user name ldcuser, a user-provided password, and the dbAdmin role for the ldcdb database that Data Catalog uses.
Create a MongoDB connection string
Procedure
Create a connection string.
The following example is a sample connection string.
mongodb://<USER>:<PASS>@<HOST(S)>:<PORT>/ldcdb?authSource=admin&replicaSet=<REPLICA_SET>
- or -
mongodb+srv://<USER>:<PASS>@<HOST>/ldcdb
Parameter | Description |
USER | The username for the MongoDB user that is created for Data Catalog. |
PASS | The password for the MongoDB user that is created for Data Catalog. |
HOST | A list of the hosts that make up the replica set. |
REPLICA_SET | The ID of the replica set that the hosts belong to. It is passed as _id when creating the replica set with rs.initiate() and stored in mongod.conf as replSetName. |
+srv | A DNS seed list connection string. It takes a single DNS host name and no port. See Connection String URI Format for more information. |
Create a Secret using the following command.
Note: It is a best practice to store the MongoDB connection string as a Kubernetes Secret.
kubectl create secret -n <NAMESPACE> generic <SECRET_NAME> --from-literal=mongodbURI="<YOUR_MONGODB_CONNECTION_STRING>"
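The standard (non-SRV) connection string can be assembled from its parts, which helps avoid typing mistakes when the replica set has several hosts. The following is a minimal sketch. The function name build_mongodb_uri is hypothetical, and the sketch assumes that each host is passed with its port (27017 in the example configuration) and that the user name and password contain no characters that need percent-encoding.

```shell
# Assemble a replica set connection string for the ldcdb database.
# $1 = user, $2 = password, $3 = replica set name,
# remaining arguments = host:port pairs
build_mongodb_uri() {
  user=$1
  pass=$2
  rs=$3
  shift 3
  hosts=$(IFS=,; printf '%s' "$*")   # join the host:port pairs with commas
  printf 'mongodb://%s:%s@%s/ldcdb?authSource=admin&replicaSet=%s\n' \
    "$user" "$pass" "$hosts" "$rs"
}

# Example (requires kubectl and access to the cluster):
#   kubectl create secret -n <NAMESPACE> generic <SECRET_NAME> \
#     --from-literal=mongodbURI="$(build_mongodb_uri ldcuser '<PASS>' rs0 \
#       mongo-1.hitachivantara.com:27017 mongo-2.hitachivantara.com:27017 \
#       mongo-3.hitachivantara.com:27017)"
```

If the password contains characters such as @, :, or /, percent-encode them before building the string, because the sketch inserts the values unchanged.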
Update custom values for an external MongoDB
Perform the following steps to update the custom_values.yaml file in Data Catalog:
Procedure
Open the custom_values.yaml file.
Disable the built-in MongoDB database since it is not required.
...
mongodb:
  enabled: false
...
Update the app-server and rest-server sections with the Kubernetes Secret.
...
app-server:
  mongodbURISecret: <SECRET_NAME>
...
rest-server:
  mongodbURISecret: <SECRET_NAME>
...
Continue with Data Catalog setup.