Hitachi Vantara Lumada and Pentaho Documentation

Back up and restore

Use the following topics to back up and restore Lumada Data Catalog components.

Before you begin backup and restore

Make sure you have the following software and access before you start to back up or restore Data Catalog components:

  • MongoDB

    MongoDB tools, which include the utilities mongodump and mongorestore. You can find more information on installing MongoDB tools on various platforms on the official MongoDB website: https://www.mongodb.com/docs/database-tools/installation/installation/

  • Large Properties directory

    In Data Catalog, a large properties directory is used to store sample data and fingerprints of profiled data, so you must have access to the directory to be able to back up or restore this data.

    • Access to your object store user interface.
    • Relevant permissions to copy and synchronize index files on your object store.

Backing up Data Catalog

As a best practice before you upgrade to a new software version, back up the following Data Catalog components:

  • Keystore
  • MongoDB
  • Large Properties
  • Keycloak

Backing up ensures that information on any data, jobs, and indexes you produce is retained. If an upgrade fails and you must perform a rollback or re-install, you can use the backups to restore your Data Catalog storage.

Back up the keystore

You can back up the keystore by saving the Kubernetes secret’s configuration into a YAML file. The keystore is where Data Catalog stores keys for sensitive information encryption. It is stored on the cluster as a Kubernetes secret.

Use the following steps to back up the keystore:

Procedure

  1. Find the release name and namespace in the Helm chart you used during the install of Data Catalog.

  2. Use a command like the example shown below to save the keystore secret to keystore-secret.yaml:

    kubectl get secret <release name>-app-server-keystore -o yaml -n <namespace> > keystore-secret.yaml

    The <release name> and <namespace> variables are the release name and namespace from the Helm chart.
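The two steps above can be wrapped in a small script. The following is a minimal sketch, not part of the product: the function names keystore_secret_name and backup_keystore are hypothetical, and the permission tightening and empty-file check are added assumptions about what a careful backup might include.

```shell
# Hypothetical helper: derive the secret name from the Helm release name.
keystore_secret_name() {
  printf '%s-app-server-keystore\n' "$1"
}

# Hypothetical helper: save the keystore secret to a YAML file.
# Arguments: <release name> <namespace> [output file]
backup_keystore() {
  release="$1"
  namespace="$2"
  out="${3:-keystore-secret.yaml}"

  # Same kubectl call as in step 2, with the placeholders filled in.
  kubectl get secret "$(keystore_secret_name "$release")" -o yaml -n "$namespace" > "$out" || return 1

  # The file holds encryption keys, so restrict it to the current user.
  chmod 600 "$out"

  # Refuse to treat an empty file as a valid backup.
  if [ ! -s "$out" ]; then
    echo "empty backup: $out" >&2
    return 1
  fi
}

# Example call (values are placeholders):
#   backup_keystore <release name> <namespace> keystore-secret.yaml
```

Checking that the file is not empty guards against a silent failure, such as a wrong namespace, being mistaken for a usable backup.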

Back up MongoDB

By default, MongoDB data for Data Catalog is stored in a database called ldcdb. You can use the mongodump utility to take binary exports (dumps) of the ldcdb database and save them in a location of your choice. The mongodump command requires the full URI to the MongoDB instance and a full path to where the dumps will be stored.

Use the following steps to back up MongoDB:

Procedure

  1. Determine the following information for your Data Catalog installation:

    • MongoDB username
    • MongoDB password
    • Full URI to your MongoDB instance
  2. Use the mongodump command to export the binaries of the ldcdb database, as shown in the following example:

    mongodump --uri="mongodb://<username>:<password>@<DNS or IP of server>:<mongoDB port>/ldcdb?authSource=admin" --out=/tmp/dumps/

    In this example, the ldcdb database is dumped to the /tmp/dumps folder. The URI is quoted so that the shell does not treat the ? character as a wildcard.

Results

This command creates the MongoDB dumps in /tmp/dumps/ldcdb, with a BSON data file and a JSON metadata file for each collection in the ldcdb database.
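Before relying on a dump, it can help to confirm that every collection file has its metadata file beside it. The sketch below is one way to do that; the function name check_dump_dir is hypothetical, and it assumes the default mongodump layout of <collection>.bson plus <collection>.metadata.json inside a folder named after the database.

```shell
# Hypothetical helper: confirm each collection dump has its metadata file.
# Argument: the database folder inside the dump location, for example /tmp/dumps/ldcdb
check_dump_dir() {
  dir="$1"
  count=0

  for bson in "$dir"/*.bson; do
    [ -e "$bson" ] || continue              # the glob matched nothing
    meta="${bson%.bson}.metadata.json"
    if [ ! -f "$meta" ]; then
      echo "missing metadata for: $bson" >&2
      return 1
    fi
    count=$((count + 1))
  done

  if [ "$count" -eq 0 ]; then
    echo "no .bson files found in: $dir" >&2
    return 1
  fi

  echo "$count collection dump(s) found in $dir"
}

# Example call:
#   check_dump_dir /tmp/dumps/ldcdb
```

This only checks that the files are present. It does not validate their contents.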

Next steps

For more information, see the MongoDB backup documentation at https://www.mongodb.com/docs/v5.0/core/backups/.

Backing up large properties

Large properties files store sample data and fingerprints. By default, they are stored at the root level of a bucket called ldc-discovery-cache. The easiest way to back up these files is to log in to the user interface for your object store and duplicate the large properties directory. The exact steps vary by object store, so refer to your object store's replication documentation for details.

Back up Keycloak

Depending on the version and setup of Keycloak that you have, the script used to export data may differ. In the steps below, the kc.sh script is used. For more information, see https://www.keycloak.org/server/importExport.

For Keycloak setups that leverage the standalone.sh script, see https://support.kublr.com/support/solutions/articles/33000261942-kcp-keycloak-data-backup-and-restore.

The steps below cover backing up data on the Keycloak pod included in the Data Catalog Helm chart. This Keycloak setup includes data for two realms: master and ldc-realm.

Note: The steps below are an example only, and you should customize the steps to your environment.

Procedure

  1. Get the name of the Keycloak pod.

    kubectl get pod -n <namespace>
  2. Prepare the export folder inside the Keycloak pod.

    kubectl exec -n <namespace> <keycloak pod> -c keycloak -- sh -c 'rm -rf /tmp/keycloak-export'
  3. Run the export command using the kc.sh script.

    kubectl exec -n <namespace> <keycloak pod> -c keycloak -it -- /opt/keycloak/bin/kc.sh export --dir /tmp/keycloak-export

    A successful export produces logs similar to the following:

    [org.keycloak.exportimport.dir.DirExportProvider] (main) Exporting into directory /tmp/keycloak-export
    [org.keycloak.exportimport.dir.DirExportProvider] (main) KC-SERVICES0033: Full model export requested
    [org.keycloak.exportimport.dir.DirExportProvider] (main) Realm 'master' - data exported
    [org.keycloak.exportimport.dir.DirExportProvider] (main) Users 0-1 exported
    [org.keycloak.exportimport.dir.DirExportProvider] (main) Realm 'ldc-realm' - data exported
    [org.keycloak.exportimport.dir.DirExportProvider] (main) Users 0-4 exported
    [org.keycloak.exportimport.dir.DirExportProvider] (main) KC-SERVICES0035: Export finished successfully
    

    The following files are created in the /tmp/keycloak-export directory:

    • ldc-realm-realm.json
    • ldc-realm-users-0.json
    • master-realm.json
    • master-users-0.json
  4. Use the tar command to archive the /tmp/keycloak-export folder and copy the TGZ file into an existing local folder. In this example, the local folder is called keycloak-export-local.

    1. kubectl exec -n <namespace> <keycloak pod> -c keycloak -- sh -c 'cd /tmp/keycloak-export && tar -c -f keycloak-export.tgz *'

    2. kubectl exec -n <namespace> <keycloak pod> -c keycloak -- cat /tmp/keycloak-export/keycloak-export.tgz > ~/keycloak-export-local/keycloak-export.tgz

    If the tar command is not available, you can use the cat command to copy each file into your local environment. The example below copies one of the JSON files to the keycloak-export-local folder:

    kubectl exec -n <namespace> <keycloak pod> -- cat /tmp/keycloak-export/master-users-0.json > ~/keycloak-export-local/master-users-0.json
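Once the archive is in the local folder, its contents can be checked against the file list from step 3. The following is a minimal sketch under stated assumptions: the function name verify_keycloak_export is hypothetical, and the expected file names are copied from the example listing above, so an installation with more realms or more users would produce additional files.

```shell
# Hypothetical helper: confirm the archive holds the four files listed in step 3.
# Argument: path to the local archive
verify_keycloak_export() {
  archive="$1"

  # List the archive members without extracting them.
  listing=$(tar -t -f "$archive") || return 1

  for f in ldc-realm-realm.json ldc-realm-users-0.json master-realm.json master-users-0.json; do
    # Strip a leading "./" so both member naming styles are accepted.
    if ! printf '%s\n' "$listing" | sed 's|^\./||' | grep -q -x -F "$f"; then
      echo "missing from archive: $f" >&2
      return 1
    fi
  done
}

# Example call:
#   verify_keycloak_export ~/keycloak-export-local/keycloak-export.tgz
```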

Restoring Data Catalog

Use the following sections, in the order shown, to restore your backed-up Data Catalog components:

  1. Restore the keystore.
    Note: If you do not restore the keystore before installing Data Catalog, the Lumada Data Catalog Agent won't be able to connect to the Lumada Data Catalog Application Server, and the LDC Application Server will have the status Disconnected.
  2. Install Data Catalog. See Deployment patterns to choose the deployment type to use, then proceed to Installation on Kubernetes.
  3. Restore the remaining Data Catalog components (MongoDB, large properties, and Keycloak) as described in the sections below.

Restore the keystore

You need to restore the keystore before you install Data Catalog.

Note: If you do not restore the keystore before installing Data Catalog, the LDC Agent won't be able to connect to the LDC Application Server, and the LDC Application Server will have the status Disconnected.

Use the following steps to restore the keystore:

Procedure

  1. Run the kubectl apply command using the keystore-secret.yaml that was created during backup:

    kubectl apply -f keystore-secret.yaml -n <namespace>

    Where <namespace> is the namespace used during the Helm install for Data Catalog.

    Note: Make sure the keystore secret is named properly or its name is specified in the custom values file.
  2. (Optional) If this command fails, check that the following fields have been removed or commented out of the YAML file:

    • creationTimestamp
    • resourceVersion
    • selfLink
    • uid
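The fields listed in step 2 can be removed by hand or with a small filter. The sketch below is one possible approach, assuming the standard kubectl YAML output in which these fields sit directly under the top-level metadata key with two-space indentation. The function name strip_server_fields is hypothetical.

```shell
# Hypothetical helper: print the secret YAML without the server-generated
# metadata fields (creationTimestamp, resourceVersion, selfLink, uid).
# Argument: path to the YAML file created during backup
strip_server_fields() {
  awk '
    # A line starting with a letter is a top-level key; track whether it is metadata.
    /^[A-Za-z]/ { in_meta = ($0 ~ /^metadata:/) }
    # Drop the listed fields only when they sit directly under metadata.
    in_meta && /^  (creationTimestamp|resourceVersion|selfLink|uid):/ { next }
    { print }
  ' "$1"
}

# Example call, writing the cleaned copy to a new file:
#   strip_server_fields keystore-secret.yaml > keystore-secret.clean.yaml
```

Limiting the filter to the metadata block keeps any entry under data that happens to share one of these names.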

Restore MongoDB

Restore MongoDB data by running the mongorestore command, passing the path to the dumps and the URI of the MongoDB instance.

Note: The command below is an example only. You need to customize the mongorestore command for your environment.

Procedure

  1. Determine the following information for your Data Catalog installation:

    • MongoDB username
    • MongoDB password
    • Full URI to your MongoDB instance
  2. Use the mongorestore command to import the binaries of the ldcdb database, as shown in the following example:

    mongorestore /tmp/dumps/ --uri="mongodb://<username>:<password>@<DNS or IP of server>:<mongoDB port>/" --drop --preserveUUID

    In this example, the ldcdb database is restored from the dumps stored in /tmp/dumps. The --drop option drops each collection on the target before restoring it from the dump.

Restoring large properties

If the files in your original large properties location have been lost, you can change your agent configuration to use the backed up large properties files.

See Large properties for more information.

Restore Keycloak

Depending on the version and setup of Keycloak that you are using, the script used to import data may differ. In the steps below, the kc.sh script is used. See https://www.keycloak.org/server/importExport for more information.

For Keycloak setups that leverage the standalone.sh script, see this document: https://support.kublr.com/support/solutions/articles/33000261942-kcp-keycloak-data-backup-and-restore.

In this example, backed-up files stored in keycloak-export-local are used to restore the Keycloak pod that is included in the Data Catalog Helm chart.

Note: This procedure shows an example only. You should customize the commands for your environment.

Procedure

  1. Get the name of the Keycloak pod:

    kubectl get pod -n <namespace>
  2. Create a new folder /tmp/keycloak-import on the Keycloak pod:

    kubectl exec -n <namespace> <keycloak pod> -c keycloak -- sh -c 'rm -rf /tmp/keycloak-import && mkdir -p /tmp/keycloak-import'
  3. Copy the backed-up files into the /tmp/keycloak-import folder:

    kubectl cp -c keycloak ~/keycloak-export-local <namespace>/<keycloak pod>:/tmp/keycloak-import

    If the tar command is not available on the pod (kubectl cp depends on it), you can use the cat and tee commands to copy each file onto your Keycloak pod instead. For example:

    cat ldc-realm-realm.json | kubectl exec -i -n <namespace> <keycloak pod> -- tee /tmp/keycloak-import/ldc-realm-realm.json > /dev/null

  4. Run the kc.sh command, passing the /tmp/keycloak-import folder as an argument:

    kubectl exec -n <namespace> <keycloak pod> -c keycloak -it -- /opt/keycloak/bin/kc.sh import --dir /tmp/keycloak-import

    Upon successful import, the script produces logs similar to the following:

    [org.keycloak.exportimport.dir.DirImportProvider] (main) Importing from directory /tmp/keycloak-import
    [org.keycloak.services] (main) KC-SERVICES0030: Full model import requested. Strategy: OVERWRITE_EXISTING
    [org.keycloak.exportimport.util.ImportUtils] (main) Realm 'master' already exists. Removing it before import
    [org.keycloak.exportimport.util.ImportUtils] (main) Realm 'master' imported
    [org.keycloak.exportimport.dir.DirImportProvider] (main) Imported users from /tmp/keycloak-import/master-users-0.json
    [org.keycloak.exportimport.util.ImportUtils] (main) Realm 'ldc-realm' already exists. Removing it before import
    [org.keycloak.exportimport.util.ImportUtils] (main) Realm 'ldc-realm' imported
    [org.keycloak.exportimport.dir.DirImportProvider] (main) Imported users from /tmp/keycloak-import/ldc-realm-users-0.json
    [org.keycloak.services] (main) KC-SERVICES0032: Import finished successfully
  5. Log in to your Keycloak instance with admin credentials and confirm that all your realm data has been populated. If you have made any custom client roles (in the included Keycloak pod, this will be roles under ldc-client), make sure that you have run the mongorestore command to restore the user roles in MongoDB.
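
When the per-file copy from step 3 is needed, it can be looped over every JSON file in the local folder. The following is a minimal sketch rather than a supported tool: the function name copy_json_to_pod and the KUBECTL override variable are hypothetical, the -c keycloak container name is taken from the other commands in this procedure, and input redirection stands in for the cat pipe shown in step 3.

```shell
# Hypothetical helper: stream every JSON file in a local folder into
# /tmp/keycloak-import on the Keycloak pod, one file at a time.
# Arguments: <namespace> <keycloak pod> <local folder>
# KUBECTL is a hypothetical override hook; it defaults to kubectl.
copy_json_to_pod() {
  namespace="$1"
  pod="$2"
  src_dir="$3"

  for file in "$src_dir"/*.json; do
    [ -e "$file" ] || continue            # the glob matched nothing
    name=$(basename "$file")
    # tee inside the pod writes its standard input to the target file.
    "${KUBECTL:-kubectl}" exec -i -n "$namespace" "$pod" -c keycloak -- \
      tee "/tmp/keycloak-import/$name" < "$file" > /dev/null || return 1
    echo "copied $name"
  done
}

# Example call (values are placeholders):
#   copy_json_to_pod <namespace> <keycloak pod> ~/keycloak-export-local
```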