Hitachi Vantara Lumada and Pentaho Documentation

Restoring Data Catalog

This article includes procedures for restoring Lumada Data Catalog’s Solr™ data and indexes, the Postgres database, the keystore, and optionally, discovery metadata and UI logs.

Note: These instructions are intended to restore standalone instances of Data Catalog after a backup. If you need to restore Data Catalog as a part of your Lumada DataOps Suite installation, see the Administer section in the LDOS documentation.

Before you begin

Before you restore a Data Catalog backup, you must perform the following steps:

Procedure

  1. Make sure there are no Data Catalog jobs running.

  2. As the Data Catalog service user, use the following commands to shut down the Data Catalog web service.

    $ sudo su ldcuser
    $ /opt/ldc/app-server/bin/app-server stop
    $ /opt/ldc/agent/bin/agent stop
    $ /opt/ldc/metadata-server/bin/metadata-server stop
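After stopping the services, you can confirm that no Data Catalog processes remain; a minimal sketch, assuming the /opt/ldc install paths shown above (the pgrep patterns are an assumption based on those paths):

```shell
# Sketch: confirm no Data Catalog processes remain before restoring.
# The patterns assume the /opt/ldc install paths used above; the
# brackets keep each pattern from matching this script itself.
for svc in app-server agent metadata-server; do
  if pgrep -f "[l]dc/${svc}" > /dev/null; then
    echo "${svc}: still running"
  else
    echo "${svc}: stopped"
  fi
done
```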

Restore the Solr collection

These procedures use the Solr APIs to restore the backup.

After you verify that no Data Catalog jobs are running and that the Data Catalog web service is shut down, you can restore the Solr collection.

Note: When running in a Kerberized environment, make sure to run these commands as the Solr service user.

The steps required for restoring backups depend on your Solr version, which typically depends on the Hadoop distribution you are running. Perform the tasks below that are applicable to your environment:

  1. Restore the Solr collection for CDH and Solr 4.10
  2. Restore Solr data on local and HDFS storage
  3. Restore the Solr collection for HDP and EMR

Restore the Solr collection for CDH and Solr 4.10

These instructions use the Solr APIs to restore the backup. The APIs only work with CDH versions 5.9 and later.

The steps required for restoring backups depend on your Solr version, which typically depends on the Hadoop distribution you are running. Perform the following steps for CDH and Solr 4.10:

Procedure

  1. Remove the old Data Catalog Solr collection.

    Your values may be different for the ZooKeeper ensemble string (including the port number), the Data Catalog collection name, the Data Catalog configuration name, and the location of the Solr collection on HDFS. The following are sample commands:
    $ solrctl --zk localhost:2181/solr collection --delete wdcollection
    $ sudo -u hdfs hadoop fs -rm -r /solr/wdcollection
  2. Restore the Data Catalog collection configuration in ZooKeeper.

    The backup saves the required schema as managed-schema. To restore the configuration, copy the managed-schema file into the schema.xml file. Your values may be different for the ZooKeeper port number, the Data Catalog collection name, the backup file name, and the location of the backup.
    $ cd /tmp/wd-backups/backup_zk_config/conf
    $ mv schema.xml.bak schema.xml.bak2
    $ mv managed-schema schema.xml
    $ solrctl --zk localhost:2181/solr instancedir --update wdconfig /tmp/wd-backups/backup_zk_config
  3. Recreate the Data Catalog Solr collection with the same shard count and replication factor as the one that was backed up.

    Your values may be different for the ZooKeeper ensemble string (including the port number), the Data Catalog collection name, the Data Catalog configuration name, and the location of the Solr collection on HDFS.
    $ solrctl --zk localhost:2181/solr collection --create wdcollection -c wdconfig  -s 1 -r 2 -m 2
  4. As the Data Catalog service user, rebuild the schema of the collection.

    $ sudo su ldcuser
    $ /opt/ldc/bin/ldc schemaAdmin -create true
  5. Restart Solr through Cloudera Manager.

  6. Restore the Solr collection index files.

Next steps

Restore Solr data on local and HDFS storage

After restoring the Solr collection on CDH or Solr 4.10, you need to restore the Solr collection data on the local file system and HDFS.
Note: When running in a Kerberized environment, make sure to run these commands as the Solr service user.

Perform the following steps to restore the Solr data:

Procedure

  1. Run the applicable commands for your environment, adjusting the following samples as needed:

    • One replica and one shard

      $ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'

    • Two replicas and two shards

      $ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
      $ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica2/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
      $ curl 'http://solrnode1:8983/solr/wdcollection_shard2_replica1/replication?command=restore&name=wdcollection_shard2_replica1_backup&location=/tmp/wd-backups'
      $ curl 'http://solrnode1:8983/solr/wdcollection_shard2_replica2/replication?command=restore&name=wdcollection_shard2_replica1_backup&location=/tmp/wd-backups'
  2. (Optional) Restore the Data Catalog discovery metadata.

    If the directory does not already exist, you need permission to create it. You can verify the location of the discovery metadata in the ldc/conf/configuration.json file, in the value of the ldc.metadata.hdfs.large_properties.path property.
    $ sudo -u hdfs hadoop fs -mkdir /user/ldcuser/.ldc_hdfs_metadata
    $ sudo -u hdfs hadoop fs -put /PATH/TO/BACKUP/LOCATION/backup_discovery_metadata/backup_discovery_metadata/* /user/ldcuser/.ldc_hdfs_metadata
  3. Restart Solr through Cloudera Manager.
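After issuing the restore calls, the Solr replication handler can report progress through a restorestatus command on some Solr builds (support varies by version; verify in your environment). A minimal sketch that builds the status-check calls, using the sample host, port, and core names from the commands above:

```shell
# Sketch: build status-check calls for the replication handler's
# restorestatus command. Support varies by Solr build; verify in your
# environment. Host, port, and core names are the sample values above.
checked=0
for core in wdcollection_shard1_replica1 wdcollection_shard1_replica2; do
  echo "curl 'http://solrnode1:8983/solr/${core}/replication?command=restorestatus'"
  checked=$((checked + 1))
done
```

Remove the echo to execute the checks once the hostnames match your cluster.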

Restore the Solr collection for HDP and EMR

This task applies to Lucidworks Solr and Apache Solr 5.5.4.
Note: When running in a Kerberized environment, make sure to run these commands as the Solr service user.

Perform the following steps to restore the Solr collection for HDP and EMR:

Procedure

  1. Remove the old Solr collection. As the Solr user, delete the collection.

    Typically, the Solr location is /opt/lucidworks-hdpsearch/solr. Use the correct Solr port number and Data Catalog collection name.
    $ sudo su solr
    $ cd /opt/lucidworks-hdpsearch/solr
    $ bin/solr delete -c wdcollection -p 8983
    $ sudo -u hdfs hadoop fs -rm -r /user/solr/wdcollection
    $ exit
  2. Restore the Data Catalog collection configuration in ZooKeeper.

    $ /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost zkhost:zkport -cmd upconfig -confname wdconfig -confdir /tmp/wd-backups/backup_zk_config
  3. Recreate the old Solr collection.

    1. As the Solr user, recreate the collection using the command you used for the original installation.

      $ bin/solr create -c wdcollection -p 8983 -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/wdconfig  -n wdconfig -s 1 -rf 2
      $ exit
    2. As the Data Catalog service user, rebuild the schema of the collection.

      $ sudo su ldcuser
      $ /opt/ldc/agent/bin/agent schemaAdmin -create true
  4. Restore the Data Catalog Solr collection from the backup.

    Choose one of the following methods to restore the collection. Your choice depends on how many Solr instances were running when the collection was created. Run a command for each replica that needs to be updated.
    • One Solr instance was running when the collection was originally created

      From the machine where the Solr instance is installed, run the following commands (one for each replica) to restore the Solr data and indexes from the backup data. Your values may be different for the Solr port number, the Data Catalog collection name, the backup file name, and the location of the backup. You must enter the commands as single lines:

      $ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
      

      $ curl 'http://solrnode2:8983/solr/wdcollection_shard1_replica2/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
      Note: All replicas on all nodes need to be restored in the EMR environment.
    • Two (or more) Solr instances were running when the collection was originally created

      From the machine where each Solr instance is installed, run the following command to restore the Solr data and indexes from the backup data. Your values may be different for the Solr port number, the Data Catalog collection name, the backup file name, and the location of the backup. You must enter the command as a single line:

      $ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
      

      Repeat this command for each Solr instance. For example, in the second command, replace the Solr port with the port number of the second instance, such as 8984.

      $ curl 'http://solrnode2:8984/solr/wdcollection_shard1_replica2/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
    • Multiple shards

      If you have multiple shards, you must run the following commands for each shard and replica. For example, for a two-shard, two-replica collection, the commands are as follows:

      $ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'  
      $ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica2/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'  
      $ curl 'http://solrnode1:8983/solr/wdcollection_shard2_replica1/replication?command=restore&name=wdcollection_shard2_replica1_backup&location=/tmp/wd-backups'  
      $ curl 'http://solrnode1:8983/solr/wdcollection_shard2_replica2/replication?command=restore&name=wdcollection_shard2_replica1_backup&location=/tmp/wd-backups'
  5. (Optional) Restore the Data Catalog discovery metadata.

    You can verify the location of the discovery metadata in the ldc/conf/configuration.json file in the value of the ldc.metadata.hdfs.large_properties.path property.
    $ sudo -u hdfs hadoop fs -mkdir /user/ldcuser/.ldc_hdfs_metadata  
    $ sudo -u ldcuser hadoop fs -put /PATH/TO/HDFS/BACKUP/LOCATION/backup_discovery_metadata/* /user/ldcuser/.ldc_hdfs_metadata
  6. Restart Solr.
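The per-shard restore calls in step 4 follow a regular pattern, so they can be generated rather than typed individually. A minimal sketch using the sample host, port, collection, and backup location from this article; note that, as in the samples above, each replica restores from its shard's replica1 backup:

```shell
# Sketch: generate the per-shard/per-replica restore calls.
# All values below are the sample values used in this article;
# adjust them to your cluster before running the printed commands.
SHARDS=2; REPLICAS=2
HOST=solrnode1; PORT=8983
COLLECTION=wdcollection
BACKUP_DIR=/tmp/wd-backups
count=0
for s in $(seq 1 "$SHARDS"); do
  for r in $(seq 1 "$REPLICAS"); do
    core="${COLLECTION}_shard${s}_replica${r}"
    # Each replica restores from its shard's replica1 backup, as in the samples.
    backup="${COLLECTION}_shard${s}_replica1_backup"
    echo "curl 'http://${HOST}:${PORT}/solr/${core}/replication?command=restore&name=${backup}&location=${BACKUP_DIR}'"
    count=$((count + 1))
  done
done
```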

Restore the keystore

Use the following steps to restore the Data Catalog keystore:
Note: Keystore restoration only applies to Lumada Data Catalog versions 2019.3 and later.

Procedure

  1. Copy the keystore to the LDC Application Server:

    <App-Server Dir>$ cp <back-up location>/keystore jetty-distribution-9.4.18.v20190429/ldc-base/etc/keystore
  2. Copy the keystore to the LDC Metadata Server:

    <Meta_Server Dir>$ cp <back-up location>/keystore conf/keystore
  3. Copy the keystore to the LDC Agent:

    <Agent Dir>$ cp <back-up location>/keystore conf/keystore
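As a quick sanity check, the three copy targets can be verified in one pass. A minimal sketch, assuming the /opt/ldc install root used elsewhere in this article (the absolute paths are assumptions; adjust them to your layout):

```shell
# Sketch: confirm each restored keystore file is in place before starting
# services. Paths assume the /opt/ldc install root used elsewhere in this
# article; adjust to your layout.
found=0; missing=0
for ks in \
  /opt/ldc/app-server/jetty-distribution-9.4.18.v20190429/ldc-base/etc/keystore \
  /opt/ldc/metadata-server/conf/keystore \
  /opt/ldc/agent/conf/keystore; do
  if [ -f "$ks" ]; then
    echo "found: $ks"    # optionally inspect with: keytool -list -keystore "$ks"
    found=$((found + 1))
  else
    echo "missing: $ks"
    missing=$((missing + 1))
  fi
done
```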

Restore Postgres

Use the following steps to restore Postgres for Data Catalog:

Procedure

  1. Log in to the node where Postgres is installed.

    $ ssh <ssh-user>@<postgres-host>
  2. Navigate to the Postgres backup directory, typically /data/postgres_backups.

  3. Restore using the pg_restore command. Verify that the .bak file is fully decompressed. Example:

    $ pgsql/bin/pg_restore -U ldc -d ldc_db 20190929_postgres.bak
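Before restoring, you can confirm the dump is decompressed and inspect it; pg_restore's -l option prints the archive's table of contents without touching the database. A minimal sketch (the file name is the sample from the step above):

```shell
# Sketch: confirm the dump is decompressed, then suggest an inspection
# command. The file name is the sample used in this article.
BAK=20190929_postgres.bak
# Gzip files start with the magic bytes 1f 8b.
if [ -f "$BAK" ] && head -c 2 "$BAK" | od -An -tx1 | grep -q '1f 8b'; then
  msg="${BAK} is gzip-compressed; run: gunzip ${BAK}"
else
  msg="inspect with: pg_restore -l ${BAK}"
fi
echo "$msg"
```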

Start Data Catalog services

Start the Data Catalog services in the following order:

Procedure

  1. Start the LDC Application Server:

    $ /opt/ldc/app-server/bin/app-server start
  2. Start the LDC Agent:

    $ /opt/ldc/agent/bin/agent start
  3. Start the LDC Metadata Server:

    $ /opt/ldc/metadata-server/bin/metadata-server start
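The three start commands above follow one path pattern, so they can be run as a short loop in the documented order; a minimal sketch (the echo prints each command for review; remove it to execute them):

```shell
# Sketch: the start commands above as a loop, in the documented order.
# The echo prints each command; remove it to actually execute them.
started=""
for svc in app-server agent metadata-server; do
  echo "/opt/ldc/${svc}/bin/${svc} start"
  started="${started} ${svc}"
done
```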