Restoring Data Catalog
This article includes procedures for restoring Lumada Data Catalog’s Solr™ data and indexes, the Postgres database, the keystore, and optionally, discovery metadata and UI logs.
Restore process overview
The process to restore Data Catalog includes several steps. As a best practice, perform these steps in the following order:
Before you begin
Procedure
Make sure there are no Data Catalog jobs running.
As the Data Catalog service user, use the following commands to shut down the Data Catalog web service.
$ sudo su ldcuser
$ /opt/ldc/app-server/bin/app-server stop
$ /opt/ldc/agent/bin/agent stop
$ /opt/ldc/metadata-server/bin/metadata-server stop
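The stop commands return immediately, so it can be useful to confirm that the processes are actually gone before continuing. The following is a minimal sketch, assuming the sample /opt/ldc install paths used above:

```shell
# Check that no Data Catalog service processes remain.
# The /opt/ldc paths are the sample install locations used above.
for svc in app-server agent metadata-server; do
  if pgrep -f "/opt/ldc/$svc" > /dev/null; then
    echo "$svc: still running"
  else
    echo "$svc: stopped"
  fi
done
```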
Restore the Solr collection
These instructions use the Solr APIs to restore the backup. The APIs only work with CDH versions 5.9 and later.
After you verify that no Data Catalog jobs are running and that the Data Catalog web service is shut down, you can restore the Solr collection.
The steps required for restoring backups depend on your Solr version, which typically depends on the Hadoop distribution you are running. Perform the tasks below that are applicable to your environment:
- Restore the Solr collection for CDH and Solr 4.10
- Restore Solr data on local and HDFS storage
- Restore the Solr collection for HDP and EMR
Restore the Solr collection for CDH and Solr 4.10
The steps required for restoring backups depend on your Solr version, which typically depends on the Hadoop distribution you are running. Perform the following steps for CDH and Solr 4.10:
Procedure
Remove the old Data Catalog Solr collection.
Your values may differ for the ZooKeeper ensemble string (including the port number), the Data Catalog collection name, the Data Catalog configuration name, and the location of the Solr collection on HDFS. The following are sample commands:
$ solrctl --zk localhost:2181/solr collection --delete wdcollection
$ sudo -u hdfs hadoop fs -rm -r /solr/wdcollection
Restore the Data Catalog collection configuration in ZooKeeper.
The backup saves the required schema as managed-schema. To restore the configuration, copy the managed-schema file into the schema.xml file. Your values may differ for the ZooKeeper port number, the Data Catalog collection name, the backup file name, and the location of the backup.
$ cd /tmp/wd-backups/backup_zk_config/conf
$ mv schema.xml.bak schema.xml.bak2
$ mv managed-schema schema.xml
$ solrctl --zk localhost:2181/solr instancedir --update wdconfig /tmp/wd-backups/backup_zk_config
Recreate the Data Catalog Solr collection with the same shard count and replication factor as the one that was backed up.
Your values may differ for the ZooKeeper ensemble string (including the port number), the Data Catalog collection name, the Data Catalog configuration name, and the location of the Solr collection on HDFS.
$ solrctl --zk localhost:2181/solr collection --create wdcollection -c wdconfig -s 1 -r 2 -m 2
As the Data Catalog service user, rebuild the schema of the collection.
$ sudo su ldcuser
$ /opt/ldc/bin/ldc schemaAdmin -create true
Restart Solr through Cloudera Manager.
Restore the Solr collection index files.
Next steps
Restore Solr data on local and HDFS storage
Perform the following steps to restore the Solr data:
Procedure
Run the command applicable to your environment, adjusting the following sample commands as needed:
One replica and one shard
$ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
Two replicas and two shards
$ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
$ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica2/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
$ curl 'http://solrnode1:8983/solr/wdcollection_shard2_replica1/replication?command=restore&name=wdcollection_shard2_replica1_backup&location=/tmp/wd-backups'
$ curl 'http://solrnode1:8983/solr/wdcollection_shard2_replica2/replication?command=restore&name=wdcollection_shard2_replica1_backup&location=/tmp/wd-backups'
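The replication restore call returns immediately and runs asynchronously. On Solr versions that support the restore command, the same replication handler exposes a restorestatus action you can poll to confirm completion. The helper below is a sketch that builds the status URL using the sample host, port, and core names from the commands above:

```shell
# Build the restorestatus URL for a given host and core name.
# solrnode1, port 8983, and the core names are the sample values
# used in the restore commands above; adjust for your deployment.
restore_status_url() {
  echo "http://${1}:8983/solr/${2}/replication?command=restorestatus"
}

# Print the status URL for the first replica; fetch it with
# curl -s "$(restore_status_url solrnode1 wdcollection_shard1_replica1)"
restore_status_url solrnode1 wdcollection_shard1_replica1
```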
(Optional) Restore the Data Catalog discovery metadata.
If the required directory does not already exist, you need permission to create it. You can verify the location of the discovery metadata in the ldc/conf/configuration.json file, in the value of the ldc.metadata.hdfs.large_properties.path property.
$ sudo -u hdfs hadoop fs -mkdir /user/ldcuser/.ldc_hdfs_metadata
$ sudo -u hdfs hadoop fs -put /PATH/TO/BACKUP/LOCATION/backup_discovery_metadata/backup_discovery_metadata/* /user/ldcuser/.ldc_hdfs_metadata
Restart Solr through Cloudera Manager.
Restore the Solr collection for HDP and EMR
Perform the following steps to restore the Solr collection for HDP and EMR:
Procedure
Remove the old Solr collection. As a Solr user, delete the collection.
Typically, the Solr location is /opt/lucidworks-hdpsearch/solr. Use the correct Solr port number and Data Catalog collection name.
$ sudo su solr
$ cd /opt/lucidworks-hdpsearch/solr
$ bin/solr delete -c wdcollection -p 8983
$ sudo -u hdfs hadoop fs -rm -r /user/solr/wdcollection
$ exit
Restore the Data Catalog collection configuration in ZooKeeper.
$ /opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost zkhost:zkport -cmd upconfig -confname wdconfig -confdir /tmp/wd-backups/backup_zk_config
Recreate the old Solr collection.
As the Solr user, recreate the collection using the command you used for the original installation.
$ bin/solr create -c wdcollection -p 8983 -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/wdconfig -n wdconfig -s 1 -rf 2
$ exit
As the Data Catalog service user, rebuild the schema of the collection.
$ sudo su ldcuser
$ /opt/ldc/agent/bin/agent schemaAdmin -create true
Restore the Data Catalog Solr collection from the backup.
Choose one of the following methods to restore the collection. Your choice depends on how many Solr instances were running when the collection was created. Run a command for each replica that needs to be updated.
One Solr instance was running when the collection was originally created
From the machine where the Solr instance is installed, run the following commands (one for each replica) to restore the Solr data and indexes from the backup data. Your values may be different for the Solr port number, the Data Catalog collection name, the backup file name, and the location of the backup. You must enter the commands as single lines:
$ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
$ curl 'http://solrnode2:8983/solr/wdcollection_shard1_replica2/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
Note: All replicas in all nodes need to be restored in the EMR environment.
Two (or more) Solr instances were running when the collection was originally created
From the machine where each Solr instance is installed, run the following command to restore the Solr data and indexes from the backup data. Your values may be different for the Solr port number, the Data Catalog collection name, the backup file name, and the location of the backup. You must enter the command as a single line:
$ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
Repeat this command for each Solr instance. For example, in the second command, replace the Solr port with the port number for the second instance, such as 8984.
$ curl 'http://solrnode2:8984/solr/wdcollection_shard1_replica2/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
Multiple shards
If you have multiple shards, you must run the following commands for each shard and replica. For example, for a two-shard, two-replica collection, the commands are as follows:
$ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica1/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
$ curl 'http://solrnode1:8983/solr/wdcollection_shard1_replica2/replication?command=restore&name=wdcollection_shard1_replica1_backup&location=/tmp/wd-backups'
$ curl 'http://solrnode1:8983/solr/wdcollection_shard2_replica1/replication?command=restore&name=wdcollection_shard2_replica1_backup&location=/tmp/wd-backups'
$ curl 'http://solrnode1:8983/solr/wdcollection_shard2_replica2/replication?command=restore&name=wdcollection_shard2_replica1_backup&location=/tmp/wd-backups'
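With many shards and replicas, these URLs are easy to mistype. A small helper can generate one restore command per shard/replica pair. This is a sketch using the sample host, port, collection name, and backup location from the commands above; note that both replicas of a shard restore that shard's replica1 backup, matching the commands shown:

```shell
# Build the restore URL for a given shard/replica pair.
# solrnode1, port 8983, wdcollection, and /tmp/wd-backups are the
# sample values from the commands above; adjust for your deployment.
restore_url() {
  shard=$1
  replica=$2
  echo "http://solrnode1:8983/solr/wdcollection_shard${shard}_replica${replica}/replication?command=restore&name=wdcollection_shard${shard}_replica1_backup&location=/tmp/wd-backups"
}

# Print one curl command per shard/replica for review;
# pipe the output to sh to execute the commands.
for shard in 1 2; do
  for replica in 1 2; do
    echo "curl '$(restore_url "$shard" "$replica")'"
  done
done
```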
(Optional) Restore the Data Catalog discovery metadata.
You can verify the location of the discovery metadata in the ldc/conf/configuration.json file, in the value of the ldc.metadata.hdfs.large_properties.path property.
$ sudo -u hdfs hadoop fs -mkdir /user/ldcuser/.ldc_hdfs_metadata
$ sudo -u ldcuser hadoop fs -put /PATH/TO/HDFS/BACKUP/LOCATION/backup_discovery_metadata/* /user/ldcuser/.ldc_hdfs_metadata
Restart Solr.
Restore the keystore
Procedure
Copy the keystore to the LDC Application Server:
<App-Server Dir>$ cp <back-up location>/keystore jetty-distribution-9.4.18.v20190429/ldc-base/etc/keystore
Copy the keystore to the LDC Metadata Server:
<Meta_Server Dir>$ cp <back-up location>/keystore conf/keystore
Copy the keystore to the LDC Agent:
<Agent Dir>$ cp <back-up location>/keystore conf/keystore
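After copying, you can confirm that each destination received an identical keystore by comparing checksums against the backup copy. The following is a minimal sketch, assuming sha256sum is available; the paths in the usage comment are the sample locations from the steps above:

```shell
# Return success if two files have the same SHA-256 checksum.
same_checksum() {
  [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$(sha256sum "$2" | cut -d' ' -f1)" ]
}

# Example usage (sample paths from the copy steps above):
# same_checksum <back-up location>/keystore conf/keystore && echo "keystore matches"
```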
Restore Postgres
Procedure
Log in to the node where Postgres is installed.
$ ssh <ssh-user>@<postgres-host>
Navigate to the Postgres backup directory, typically /data/postgres_backups.
Restore the database using the pg_restore command, passing the dump file as the final argument. Verify that the BAK file is fully decompressed. Example:
$ pgsql/bin/pg_restore -U ldc -d ldc_db 20190929_postgres.bak
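pg_restore cannot read a compressed archive file directly, so a compressed backup must be decompressed first. The following is a sketch that assumes gzip compression; adjust for whatever compression tool (if any) your backup process uses:

```shell
# Decompress a gzip-compressed backup before restoring.
# gzip is an assumption; your backup may use a different tool or none.
backup=20190929_postgres.bak
if [ -f "$backup.gz" ]; then
  gunzip "$backup.gz"   # leaves the decompressed file as $backup
fi
```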
Start Data Catalog services
Procedure
Start the LDC Application Server:
$ /opt/ldc/app-server/bin/app-server start
Start the LDC Agent:
$ /opt/ldc/agent/bin/agent start
Start the LDC Metadata Server:
$ /opt/ldc/metadata-server/bin/metadata-server start