Hitachi Vantara Lumada and Pentaho Documentation

Upgrading to Lumada Data Catalog 6.x

Lumada Data Catalog 6.x contains a change in repository storage from previous Data Catalog versions. In 5.x versions, most of the metadata was stored in Solr. In 6.x, all the metadata is stored in Postgres and only the search index is stored in Solr. In version 6.0.x, the Postgres database is the primary storage for the Data Catalog repository, including audit logs and transactional data.

Follow the sequence below to upgrade to Lumada Data Catalog 6.x:

  1. Back up existing data and configurations from Solr and Data Catalog
  2. Perform pre-upgrade checks
  3. Upgrade the LDC Application Server
  4. Upgrade the LDC Metadata Server
  5. Upgrade the LDC Agent
  6. Perform post-upgrade steps
  7. Download and install Postgres if needed
  8. Download and install Solr if needed
  9. Verify the Data Catalog upgrade

Back up Data Catalog

As a best practice, you should back up your current Lumada Data Catalog instance and related databases.

Perform the following steps to back up your instance of Data Catalog before upgrading:

Procedure

  1. Follow the procedure documented in Backing up Data Catalog.

  2. If upgrading from previous Data Catalog (Waterline Data) versions to the current Lumada Data Catalog release, back up your Solr collections.

  3. If upgrading from Data Catalog 2019.1, back up the Postgres database.

    In addition to backing up the Solr collection, backing up the Postgres database is required.
  4. Back up the <LDC-HOME> directory (for example, /opt/ldc), which contains the configuration.json, install.properties, and other group mapping configurations for restoring after upgrade.

  5. Back up the discovery cache, which consists of all the files in the directory set in the ldc.metadata.hdfs.large_properties.path property. The default is /user/ldcsvc/.ldc_hdfs_metadata.
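The directory backup in step 4 could be scripted along the following lines. This is a minimal sketch, not part of the Data Catalog tooling: the function name backup_ldc_home and the /var/backups/ldc destination are hypothetical, and the commented hdfs command assumes the discovery cache lives on HDFS at the default path.

```shell
#!/usr/bin/env bash
# Sketch only: archive an LDC home directory before upgrading.
# backup_ldc_home and its arguments are hypothetical names, not Data Catalog commands.
backup_ldc_home() {
  local ldc_home="$1" dest_dir="$2"
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${dest_dir}/ldc-home-${stamp}.tar.gz"
  [ -d "$ldc_home" ] || { echo "not a directory: $ldc_home" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  # -C keeps archive paths relative to the parent of the LDC home directory.
  tar -czf "$archive" -C "$(dirname "$ldc_home")" "$(basename "$ldc_home")" || return 1
  echo "$archive"
}

# Example usage (assumed paths):
#   backup_ldc_home /opt/ldc /var/backups/ldc
#   hdfs dfs -get /user/ldcsvc/.ldc_hdfs_metadata /var/backups/ldc/discovery-cache
```

The function prints the archive path on success, so it can be recorded alongside the Solr and Postgres backups.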

Perform pre-upgrade checks

Perform the following steps to set up your environment before you begin the upgrade process:

Procedure

  1. Verify that JAVA_HOME and PATH point to your Java installation, for example:

    • $ export JAVA_HOME=/usr/java/jdk1.8.0_144
    • $ export PATH=${JAVA_HOME}/bin:${PATH}
  2. Verify that psql is on your PATH, for example:

    $ export PATH=/usr/local/pgsql/bin:${PATH}
  3. Run the Validator utility on the following tags and maps:

    • checkfor-invalid-tag-associations
    • checkfor-orphan-tag-associations
    • checkfor-invalid-resource-folder-maps
    • checkfor-orphan-resource-folder-maps
    • checkfor-duplicate-resource-folder-maps
    • checkfor-duplicate-case-sensitive-tags
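The Java and psql checks in steps 1 and 2 can be confirmed from the shell before you start. The following is a small sketch; require_tool is a hypothetical helper name, and the Validator utility is not covered because its invocation is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: confirm that a tool resolves on the current PATH.
# require_tool is a hypothetical helper, not a Data Catalog command.
require_tool() {
  local tool="$1" path
  if path="$(command -v "$tool")"; then
    echo "$tool found at $path"
  else
    echo "$tool not found on PATH" >&2
    return 1
  fi
}

# Example usage:
#   require_tool java && java -version
#   require_tool psql && psql --version
```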

Next steps

The upgrade script does not support the zsh shell. If zsh is your default shell, use the chsh command to change the default shell to bash.
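One way to check the default shell before running the installer is sketched below. The helper names are hypothetical, getent is assumed to be available (as on CentOS), and /bin/bash is an assumed location for bash.

```shell
#!/usr/bin/env bash
# Sketch only: detect a zsh login shell before running the upgrade script.
login_shell_of() {
  # Field 7 of the passwd entry is the user's login shell.
  getent passwd "$1" | cut -d: -f7
}

is_zsh() {
  case "$1" in
    */zsh|zsh) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   if is_zsh "$(login_shell_of "$USER")"; then chsh -s /bin/bash; fi
```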

Upgrading the Lumada Data Catalog Package

If needed, you can restart the upgrade process. If you encounter an issue when upgrading from 2019.x, you can resolve the issue and start the process again. The upgrade process automatically checks for completed steps and resumes from the point of last failure, executing only the unfinished steps.

Note: Always run the installer files as a sudo user.

Caution: You must upgrade the Data Catalog components in the following order. Otherwise, the repository may become corrupted.
  1. LDC Application Server
  2. LDC Metadata Server
  3. LDC Agent

Upgrade the LDC Application Server

Before you begin

You must upgrade the Data Catalog components in a specified order to avoid errors.
Perform the following steps to upgrade the LDC Application Server:

Procedure

  1. Use the following command to stop the LDC Application Server (2019.x):

    $ <APP-SERVER-HOME>/bin/app-server stop
  2. Run the LDC Application Server installer by entering the following command:

    $ sudo ./ldc-app-server-6.0.1.run
  3. Enter 2 to select Upgrade and follow the on-screen instructions. Enter y at the Upgrade data in repository now? prompt.

    The following verification screen appears:
    $ sudo ./ldc-app-server-6.0.1.run
    
        Verifying archive integrity...  100%   All good.
        Uncompressing Lumada Data Catalog App Server Installer  100%
    
    
        This program installs Lumada Data Catalog Application Server.
    
        Press ^C at any time to quit.
    
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                       LUMADA DATA CATALOG APPLICATION SERVER INSTALLER
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            1. Express Install          (Requires superuser access)
            2. Upgrade
            3. Exit
    
        Enter your choice [1-3]: 2
        Enter the location of existing Lumada Data Catalog Application Server installation [/opt/ldc]:
        Found version 6.0.0 in /opt/ldc/app-server/conf/install.properties.
        Enter the Solr server version [8.4.1]: 7.4
        The data in the metadata repository needs to be upgraded to the latest version.
        This can be done as part of the current software upgrade process, however if you have a lot of data,
        it is recommended to run it later separately after the software upgrade as it may take a long time.
        Upgrade data in repository now? [y/N]: y
        Proceed? [Y/n]:
        Stopping services ... done.
        Backing up existing data ... done.
                Existing installation backed up to /tmp/ldc-app-server-backup-tmp-ZKnz
        Copying files ... done.
        Restored License.
        Updating configuration LP and SOLR properties... done.
        Updating configuration ... done.
        Creating secret key ... done.
        ReEncrypting passwords ... done.
        Upgrading postgres repository schema and data (this may take a long time) ... done.
        Upgrading solr repository schema and data (this may take a long time) ... done.
        Starting services ........................ done.
    
        Lumada Data Catalog APPLICATION SERVER upgraded successfully!
  4. Upgrade the Postgres and Solr repositories.

    If you did not upgrade the repository as part of the upgrade process above (for example, if you entered n at the Upgrade data in repository now? prompt), then run the following scripts to upgrade the Postgres and Solr repositories manually.
    • Upgrade Postgres repository

      <APP-SERVER-HOME> $ bin/repo_upgrade.sh postgres true >& /var/log/ldc/pgupgrade.log

    • Upgrade Solr Repository

      <APP-SERVER-HOME> $ bin/repo_upgrade.sh solr true >& /var/log/ldc/solrupgrade.log

  5. Start the LDC Application Server by entering the following command:

    <APP-SERVER-HOME> $ bin/app-server start
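In the manual repository upgrade commands in step 4, >& sends both standard output and standard error to the log file. A portable way to do the same and also surface failures is sketched below. The helper run_logged is hypothetical, and the sketch assumes that repo_upgrade.sh returns a nonzero exit status when it fails, which is not confirmed here.

```shell
#!/usr/bin/env bash
# Sketch only: run a command with stdout and stderr captured in one log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
  local log="$1"; shift
  "$@" > "$log" 2>&1
  local status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status; see $log" >&2
  fi
  return "$status"
}

# Example usage, run from <APP-SERVER-HOME>:
#   run_logged /var/log/ldc/pgupgrade.log bin/repo_upgrade.sh postgres true &&
#   run_logged /var/log/ldc/solrupgrade.log bin/repo_upgrade.sh solr true
```

Chaining the two calls with && stops the Solr step if the Postgres step reports a failure.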

Results

The LDC Application Server is upgraded to version 6.0.1.

Next steps

After upgrading the LDC Application Server, proceed to Upgrade the LDC Metadata Server and then Upgrade the LDC Agent.

Upgrade the LDC Metadata Server

Before you begin

You must upgrade the Data Catalog components in a specified order to avoid errors. You should have already upgraded the LDC Application Server. Start the LDC Application Server before you upgrade the LDC Metadata Server.

Perform the following steps to upgrade the LDC Metadata Server.

Procedure

  1. Navigate to Manage, then click Tokens.

  2. Copy the token for the LDC Metadata Server to communicate with the LDC Application Server, then use CLI to run the command to register the LDC Metadata Server. For example:

    $ sudo ./ldc-metadata-server-6.0.1.run -- --init --endpoint http://hdp.ldc.com:8082 \
                                                     --client-id metadata-rest-server \
                                                     --token d4786f62-8785-48d4-9759-c69cf191ca9c \
                                                     --public-host hdp.ldc.com \
                                                     --port 4242 \
                                                     --public-port 4242
  3. Make sure that the endpoint specified in the LDC Metadata Server installation command can connect to the LDC Application Server.

  4. Follow the prompts, setting the applicable parameters for your environment. Enter 2 for Upgrade.

    A sample LDC Metadata Server upgrade should resemble the following screen:
    $ sudo ./ldc-metadata-server-6.0.1.run -- --init --endpoint http://hdp.ldc.com:8082 \
                                                     --client-id metadata-rest-server \
                                                     --token d4786f62-8785-48d4-9759-c69cf191ca9c \
                                                     --public-host hdp.ldc.com \
                                                     --port 4242 \
                                                     --public-port 4242
    Verifying archive integrity...  100%   All good.
    Uncompressing Lumada Data Catalog Metadata Server Installer  100%
    
    
    This program installs Lumada Data Catalog Metadata Server.
    
    Press ^C at any time to quit.
    
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           LUMADA DATA CATALOG METADATA SERVER INSTALLER
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        1. Express Install          (Requires superuser access)
        2. Upgrade
        3. Exit
    
    Enter your choice [1-3]: 2
    Enter the location of existing Lumada Data Catalog Metadata Server installation [/opt/ldc]:
    Found version 6.0.0 in /opt/ldc/metadata-server/conf/install.properties.
    Enter the Solr server version [8.4.1]: 7.5
    Proceed? [Y/n]:
    Stopping services ... done.
    Backing up existing data ... done.
            Existing installation backed up to /tmp/ldc-metadata-server-backup-tmp-Ky1Y
    Copying files ... done.
    Updating configuration ... done.
    Replacing keystore file ...
    Updating SOLR version in configuration... done.
    Executing command: "/opt/ldc/metadata-server/bin/metadata-server" upgrade --old-yaml /tmp/ldc-metadata-server-backup-tmp-Ky1Y/conf/application.yml --no-exec false
    Upgrading application
    done.
    
    Lumada Data Catalog METADATA SERVER upgraded successfully!
  5. Verify the SSL setting. If the SSL property in the <LDC Meta-Server>/conf/application.yaml file was set to true in the previous version, set it to true in the upgraded <LDC Meta-Server>/conf/application.yaml file as well.

  6. Restart the LDC Metadata Server.
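The connectivity requirement in step 3 can be checked from the LDC Metadata Server host before you run the installer. The following sketch uses hypothetical helper names and relies on bash's /dev/tcp pseudo-device and the coreutils timeout command; it only tests that a TCP connection opens and does not validate the token or the service behind the port.

```shell
#!/usr/bin/env bash
# Sketch only: check that an endpoint's host and port accept TCP connections.
endpoint_hostport() {
  # Strips the scheme and any path: http://host:8082/x -> host:8082
  local rest="${1#*://}"
  echo "${rest%%/*}"
}

can_connect() {
  # Requires bash (for /dev/tcp) and coreutils timeout.
  local host="${1%%:*}" port="${1##*:}"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example usage:
#   can_connect "$(endpoint_hostport http://hdp.ldc.com:8082)" && echo reachable
```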

Next steps

After upgrading the LDC Application Server and the LDC Metadata Server, proceed to Upgrade the LDC Agent.

Upgrade the LDC Agent

Before you begin

You must upgrade the Data Catalog components in a specified order to avoid errors. You should have already upgraded the LDC Application Server and the LDC Metadata Server.
Perform the following steps to upgrade the LDC Agent:

Procedure

  1. Make sure that the endpoint you specify in the LDC Agent registration command can connect to the LDC Application Server.

    You can run commands such as ping hdp.ldc.com to verify that you can communicate with the LDC Application Server from the LDC Agent.
  2. Use the following command to register the LDC Agent:

    $ sudo ./ldc-agent-6.0.1.run -- --register --endpoint http://hdp.ldc.com:8082 \
                                               --agent-id ra61abef1a01574d6f \
                                               --agent-token b9a8b8c7-e51f-43b4-9de9-0d2bffa45259
  3. Follow the prompts, setting the applicable parameters for your environment. Enter 2 for Upgrade.

    $ sudo ./ldc-agent-6.0.1.run -- --register --endpoint http://hdp.ldc.com:8082 \
                                               --agent-id ra61abef1a01574d6f \
                                               --agent-token b9a8b8c7-e51f-43b4-9de9-0d2bffa45259
    Verifying archive integrity...  100%   All good.
    Uncompressing Lumada Data Catalog Agent Installer  100%
    
    
    This program installs Lumada Data Catalog Agent.
    
    Press ^C at any time to quit.
    
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           LUMADA DATA CATALOG AGENT INSTALLER
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        1. Express Install          (Requires superuser access)
        2. Upgrade
        3. Exit
    
    Enter your choice [1-3]: 2
    Enter the location of existing Lumada Data Catalog Agent installation [/opt/ldc]:
    Found version 6.0.0 in /opt/ldc/agent/conf/install.properties.
    Enter HIVE version [3.1.2]:
    Proceed? [Y/n]:
    Stopping services ... done.
    Backing up existing data ... done.
            Existing installation backed up to /tmp/ldc-agent-backup-tmp-yfp1
    Copying files ... done.
    Updating HIVE version in configuration... done.
    Updating configuration ... done.
    Updating Kerberos configuration ...done.
    Replacing keystore file ...
    Restored License.
    Executing command: "/opt/ldc/agent/bin/agent" upgrade --old-yaml /tmp/ldc-agent-backup-tmp-yfp1/conf/application.yml --no-exec false
    Upgrading agent
    done.
    removed ‘/tmp/tmp.nZ2e3c5gHW’
    
    Lumada Data Catalog AGENT upgraded successfully!
  4. Verify that the LDC Agent is connected and registered.

    When the Agent is connected and registered, the status icons are green.

Results

The LDC Agent is upgraded successfully.

Next steps

If you did not select to upgrade the repositories when upgrading the LDC Application Server, then run the following scripts to upgrade the Postgres and Solr repositories manually.
  • Upgrade Postgres repository

    <APP-SERVER-HOME> $ bin/repo_upgrade.sh postgres true >& /var/log/ldc/pgupgrade.log

  • Upgrade Solr Repository

    <APP-SERVER-HOME> $ bin/repo_upgrade.sh solr true >& /var/log/ldc/solrupgrade.log

For more information about how to upgrade the repositories when upgrading the LDC Application Server, see Upgrade the LDC Application Server.

Perform post-upgrade steps

Perform the following steps after you upgrade the LDC Application Server, the LDC Metadata Server, and the LDC Agent:

Procedure

  1. Copy all the JDBC drivers to the ext/ directory of the LDC Agent and the LDC Application Server.

  2. Copy the hive-serde-1.0.1.jar and ldc-hive-formats-6.0.1.jar JAR files to a location on HDFS or S3.

    These JAR files are packaged under the ext/ folder of the LDC Agent and the LDC Application Server.
  3. Update the ldc.profile.customSerde.url property value in the LDC Agent's conf/configuration.json file with the HDFS/S3 location in the previous step:

    "hdfs://namenode:8020/user/ldcuser/ldcserde"

  4. (Optional) If your previous installation had job-locking enabled, make sure you reset the following configuration properties in the <Agent Dir>/conf/configuration.json file:

    • ldc.job.locking.enabled

      Set the value parameter to true.

    • ldc.job.locking.zookeeper.znode

      Set the value parameter to the zknode. For example, ip-172-31-29-206.9983.

    • ldc.metadata.solr.usezk

      Set the value parameter to true.

  5. (Optional) If using OIDC for Data Catalog login authorization, you need to manually update the properties file under app-server/conf/oidc-client-web.properties to point to the correct URI.

    For example: rp.secret_properties_uri = file:///opt/ldc/app-server/conf/secret-oidc-client.properties.
    Note: You should update the URI path to include the LDC Application Server path.
  6. (Optional) If connecting to S3 data sources, make sure you have copied additional JAR files as specified in Final EMR setup.

  7. (Optional) If upgrading on the Azure platform, make sure you have copied the relevant JAR files as specified in Installing Lumada Data Catalog on Azure HDInsight.

  8. Restart the LDC Application Server and the LDC Agent.

    Data Catalog automatically assigns the local agent that was created as part of the LDC Agent installation step to data sources.
  9. Verify LDC Agent assignment and connection success.

    1. Go to Manage then click Data Source.

    2. Click Test Connection for each data source name.

  10. See Managing roles to view RBAC changes and recreate roles in version 6.0.x to best fit your deployment requirements.

    Data Catalog version 6.0.1 has significant updates to the RBAC privilege model from previous Lumada Data Catalog and Waterline Data Catalog versions. The upgrade process creates fixed mappings for the existing roles in the previous versions.
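For step 2 of this procedure, the JAR copy to HDFS might look like the following sketch. The helper name is hypothetical, the ext/ path is an assumed install location, and the target reuses the example location from step 3. The function prints the hdfs commands instead of running them so that you can review them first.

```shell
#!/usr/bin/env bash
# Sketch only: print the hdfs commands that copy the SerDe JARs to a target location.
# serde_upload_commands is a hypothetical helper, not a Data Catalog command.
serde_upload_commands() {
  local ext_dir="$1" target="$2" jar
  echo "hdfs dfs -mkdir -p ${target}"
  for jar in hive-serde-1.0.1.jar ldc-hive-formats-6.0.1.jar; do
    # -f overwrites a JAR left over from an earlier release.
    echo "hdfs dfs -put -f ${ext_dir}/${jar} ${target}/"
  done
}

# Example usage (assumed paths); pipe to sh only after reviewing the output:
#   serde_upload_commands /opt/ldc/agent/ext hdfs://namenode:8020/user/ldcuser/ldcserde
```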

Download and install Postgres

Lumada Data Catalog 6.0.x uses Postgres database version 11.9 as its primary repository on the CentOS platform.

If your existing Postgres version is earlier than 11.9, perform the following steps to upgrade to Postgres version 11.9:

Procedure

  1. Back up your existing ldc_db (or waterlinedb) Postgres database.

  2. Download and install PostgreSQL.

  3. Verify the database is installed under the /usr/local/pgsql/ path and is only accessible by a Data Catalog service user.

  4. Use the following command to verify that you can connect to Postgres:

    $ psql -U ldc
  5. Drop the old database and create a new database. Example commands are:

    • $ psql -U ldc -c "DROP DATABASE ldc_db"
    • $ psql -U ldc -c "CREATE DATABASE ldc_db OWNER ldcuser"
  6. Restore the Postgres data you backed up earlier and verify that the data restoration is intact.

  7. Update the following Postgres details in the LDC Application Server and LDC Metadata Server configuration.json files found in <APP-SERVER-HOME>/conf/ and <METADATA-SERVER-HOME>/conf/ paths respectively.

    • ldc.metadata.postgres.url
    • ldc.metadata.db.postgres.username
    • ldc.metadata.db.postgres.password
  8. Use the following command to restart the LDC Application Server:

    <APP-SERVER-HOME>$ bin/app-server restart
  9. Use the following command to restart the LDC Metadata Server:

    <METADATA-SERVER-HOME>$ bin/metadata-server restart
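To decide whether this procedure applies, you can compare the running Postgres version with 11.9. The sketch below is illustrative only: version_lt is a hypothetical helper that relies on GNU sort -V, and the commented pg_dump and pg_restore lines show one common way to carry out the backup and restore steps, assuming the ldc user and ldc_db database from the examples above.

```shell
#!/usr/bin/env bash
# Sketch only: compare dotted version strings using GNU sort -V.
version_lt() {
  # True when $1 sorts strictly before $2 in version order.
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example usage (assumed user and database names):
#   current="$(psql -U ldc -Atc 'SHOW server_version')"
#   version_lt "$current" 11.9 && echo "Postgres upgrade needed"
#   pg_dump -U ldc -Fc -f ldc_db.dump ldc_db      # before dropping the database
#   pg_restore -U ldc -d ldc_db ldc_db.dump       # after CREATE DATABASE
```

Some distributions append packaging details to server_version, so the value may need trimming before comparison. The same helper can compare a Solr version with 8.4.1.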

Download and install Solr

Lumada Data Catalog versions 6.x.x use Solr version 8.4.1 on the CentOS platform.
  • If your existing Solr version is 8.4.1, go to Verify the Data Catalog upgrade.
  • If your existing Solr version is earlier than 8.4.1, perform the steps in this task.

Perform the following steps to upgrade to Solr version 8.4.1:

Procedure

  1. Create a new Solr collection (for example, ldccollection_new) using an existing or new Apache ZooKeeper configuration (for example, ldcconfig_new).

  2. Update the following properties in the configuration.json files located in <APP-SERVER-HOME>/conf/ and <METADATA-SERVER-HOME>/conf/.

    • ldc.metadata.solrserver.url

      Update only if migrating to the new Solr server.

    • ldc.metadata.solr.server.version

      Update only if migrating to the new Solr server.

    • ldc.metadata.solr.usezk

    • ldc.metadata.solr.zk.connection.string

    • ldc.metadata.solr.authType

    • ldc.metadata.solr.collection

      Update the value to the new Solr collection (for example, ldccollection_new) as created in the previous step.

  3. Create a schema on the new Solr collection, as shown in the following example command:

    <APP-SERVER-HOME> $ bin/repo_upgrade.sh solr false /var/log/ldc
  4. Restart the LDC Application Server, as shown in the following example command:

    <APP-SERVER-HOME>$ bin/app-server restart
  5. Restart the LDC Metadata Server, as shown in the following example command:

    <METADATA-SERVER-HOME>$ bin/metadata-server restart
  6. Run the Sync job using app-util to populate the Solr collection from the Postgres repository, as shown in the following example command:

    <APP-SERVER-HOME>$ bin/app-util syncIndex -syncMode full

Verify the Data Catalog upgrade

After you have upgraded Postgres and Solr to the recommended versions for Data Catalog, you need to verify that the upgrade was successful.

Perform the following steps to verify that your upgrade to Lumada Data Catalog 6.x was successful:

Procedure

  1. Check the counts of entities before and after upgrade and make sure that the counts are reasonable.

    For example, you may have more audit events after an upgrade.
  2. Log in as the service user and make sure all data sources have an agent attached.

  3. Run the Test Connection for all data sources to ensure successful connections.

  4. Verify the following:

    • You can browse in all the virtual folders.
    • All tag domains and tags are accessible.
    • You can perform searches as the service user and other users.
  5. Restart the LDC Application Server.

  6. Run simple format or schema jobs to ensure that all connectivity issues are resolved.

  7. Navigate to Dashboard Overview to check the Data Ops dashboard.

    Note: Data Catalog upgrades any resource lineages as part of the upgrade process. However, Data Catalog also supports multi-hop lineage and field lineage. As a best practice, run a discovery job for these lineages after the upgrade is complete.
  8. Rerun the lineage discovery job to discover any multi-hop and field lineages.

Troubleshooting upgrade issues

Follow the suggestions in these topics to help resolve common issues with upgrading Data Catalog.

Hive schema job has a Kerberos error

After you upgrade Data Catalog, the Hive schema job may fail with a Kerberos error. This error occurs when the Hive connection URL still uses the pre-upgrade format. To resolve this issue, update the Hive connection URL by removing the kerberosAuthType property, as shown in the following before and after examples:

  • Pre-upgrade Hive connection URL:

    jdbc:hive2://hdp.ldc.com:10000/default;principal=hive/hdp.ldc.com@hitachivantara.com;auth=kerberos;kerberosAuthType=fromSubject

  • Post-upgrade Hive connection URL:

    jdbc:hive2://hdp.ldc.com:10000/default;principal=hive/hdp.ldc.com@hitachivantara.com;auth=kerberos
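If you have several Hive connection URLs to update, the edit can be expressed as a single sed substitution. This is a sketch with a hypothetical function name; it only rewrites the string and does not change any stored data source definition.

```shell
#!/usr/bin/env bash
# Sketch only: drop the kerberosAuthType property from a Hive JDBC URL.
strip_kerberos_auth_type() {
  # Removes the first ";kerberosAuthType=<value>" segment, wherever it appears in the URL.
  printf '%s\n' "$1" | sed 's/;kerberosAuthType=[^;]*//'
}

# Example usage:
#   strip_kerberos_auth_type 'jdbc:hive2://hdp.ldc.com:10000/default;principal=hive/hdp.ldc.com@hitachivantara.com;auth=kerberos;kerberosAuthType=fromSubject'
```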

Agent fails to connect to the LDC Metadata Server

When the LDC Agent fails to connect to the LDC Metadata Server, the agent.log file contains the following exception error:

10 Apr 2020 20:46:47.890 [scheduling-1] ERROR com.hitachivantara.remoteagent.AgentRegistrationService - Error connecting to metadataserver
org.springframework.web.client.ResourceAccessException: I/O error on GET request for "https://hdp265.ldc.com:4242/api/v3/test-connection": Unrecognized SSL message, plaintext connection?; nested exception is javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:743) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]
        at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:669) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]
        at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:578) ~[spring-web-5.1.4.RELEASE.jar:5.1.4.RELEASE]

The applicable parameters may not be set to the proper values. To resolve this issue, ensure the following configurations are set correctly:

  • LDC Application Server

    In the configuration.json file, set ldc.metadata.server.isSecure to true.

  • LDC Metadata Server

    In the <Metadata-server Dir>/conf/application.yaml file, set ssl enabled to true.

  • LDC Agent

    In the <Agent Dir>/conf/meta-client-configuration.json file, make sure the "serverBaseUrl" value is a secure HTTP URL in https:// format.

Restart the services in the following order:

<LDC-HOME> $ app-server/bin/app-server restart
<LDC-HOME> $ metadata-server/bin/metadata-server restart
<LDC-HOME> $ agent/bin/agent restart
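Before restarting, you can confirm the LDC Agent setting with a quick text check. The following sketch uses a hypothetical helper name and assumes that the "serverBaseUrl" key and its value sit on the same line of meta-client-configuration.json; it is a grep match, not a JSON parser.

```shell
#!/usr/bin/env bash
# Sketch only: succeed when the file has a "serverBaseUrl" value that starts with https://
uses_https_base_url() {
  grep -Eq '"serverBaseUrl"[[:space:]]*:[[:space:]]*"https://' "$1"
}

# Example usage (assumed install path):
#   uses_https_base_url /opt/ldc/agent/conf/meta-client-configuration.json ||
#     echo "serverBaseUrl is not an https:// URL" >&2
```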