
Installing Lumada Data Catalog on MapR


The following steps help you install and configure Lumada Data Catalog in a MapR environment.

Requirements

Data Catalog supports the following configuration on MapR:

  • MapR version 6.1 secured environment (running MapR FS and Spark 2.3.1; Hive 2.3.1)
  • Recommended Solr 7.5.0 running in SolrCloud mode with a single shard (local storage)
  • For memory requirements, see Minimum node requirements.

Sizing estimates

If you plan to install Data Catalog and Solr on the same node, the node should have at least 64 GB RAM. Alternatively, configure Data Catalog and Solr on separate nodes in the same cluster.

Preparation

Before installing Data Catalog, make sure that you have read and followed the Pre-installation Validations for your environment, specifically:

Configure the authentication method for MapR

Use the following steps to configure the authentication method for MapR.

Procedure

  1. Create a ticket for the Lumada Data Catalog service user with service impersonation properties.

    This ticket should have a long validity period. The following command creates a ticket file named secure_ldcuser_ticket that is valid for 365 days:
    WLD Service User Home$ maprlogin  generateticket  -type servicewithimpersonation \
                                                        -user ldcuser \
                                                        -out secure_ldcuser_ticket \
                                                        -duration 365:0:0
  2. Verify that the MapR credentials of user ldcuser for cluster maprdemo.waterlinedata.com were written to secure_ldcuser_ticket:

    WLD Service User Home$ ls secure_ldcuser_ticket
  3. Configure the MAPR_TICKETFILE_LOCATION variable.

    This environment variable tells MapR clients where to find the ticket file. (A verification sketch follows this procedure.)
  4. Set the variable in ~/.bash_profile as follows:

    LDC Service User Home$ vi ~/.bash_profile
    
    # .bash_profile
    
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
            . ~/.bashrc
    fi
    
    # User specific environment and startup programs
    
    PATH=$PATH:$HOME/.local/bin:$HOME/bin
    export MAPR_TICKETFILE_LOCATION=/home/ldcuser/secure_ldcuser_ticket
    
    export PATH
  5. Check whether beeline is on the PATH.

    1. Run the beeline command at the prompt as follows:

      WLD Service User Home$ beeline
      beeline: command not found

      If you get a command not found error, perform step 5b. Otherwise, proceed to step 6.

    2. As the root user, create a symbolic link (soft link) to beeline so that Data Catalog can run on MapR.

      This is one way to add beeline to the PATH.
      WLD Service User Home$ ln -s /opt/mapr/hive/hive-2.3/bin/beeline /usr/bin/beeline
  6. Make sure the HIVE_HOME and SPARK_HOME path variables are defined in ~/.bash_profile as follows:

    WLD Service User Home$ vi ~/.bash_profile
    
    # .bash_profile
    
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
            . ~/.bashrc
    fi
    
    # User specific environment and startup programs
    
    export HIVE_HOME=/opt/mapr/hive/<hive-version>
    export SPARK_HOME=/opt/mapr/spark/<spark-version>
    
    PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin
    export PATH
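
As a quick sanity check, you can confirm that the ticket, the environment variable, and the client tools are all in place. This is a minimal sketch that assumes the file names and paths used in the steps above:

    WLD Service User Home$ maprlogin print -ticketfile ~/secure_ldcuser_ticket   # shows the ticket owner and expiry
    WLD Service User Home$ echo $MAPR_TICKETFILE_LOCATION                        # should print /home/ldcuser/secure_ldcuser_ticket
    WLD Service User Home$ which beeline spark-submit                            # both should resolve through PATH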

Download the Data Catalog packages

Download the Data Catalog distribution from the location provided to you. If your organization has subscribed to support, you can find the location through the Hitachi Vantara Lumada and Pentaho Support Portal.

You should obtain access to three installers:

  • wld-app-server-X.run
  • wld-metadata-server-X.run
  • wld-agent-X.run

Where X is the specific version you want to install.
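
The .run files are self-extracting installers. Before running them, make sure they are executable; a standard preparation step, shown here with the same placeholder X for the version:

  $ chmod +x wld-app-server-X.run wld-metadata-server-X.run wld-agent-X.run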

Installation sequence

The installation of components must follow this sequence:

  1. Application server
  2. Metadata server
  3. Agents

Installing the Lumada Data Catalog packages

  1. Before installing Data Catalog, make sure that you have configured the service user by following the steps in Configure the Data Catalog service user under Pre-installation Validation.
  2. Also make sure that you have configured the authentication method for MapR as described in Configure the authentication method for MapR.

Install the application server

Follow the steps below to install the application server.

Procedure

  1. Run the wld-app-server installer with superuser (root) access, passing the --no-exec argument:

    $ ./wld-app-server-2019.3.run -- --no-exec
    
    Verifying archive integrity...  100%   MD5 checksums are OK. All good.
    Uncompressing Lumada Data Catalog App Server Installer  100%
    
    
    This program installs Data Catalog Application Server.
    
    Press ^C at any time to quit.
    
    --------------------------------------------------------------------
                   LUMADA DATA CATALOG APPLICATION SERVER INSTALLER
    --------------------------------------------------------------------
        1. Express Install          (Requires superuser access)
        2. Upgrade
        3. Exit
    
    Enter your choice [1-3]: 1
    Enter the name of the Lumada Data service user [ldcuser]:
    Enter install location [/opt/waterlinedata] :
    Enter log location [/var/log/waterlinedata] :
    Enter the Solr server version [7.5.0]:
    Is Kerberos enabled? [y/N]:
    Do you want to link hdfs-site.xml, hive-site.xml, core-site.xml to Data Catalog installation? [y/N] : N
    --------------------------------------------------------------------
       SELECTION SUMMARY
    --------------------------------------------------------------------
         Data Catalog service user : ldcuser
                    Install location : /opt/waterlinedata
                        Log location : /var/log/waterlinedata
                    Kerberos enabled : false
                 Solr server version : 7.5.0
                      Link site xmls : false
    Proceed? [Y/n]:
    [sudo] password for wlddev:
    Removed existing directory /opt/waterlinedata/app-server
    Directory /opt/waterlinedata exists.
    Created directory /opt/waterlinedata/app-server
    Directory /var/log/waterlinedata exists.
    Copying files ... done.
    
    Installed app-server to /opt/waterlinedata/app-server
    Generating certificate ...
            Certificate fingerprint (SHA-256): 7fbcda6dceaec042cd1f7681a17ac6a99cc70d0cd5e363eb056075c628852893
    Starting services ................ done.
  2. Select the Express Install option and accept all the default paths.

    1. When prompted for Kerberos, select N.

    2. When prompted for linking hdfs-site.xml, hive-site.xml, core-site.xml to Lumada Data Catalog, select N.

    Allow the installation to complete.
  3. Edit WEBAPP_OPTS in /opt/waterlinedata/app-server/bin/app-server to include the MapR libraries.

    $ vi /opt/waterlinedata/app-server/bin/app-server
    1. Search for WEBAPP_OPTS classpath.

    2. Add -Dmapr.library.flatclass=/opt/mapr/lib as follows:

      Update WEBAPP_OPTS
  4. Modify the script to disable loading of the Jackson JARs by adding |jackson.* in the setupClasspathAndOptions function definition, as indicated below.

    Search for the term function setupClasspathAndOptions. In this function definition, add |jackson.* to the value of DEPENDENCIES_CP_SANITIZED as shown (an illustrative sketch follows the screenshot):

    Disable loading of jackson JARs
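
    The exact script contents vary by release, so the following is only an illustrative sketch. The input variable and the pipeline here are hypothetical; only the DEPENDENCIES_CP_SANITIZED name and the |jackson.* addition come from this guide:

    # Illustrative only -- the real expression in setupClasspathAndOptions differs by release:
    DEPENDENCIES_CP_SANITIZED=$(echo "$DEPENDENCIES_CP" | egrep -v '<existing pattern>')
    # becomes, with |jackson.* appended to the exclusion pattern:
    DEPENDENCIES_CP_SANITIZED=$(echo "$DEPENDENCIES_CP" | egrep -v '<existing pattern>|jackson.*')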

  5. Place a symlink to /opt/mapr/lib/log4j-1.2.17.jar into /opt/waterlinedata/app-server/ext:

    $ ln -s /opt/mapr/lib/log4j-1.2.17.jar /opt/waterlinedata/app-server/ext
  6. Append the MapR client login information to the waterlinedata-jaas.conf file. Run the following command; you will remove the Kerberos entries manually in the next step.

    $ cat /opt/mapr/conf/mapr.login.conf >> /opt/waterlinedata/conf/waterlinedata-jaas.conf
  7. Remove the Kerberos login credentials entry named Client from the waterlinedata-jaas.conf file, and then remove the highlighted entry identified by module.WDKrb5LoginModule. (A quick way to locate these entries is shown after the screenshot.)

    $ vi /opt/waterlinedata/conf/waterlinedata-jaas.conf

    Edit waterlinedata-jaas.conf file
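
    To locate the entries to remove, you can first search the file; an optional check using standard grep:

    $ grep -n -E 'Client|WDKrb5LoginModule' /opt/waterlinedata/conf/waterlinedata-jaas.conf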

  8. In the same waterlinedata-jaas.conf file, follow the steps below.

    1. Substitute the cluster name under MAPR_SERVER_KERBEROS definition as shown below:

      Substitute cluster name

    2. Substitute the FQDN value under MAPR_WEBSERVER_KERBEROS definition as shown below:

      Substitute FQDN value

  9. Create login credentials and start the app-server in setup mode. If security is disabled, skip the login credentials step and just start the app-server in setup mode.

    MapR uses its own authentication system, which is very similar to Kerberos.

    $ sudo su - ldcuser
    $ maprlogin password
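
    To confirm that the login succeeded, you can print the resulting ticket; an optional check:

    $ maprlogin print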
  10. The Hadoop and Hive JARs packaged with Data Catalog do not work in a MapR environment, so you must link the corresponding MapR JARs into the /opt/waterlinedata/app-server/ext path.

    1. Make a copy of the /opt/waterlinedata/app-server/ext folder.

      <WLD App-Server>$ cd /opt/waterlinedata/app-server && cp -r ext ext.orig
    2. Replace the Hadoop JARs with those from MapR environment as follows:

      Caution: Exact versions of these files may differ from system to system. Be sure to use the correct version available in your system.

      <WLD App-Server>$ rm ext/hadoop/hadoop-auth-2.x.x.jar
      <WLD App-Server>$ cp <path on mapr>/hadoop-auth-2.7.0.jar ext/hadoop
      
      <WLD App-Server>$ rm ext/hadoop/hadoop-common-2.x.x.jar
      <WLD App-Server>$ cp <path on mapr>/hadoop-common-2.7.0.jar ext/hadoop
      
      <WLD App-Server>$ rm ext/hadoop/maprfs-6.1.0-mapr.jar
      <WLD App-Server>$ cp <path on mapr>/maprfs-6.1.0-mapr.jar ext/hadoop
    3. Replace the Hive JARs with those from MapR environment as follows:

      Caution: Exact versions of these files may differ from system to system. Be sure to use the correct version available in your system.
      <WLD App-Server>$ cp /opt/mapr/hive/hive-2.3/lib/hive-common-2.3.3-mapr-1901.jar ext/hive
      <WLD App-Server>$ cp /opt/mapr/hive/hive-2.3/lib/hive-exec-2.3.3-mapr-1901.jar ext/hive
      
      <WLD App-Server>$ cp /opt/mapr/hive/hive-2.3/lib/hive-jdbc-2.3.3-mapr-1901.jar ext/hive
      <WLD App-Server>$ cp /opt/mapr/hive/hive-2.3/lib/hive-metastore-2.3.3-mapr-1901.jar ext/hive
      
      <WLD App-Server>$ cp /opt/mapr/hive/hive-2.3/lib/hive-service-2.3.3-mapr-1901.jar ext/hive
      <WLD App-Server>$ cp /opt/mapr/hive/hive-2.3/lib/libthrift-0.9.3.jar ext/hive
    4. Copy or link the following Hive and Hadoop files to the Data Catalog install directory /opt/waterlinedata/app-server/conf.

      <WLD App-Server>$ ln -s /opt/mapr/hive/hive-2.3/conf.new/hive-site.xml conf/
      <WLD App-Server>$ ln -s /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/hdfs-site.xml conf/
      <WLD App-Server>$ ln -s /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml conf/
    5. Make sure SPARK_HOME and HIVE_HOME are defined correctly in the ~/.bash_profile script of both the ldcuser and root users.

      The Data Catalog service user needs spark-submit on its PATH to run Data Catalog jobs, and the root user needs beeline on its PATH to run the installer.
    6. Add the ldcuser user to the HiveServer2 node (if not already present).

      The Data Catalog service user should be available on all the MapR nodes.
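
    To confirm that the JAR replacements above took effect, you can list the ext folders and check that the MapR JAR versions you copied appear; an optional sanity check:

    <WLD App-Server>$ ls ext/hadoop ext/hive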
  11. Restart the web application in setup mode to load the environment-specific JARs added above.

    $ /opt/waterlinedata/app-server/bin/app-server start --setup
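
    To confirm that the web application came up, you can check its listening port with standard tooling. Port 8082 is an assumption here, based on the app-server endpoint used later in this guide; substitute your configured port:

    $ ss -tlnp | grep 8082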
  12. Complete the setup by stepping through the following screens:

    • Welcome screen
    • License screen
    • Connect with Solr
    • Connect with Postgres
    • Large properties storage
    • Repository bootstrap
    • Authentication method
    • Metadata REST server details
    • Restart screen

Next steps

Metadata REST server details page

The application server installation automatically creates a token for the Metadata REST Server, which is used for initializing and registering the metadata server with the application server.

Copy the metadata server installation command for later reference, but do not execute it yet. You will need this information when installing the Metadata REST Server.

Note: The metadata server token shown in the Metadata REST server details screen of the installation can also be obtained from the UI after restarting the application server: under Manage Tokens, select metadata-rest-server, and then select Install Metadata Rest Server.

Install the Metadata server

The app-server installer auto-generates the metadata server installation command for you.

Procedure

  1. Restart the app-server.

  2. Execute the following command on the node where you want to install the metadata server:

    ./wld-metadata-server-5.1.run -- --init --endpoint ayro:8082 \
    --client-id metadata-rest-server \
    --token 4236cea0-93ad-416d-9b38-919392ac6059 \ 
    --public-host ayro \
    --port 4242

    where:

    • --init initializes the metadata server (syncs the repository configuration from the app-server).
    • --endpoint is the app-server URL to connect to.
    • --token is the authentication token.
    • --public-host is the public host of the metadata server to be reported to agents when they subsequently register.
    • --port is the port on which to run.

    Note: On some MapR environments, port 4242 may be occupied and unavailable to Data Catalog. In these cases, provide a different port (for example, 4244) in the metadata server installation command above. (A quick port check is sketched below.)
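
    A quick way to check whether a port is free before choosing it, using standard tooling (not part of the installer):

    $ ss -tln | grep 4242    # no output means port 4242 is free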

Install Agents

Follow the steps below to install Agents.

Procedure

  1. In the Manage Agents section of the application, create a new Agent:

    Create Agent dialog box

  2. Copy the command generated.

    Register agent

  3. Use the copied command to install the agent, as follows.

    ./wld-agent-5.1.run -- --register --endpoint ayro:8082 --agent-id radf0e60f224ad436e --agent-token c6cd59db-6225-4698-9dd5-ac12f5d5e434

Building your Data Catalog

Now that Data Catalog is installed and running, the next step is to connect to the data you want to include in the catalog. For information on how to create a data source, see Managing Data Sources.

Note: When adding a data source on MapR, make sure that the HDFS connection URL uses the maprfs:/// scheme.
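
A quick way to confirm that the maprfs:/// scheme resolves from the Data Catalog node, assuming the MapR Hadoop client is on the PATH as configured earlier:

  $ hadoop fs -ls maprfs:///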