Skip to main content

Pentaho+ documentation has moved!

The new product documentation portal is here. Check it out now at docs.hitachivantara.com

 

Hitachi Vantara Lumada and Pentaho Documentation

Installing Lumada Data Catalog on MapR

Parent article

Usethe following steps to install and configure Lumada Data Catalog in your MapR environment.

System requirements

Data Catalog supports the following configuration on MapR:

DistributionComponentsVersionsNotes
MEPMapR6.1.0
Apache Spark™2.4.0
MapRSolr8.4.1Installed separately
EcosystemMapR-FS6.1.0.20180926230239.GA
PackHIVE2.3.3
Postgres11.9Installed separately
AtlasNA

For memory requirements, see Minimum node requirements.

Sizing estimates

If you plan to install Data Catalog and Solr on the same node, the node should have at least 64 GB RAM. Alternatively, configure Data Catalog and Solr on separate nodes in the same cluster.

Preparation

Before you begin installing Data Catalog, make sure you have read and followed the pre-installation validations for your environment, specifically:

Configure the authentication method for MapR

Use the following steps to configure the authentication for MapR, including generating a ticket for the Lumada Data Catalog service user:

Procedure

  1. For the Lumada Data Catalog service user, create a ticket with service impersonation propertiest that is valid for an extended duration, such as a year. Refer to the following code sample:

    LDC Service User Home$ maprlogin  generateticket  -type servicewithimpersonation \
                                                        -user ldcuser \
                                                        -out ldcuser_ticket \
                                                        -duration 365:0:0
    In this example, a ticket file with the name ldcuser_ticket is created, which is valid for 365 days.
  2. Write the MapR credentials of the ldcuser for the cluster maprdemo.waterlinedata.com to the secure_ldcuser_ticket.

    LDC Service User Home$ ls secure_ldcuser_ticket
    .
  3. Edit the .bash_profile file with any text editor to configure the MAPR_TICKETFILE_LOCATION environment variable to allow ticket generation, as in the following example:

    LDC Service User Home$ vi ~/.bash_profile
    
    # .bash_profile
    
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
            . ~/.bashrc
    fi
    
    # User specific environment and startup programs
    
    PATH=$PATH:$HOME/.local/bin:$HOME/bin
    export MAPR_TICKETFILE_LOCATION=/home/ldcuser/secure_ldcuser_ticket
    
    export PATH
  4. Verify that Beeline is part of the classpath by running the Beeline command at the prompt, as follows:

    LDC Service User Home

    If you get an error that indicates the command is not found, then you must, as a root user, add Beeline to the classpath by creating a symlink or soft link using the following command:

    LDC Service User Home$ ln -s /opt/mapr/hive/hive-2.3/bin/beeline /usr/bin/beeline
  5. Edit the .bash_profile file with any text editor to define the path variables for HIVE_HOME and SPARK_HOME as shown in the following example:

    LDC Service User Home$ vi ~/.bash_profile
    
    # .bash_profile
    
    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
            . ~/.bashrc
    fi
    
    # User specific environment and startup programs
    
    export HIVE_HOME=/opt/mapr/hive/<hive-version>
    export SPARK_HOME=/opt/mapr/spark/<spark-version>
    
    PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin
    export PATH

Download the Data Catalog packages

Download the Data Catalog distribution from the location provided by Data Catalog. If your organization has subscribed to support, you can find the location through the Data Catalog Support portal at Hitachi Vantara Lumada and Pentaho Support Portal.

Obtain access to the following three installers. Note that X indicates the specific version that you want to install.

  • ldc-app-server-X.run
  • ldc-metadata-server-X.run
  • ldc-agent-X.run

To optimize your success, install the components in the following order:

  1. LDC Application Server
  2. LDC Metadata Server
  3. LDC Agent

Installing the Lumada Data Catalog packages

Before installing the Data Catalog packages, make sure you have configured the service user by following the steps in Configure the Data Catalog service user.

Also, verify that you have configured the authentication for MapR as described in Configure the authentication method for MapR.

Install the LDC Application Server

Follow the steps below to install the LDC Application Server.

Procedure

  1. Run the ldc-app-server as a root user using the --no-exec argument as in the following command:

    $ ./ldc-app-server-6.0.0.run -- --no-exec
    The following text displays in the Terminal window:

    Verifying archive integrity...  100%   MD5 checksums are OK. All good.
    Uncompressing Lumada Data Catalog App Server Installer  100%
    
    
    This program installs Data Catalog Application Server.
    
    Press ^C at any time to quit.
    
    --------------------------------------------------------------------
                   LUMADA DATA CATALOG APPLICATION SERVER INSTALLER
    --------------------------------------------------------------------
        1. Express Install          (Requires superuser access)
        2. Upgrade
        3. Exit
    
    Enter your choice [1-3]: 1
    Enter the name of the Lumada Data service user [ldcuser]:
    Enter install location [/opt/ldc] :
    Enter log location [/var/log/ldc] :
    Enter the Solr server version [7.5.0]:
    Is Kerberos enabled? [y/N]:
    Do you want to link hdfs-site.xml, hive-site.xml, core-site.xml to Data Catalog installation? [y/N] : N
    --------------------------------------------------------------------
       SELECTION SUMMARY
    --------------------------------------------------------------------
         Data Catalog service user : ldcuser
                    Install location : /opt/ldc
                        Log location : /var/log/ldc
                    Kerberos enabled : false
                 Solr server version : 8.4.1
                      Link site xmls : false
    Proceed? [Y/n]:
    [sudo] password for ldcuser:
    Removed existing directory /opt/ldc/app-server
    Directory /opt/ldc exists.
    Created directory /opt/ldc/app-server
    Directory /var/log/ldc exists.
    Copying files ... done.
    
    Installed app-server to /opt/ldc/app-server
    Generating certificate ...
            Certificate fingerprint (SHA-256): 7fbcda6dceaec042cd1f7681a17ac6a99cc70d0cd5e363eb056075c628852893
    Starting services ................ done.
  2. Enter 1 (Express Install) and choose all the default paths.

    1. When prompted for Kerberos, enter N.

    2. When prompted for linking hdfs-site.xml, hive-site.xml, core-site.xml to Lumada Data Catalog, enter N.

    3. When prompted to Proceed, enter Y.

    The installation completes.
  3. Edit the /opt/ldc/app-server/bin/app-server file to add -Dmapr.library.flatclass=/opt/mapr/lib to the WEBAPP_OPTS classpath, as in the following code example:

    WEBAPP_OPTS="-Dwaterlinedata.webapp.war=${WATERLINEDATA_WEBAPP_WAR} \
        -Dwaterlinedata.webapp.extra.classpath=${EXTRA_CLASSPATH} \
        -Dwaterlinedata.webapp.override.descriptor=${JETTY_BASE}/etc/waterlinedata-override-descriptor.xml \
        -Dmapr.library.flatclass=/opt/mapr/lib \
        -Dwaterlinedata.plugins.dir=${PLUGINS_DIR} \
        -Dwaterlinedata.home=${WATERLINEDATA_INSTALL_DIR} \
        -Dwaterlinedata.shiro.global.timeout=1200000 
        -Dwaterlinedata.setup.mode=${SETUP_MODE}"
  4. Modify the script to disable loading of Jackson JARs by adding |jackson.* as indicated below in the setupClasspathAndOptions function definition.

    Search for the term function setupClasspathAndOptions. In this function definition add |jackson.* to the value of DEPENDENCIES_CP_SANITIZED as in the following example:

    function setupClasspathAndOptions() {
        if [[ -n "${TRUST_STORE_FILE>" ]];then
            TRUST_STORE_ARGS="-Djavax.net.ssl.trustStore=${TRUST_STORE_FILE>"
        fi
    
        LOGGING_CP="${WATERLINEDATA_INSTALL_DIR}/lib/logging/*"
        LOGGING_CP_EXPANDED=$(cd ${BIN_DIR} && java -cp "${WATERLINEDATA_INSTALL_DIR}/lib/waterlinedata/waterlinedata-execution-cli-${WLD_VERSION}.jar:${LOGGING_CP}" Globber) 
        LOGGING_CP_SANITIZED=$(echo "${LOGGING_CP_EXPANDED}" | tr ':' ',')
    	
        EXTRA CLASSPATH=
    
        DEPENDENCIES_CP_COLLAPSED=${WATERLINEDATA_INSTALL_DIR}/lib/dependencies/*
        DEPENDENCIES_CP_EXPANDED=$(java -cp "${WATERLINEDATA_INSTALL_DIR}/lib/waterlinedata/waterlinedata-execution-cli-${WLD_VERSION}.jar:${DEPENDENCIES_CP_COLLAPSED}" Globber) 
        DEPENDENCIES_CP_SANITIZED=$(echo "${DEPENDENCIES_CP_EXPANDED}" | tr ':' '\n' | grep -E -v 'jersey.*1.9|jackson.*|oauth2-oidc-sdk.*|nimbus-jose-jwt.*|log4j.*|deltaspike.*|weld.*|.\*\.jar' | tr '\n' ',' ) 2>&I > /dev/null
  5. Create a symlink to the /opt/mapr/lib/log4j-1.2.17.jar in the /opt/ldc/app-server/ext directory using the following command:

    $ ln -s /opt/mapr/lib/log4j-1.2.17.jar /opt/ldc/app-server/ext
  6. Edit the ldc-jaas file with the following steps:

    1. Append MapR client login information into the ldc-jaas file by running the following command:

    2. Remove all Kerberos entries from the ldc-jaas file.

    3. Remove the Kerberos login credentials entry named Client from the ldc-jaas.conf file.

    4. Edit the Client entry to remove .WDKrb5LoginModule as in the following example:

      Client {
         com.waterlinedata.security.auth.module.WDKrb5LoginModule required 
         useKeyTab = true 
         useTicketCache = false 
         debug = false
         keyTabName = "waterlinesvc.keytab" 
         principal = "waterlinesvc@WATERLINEDATA.COM”;
    5. Change the principal value in the MAPR_SERVER_KERBEROS definition to the name of your cluster. For example, principal="mapr/<your cluster name>";

    6. Change the principal value in the MAPR_WEBSERVER_KERBEROS definition to your fully qualified domain name.

  7. Start the LDC Application Server in setup mode. If security is enabled, you must create login credentials before you start the LDC Application Server in setup mode. MapR uses its own authentication system, which is similar to Kerberos.

    $ sudo su - ldcuser
    $ maprlogin password
    
    $ cat /opt/mapr/conf/mapr.login.conf >> /opt/ldcdata/conf/ldcdata-jaas.conf
  8. The Hadoop and Hive JARs included in the Data Catalog package do not work for the MapR environment. You must link the corresponding JARs on MapR to the /opt/ldc/app-server/ext path as follows:

    1. Make a copy of the /opt/ldc/app-server/ext folder:

      <APP-SERVER-HOME>$ cd /opt/ldc/app-server && cp -r ext ext.orig
    2. Replace the Hadoop JARs with those from the MapR environment as follows:

      CautionExact versions of these files may differ from system to system. Be sure to use the correct version available in your system.

      <APP-SERVER-HOME>$ rm ext/hadoop/hadoop-auth-2.x.x.jar
      <APP-SERVER-HOME>$ cp <path on mapr>/hadoop-auth-2.7.0.jar ext/hadoop
      
      <APP-SERVER-HOME>$ rm ext/hadoop/hadoop-common-2.x.x.jar
      <APP-SERVER-HOME>$ cp <path on mapr>/hadoop-common-2.7.0.jar ext/hadoop
      
      <APP-SERVER-HOME>$ rm ext/hadoop/maprfs-6.1.0-mapr.jar
      <APP-SERVER-HOME>$ cp <path on mapr>/maprfs-6.1.0-mapr.jar ext/hadoop
      
    3. Replace the Hive JARs with those from the MapR environment as follows:

      CautionExact versions of these files may differ from system to system. Be sure to use the correct version available in your system.
      <APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-common-2.3.3-mapr-1901.jar ext/hive
      <APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-exec-2.3.3-mapr-1901.jar ext/hive
      
      <APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-jdbc-2.3.3-mapr-1901.jar ext/hive
      <APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-metastore-2.3.3-mapr-1901.jar ext/hive
      
      <APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-service-2.3.3-mapr-1901.jar ext/hive
      <APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/libthrift-0.9.3.jar ext/hive
      
    4. Copy or link the following Hive and Hadoop files to the Data Catalog install directory /opt/ldc/app-server/conf.

      <APP-SERVER-HOME>$ ln -s /opt/mapr/hive/hive-2.3/conf.new/hive-site.xml conf/
      <APP-SERVER-HOME>$ ln -s /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/hdfs-site.xml conf/
      <APP-SERVER-HOME>$ ln -s /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml conf/
      
    5. Copy the packaged hive-serde JAR from /opt/ldc/app-server/ext to the MapR auxlib directory as follows:

      <APP-SERVER-HOME>$ cp ext/hive-serde-1.0.1.jar
              /opt/mapr/hive/hive-2.3/auxlib/
    6. Make sure that the SPARK_HOME and HIVE_HOME values are defined correctly in the ~/.bash_profile script of ldsuser and root user.

    7. Make sure that the Data Catalog user has the spark-submit ENV variable in order to run Data Catalog jobs.

    8. Make sure that the root user has the Beeline ENV variable in order to run installer.

    9. Add ldcuser user to the HiveServer2 node (if not already available).

      The Data Catalog service user should be available on all the MapR nodes.

  9. Restart the Web application in setup mode to load the environment-specific JARs, using the following command:

    $ /opt/ldc/app-server/bin/app-server start –setup
  10. Complete the setup as follows:

    1. Click Let's get started on the Setup screen. The setup wizard opens.

    2. Review the license agreement and select the check box to acknowledge that you have read and agee to the terms. Click I agree to accept the licensing agreement.

    3. Continue in the setup wizard, entering the required information at the prompts.

  11. On the last step of the setup wizard, copy the Metadata Server installation command from the Metadata REST server details page for later reference, but do not execute it yet. Then click Next step.

    You need this information when installing the Metadata Server. The LDC Application Server installation automatically creates a token for the LDC Metadata Server, which is used for initializing and registering the Metadata Server with the Application Server.

    Metadata REST server details

    NoteThe same Metadata Server token, shown above, can also be obtained from the user interface after restarting the Application Server, and then by selecting Install Metadata Rest Server, under Manage Tokens metadata-rest-server, and clicking Install Metadata Rest Server.
    The Restart page appears.
  12. Click Restart to apply the changes.

    The Welcome page appears.

Next steps

Install the LDC Metadata Server

The metadata server installation command is automatically generated by the LDC Application Server installer for convenient installation of the LDC Metadata Server.

Perform the following steps to install the LDC Metadata Server:

Procedure

  1. Restart the Application Server.

  2. Execute the following command on the node where you want to install the Metadata Server:

    ./ldc-metadata-server-6.0.0.run -- --init --endpoint ayro:8082 \
    --client-id metadata-rest-server \
    --token 4236cea0-93ad-416d-9b38-919392ac6059 \ 
    --public-host ayro \
    --port 4242

    Refer to the following list for a description of each argument:

    • --init

      Initialize: synchronize the repository configuration from the LDC Application Server.

    • --endpoint

      The URL of the LDC Application Server you want to connect to.

    • --token

      Authentication token.

    • --public-host

      Public host of the LDC Metadata Server to be reported to the LDC Agent when it subsequently registers. "Public" does not necessarily mean the internet facing public hostname/IP. It only means the hostname/IP that is routable from all the LDC Agent. If all the LDC Agents are part of a private subnet, then enter the private hostname/IP of the LDC Metadata Server host.

    • --port

      Port on which to run.

    NoteOn some MapR environments, port 4242 may be used for Data Catalog. In this case, you must provide a different port (for example, 4244) in the metadata server installation command.
    The LDC Metadata Server installer is verified and extracted.

Next steps

Proceed to Install LDC Agent.

Install LDC Agent

Follow the steps below to install a new LDC Agent.

Procedure

  1. In Lumada Data Catalog, navigate to Manage Agents.

    The Agents page opens.
  2. Click Create Agent.

    The Create Agent dialog box opens.
  3. In the Name field enter a name and the Description field enter a description for the Agent, then click Add.

    Create Agent dialog box

    The Register Agent dialog box opens.
  4. Run the copied command to install the LDC Agent as follows:

    ./ldc-agent-6.0.0.run -- --register --endpoint ayro:8082 --agent-id radf0e60f224ad436e --agent-token c6cd59db-6225-4698-9dd5-ac12f5d5e434

    NoteAgent installation requires the Hive version in the distribution as an input, which dictates the install-time Hive specific configuration of the LDC Agent.

Building your Data Catalog

Now that Data Catalog is installed and running, the next step is to connect to the data you want to include in the catalog. For information on how to create a data source, see Managing Data Sources.

NoteWhen adding a data source on MapR, make sure that the HDFS connection URL reflects maprfs:///