Installing Lumada Data Catalog on MapR

System requirements

Data Catalog supports the following configuration on MapR:

Distribution	Components	Versions	Notes
MapR Ecosystem Pack (MEP)	MapR	6.1.0
	Apache Spark™	2.4.0
	Solr	8.4.1	Installed separately
	MapR-FS	6.1.0.20180926230239.GA
	Hive	2.3.3
	Postgres	11.9	Installed separately
	Atlas	NA

For memory requirements, see Minimum node requirements.

Sizing estimates

If you plan to install Data Catalog and Solr on the same node, the node should have at least 64 GB RAM. Alternatively, configure Data Catalog and Solr on separate nodes in the same cluster.

Preparation

Before you begin installing Data Catalog, make sure you have read and followed the pre-installation validations for your environment, specifically:

Configure the authentication method for MapR

Use the following steps to configure the authentication for MapR, including generating a ticket for the Lumada Data Catalog service user:

Procedure

For the Lumada Data Catalog service user, create a ticket with service impersonation properties that is valid for an extended duration, such as a year. Refer to the following code sample:

LDC Service User Home$ maprlogin  generateticket  -type servicewithimpersonation \
                                                    -user ldcuser \
                                                    -out ldcuser_ticket \
                                                    -duration 365:0:0

In this example, a ticket file with the name ldcuser_ticket is created, which is valid for 365 days.
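
The same command can be parameterized for a different service user or lifetime. The following is a minimal sketch, not part of MapR or Data Catalog: the ticket_command helper is hypothetical, it only prints the command so that it can be reviewed before it is run, and it assumes the days:hours:minutes duration form used in the sample above.

```shell
# Hypothetical helper: print (do not run) the ticket command from the sample
# above for a given user ($1) and lifetime in days ($2).
ticket_command() {
    user="$1"
    days="$2"
    # The sample uses 365:0:0 for 365 days, so hours and minutes stay at 0.
    printf 'maprlogin generateticket -type servicewithimpersonation -user %s -out %s_ticket -duration %s:0:0\n' \
        "$user" "$user" "$days"
}

# Example: ticket_command ldcuser 365
```

Printing the command first keeps the sketch safe to try on a machine where maprlogin is not installed.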

Write the MapR credentials of the ldcuser for the cluster maprdemo.waterlinedata.com to the secure_ldcuser_ticket file, then confirm that the file exists:
```
LDC Service User Home$ ls secure_ldcuser_ticket
```

Edit the .bash_profile file with any text editor to configure the MAPR_TICKETFILE_LOCATION environment variable to allow ticket generation, as in the following example:

LDC Service User Home$ vi ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin
export MAPR_TICKETFILE_LOCATION=/home/ldcuser/secure_ldcuser_ticket

export PATH
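
After logging in again (so that .bash_profile is re-read), the variable can be checked before continuing. This is a minimal sketch; the check_ticket_location function name is an assumption made for illustration, and it only tests that the file exists and is readable, not that the ticket itself is still valid.

```shell
# Hypothetical helper: succeed only when MAPR_TICKETFILE_LOCATION names a
# readable, non-empty file.
check_ticket_location() {
    if [ -z "${MAPR_TICKETFILE_LOCATION:-}" ]; then
        echo "MAPR_TICKETFILE_LOCATION is not set" >&2
        return 1
    fi
    if [ ! -r "$MAPR_TICKETFILE_LOCATION" ] || [ ! -s "$MAPR_TICKETFILE_LOCATION" ]; then
        echo "ticket file missing, unreadable, or empty: $MAPR_TICKETFILE_LOCATION" >&2
        return 1
    fi
    echo "ticket file found: $MAPR_TICKETFILE_LOCATION"
}

# Example: check_ticket_location || echo "fix .bash_profile before continuing"
```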

Verify that Beeline is part of the classpath by running the Beeline command at the prompt, as follows:
```
LDC Service User Home$ beeline
```
If you get an error that indicates the command is not found, then you must, as a root user, add Beeline to the classpath by creating a symlink or soft link using the following command:
```
LDC Service User Home$ ln -s /opt/mapr/hive/hive-2.3/bin/beeline /usr/bin/beeline
```
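
To check for Beeline without actually starting it, the shell's own lookup can be used. The following sketch relies only on the standard command -v builtin; the on_path function name is chosen here for illustration.

```shell
# Hypothetical helper: report whether a command resolves on the current PATH.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

# Example: on_path beeline || echo "ask root to create the symlink shown above"
```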

Edit the .bash_profile file with any text editor to define the path variables for HIVE_HOME and SPARK_HOME as shown in the following example:

LDC Service User Home$ vi ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

export HIVE_HOME=/opt/mapr/hive/<hive-version>
export SPARK_HOME=/opt/mapr/spark/<spark-version>

PATH=$PATH:$HOME/.local/bin:$HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin
export PATH
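
A quick way to catch a mistyped version directory is to confirm that each home variable points at a directory containing bin/. This is a sketch under that assumption; check_home is a hypothetical helper, not a Data Catalog tool.

```shell
# Hypothetical helper: $1 is the variable name (used in messages only),
# $2 is its value. Succeeds when "$2/bin" is an existing directory.
check_home() {
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$2/bin" ]; then
        echo "$1=$2 has no bin/ directory" >&2
        return 1
    fi
    echo "$1 OK: $2"
}

# Example:
#   check_home HIVE_HOME "${HIVE_HOME:-}" && check_home SPARK_HOME "${SPARK_HOME:-}"
```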

Download the Data Catalog packages

Download the Data Catalog distribution from the location provided by Data Catalog. If your organization has subscribed to support, you can find the location through the Data Catalog Support portal at Hitachi Vantara Lumada and Pentaho Support Portal.

Obtain access to the following three installers. Note that X indicates the specific version that you want to install.

ldc-app-server-X.run
ldc-metadata-server-X.run
ldc-agent-X.run

To optimize your success, install the components in the following order:

LDC Application Server
LDC Metadata Server
LDC Agent
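
Before starting, it can help to confirm that all three installers for the chosen version were downloaded. The following sketch assumes that the files sit in one directory and follow the naming pattern listed above; the missing_installers function is hypothetical.

```shell
# Hypothetical helper: $1 is the download directory, $2 is the version (X).
# Reports each missing installer and succeeds only when all three are present.
missing_installers() {
    dir="$1"
    version="$2"
    missing=0
    # Same order as the recommended installation sequence.
    for component in app-server metadata-server agent; do
        file="$dir/ldc-$component-$version.run"
        if [ ! -f "$file" ]; then
            echo "missing: $file" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example: missing_installers "$HOME/downloads" 6.0.0 && echo "all installers present"
```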

Installing the Lumada Data Catalog packages

Before installing the Data Catalog packages, make sure you have configured the service user by following the steps in Configure the Data Catalog service user.

Also, verify that you have configured the authentication for MapR as described in Configure the authentication method for MapR.

Install the LDC Application Server

Follow the steps below to install the LDC Application Server.

Procedure

Run the ldc-app-server installer as the root user with the --no-exec argument, as in the following command:

$ ./ldc-app-server-6.0.0.run -- --no-exec

The following text displays in the Terminal window:

Verifying archive integrity... 100% MD5 checksums are OK. All good.
Uncompressing Lumada Data Catalog App Server Installer 100%

This program installs Data Catalog Application Server.

Press ^C at any time to quit.

--------------------------------------------------------------------
LUMADA DATA CATALOG APPLICATION SERVER INSTALLER
--------------------------------------------------------------------
1. Express Install (Requires superuser access)
2. Upgrade
3. Exit

Enter your choice [1-3]: 1
Enter the name of the Lumada Data service user [ldcuser]:
Enter install location [/opt/ldc] :
Enter log location [/var/log/ldc] :
Enter the Solr server version [7.5.0]:
Is Kerberos enabled? [y/N]:
Do you want to link hdfs-site.xml, hive-site.xml, core-site.xml to Data Catalog installation? [y/N] : N
--------------------------------------------------------------------
SELECTION SUMMARY
--------------------------------------------------------------------
Data Catalog service user : ldcuser
Install location : /opt/ldc
Log location : /var/log/ldc
Kerberos enabled : false
Solr server version : 8.4.1
Link site xmls : false
Proceed? [Y/n]:
[sudo] password for ldcuser:
Removed existing directory /opt/ldc/app-server
Directory /opt/ldc exists.
Created directory /opt/ldc/app-server
Directory /var/log/ldc exists.
Copying files ... done.

Installed app-server to /opt/ldc/app-server
Generating certificate ...
Certificate fingerprint (SHA-256): 7fbcda6dceaec042cd1f7681a17ac6a99cc70d0cd5e363eb056075c628852893
Starting services ................ done.

Enter 1 (Express Install) and choose all the default paths.
1. When prompted for Kerberos, enter N.
2. When prompted for linking hdfs-site.xml, hive-site.xml, core-site.xml to Lumada Data Catalog, enter N.
3. When prompted to Proceed, enter Y.
The installation completes.

Edit the /opt/ldc/app-server/bin/app-server file to add -Dmapr.library.flatclass=/opt/mapr/lib to the WEBAPP_OPTS classpath, as in the following code example:

WEBAPP_OPTS="-Dwaterlinedata.webapp.war=${WATERLINEDATA_WEBAPP_WAR} \
    -Dwaterlinedata.webapp.extra.classpath=${EXTRA_CLASSPATH} \
    -Dwaterlinedata.webapp.override.descriptor=${JETTY_BASE}/etc/waterlinedata-override-descriptor.xml \
    -Dmapr.library.flatclass=/opt/mapr/lib \
    -Dwaterlinedata.plugins.dir=${PLUGINS_DIR} \
    -Dwaterlinedata.home=${WATERLINEDATA_INSTALL_DIR} \
    -Dwaterlinedata.shiro.global.timeout=1200000 \
    -Dwaterlinedata.setup.mode=${SETUP_MODE}"
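
Once the file is saved, a simple text search can confirm that the option is present. This is a minimal sketch; has_mapr_flatclass is a hypothetical helper, and it only checks that the text appears somewhere in the file, not that it sits inside WEBAPP_OPTS.

```shell
# Hypothetical helper: $1 is the path to the app-server script.
# Succeeds when the MapR library option appears in the file.
has_mapr_flatclass() {
    # -e keeps grep from treating the leading "-D" as one of its own options.
    grep -q -e '-Dmapr\.library\.flatclass=/opt/mapr/lib' "$1"
}

# Example: has_mapr_flatclass /opt/ldc/app-server/bin/app-server && echo "option present"
```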

Modify the script to disable loading of Jackson JARs by adding |jackson.* as indicated below in the setupClasspathAndOptions function definition.

Search for the term function setupClasspathAndOptions. In this function definition add |jackson.* to the value of DEPENDENCIES_CP_SANITIZED as in the following example:

function setupClasspathAndOptions() {
    if [[ -n "${TRUST_STORE_FILE}" ]];then
        TRUST_STORE_ARGS="-Djavax.net.ssl.trustStore=${TRUST_STORE_FILE}"
    fi

    LOGGING_CP="${WATERLINEDATA_INSTALL_DIR}/lib/logging/*"
    LOGGING_CP_EXPANDED=$(cd ${BIN_DIR} && java -cp "${WATERLINEDATA_INSTALL_DIR}/lib/waterlinedata/waterlinedata-execution-cli-${WLD_VERSION}.jar:${LOGGING_CP}" Globber)
    LOGGING_CP_SANITIZED=$(echo "${LOGGING_CP_EXPANDED}" | tr ':' ',')

    EXTRA_CLASSPATH=

    DEPENDENCIES_CP_COLLAPSED=${WATERLINEDATA_INSTALL_DIR}/lib/dependencies/*
    DEPENDENCIES_CP_EXPANDED=$(java -cp "${WATERLINEDATA_INSTALL_DIR}/lib/waterlinedata/waterlinedata-execution-cli-${WLD_VERSION}.jar:${DEPENDENCIES_CP_COLLAPSED}" Globber)
    DEPENDENCIES_CP_SANITIZED=$(echo "${DEPENDENCIES_CP_EXPANDED}" | tr ':' '\n' | grep -E -v 'jersey.*1.9|jackson.*|oauth2-oidc-sdk.*|nimbus-jose-jwt.*|log4j.*|deltaspike.*|weld.*|.\*\.jar' | tr '\n' ',' ) 2>&1 > /dev/null
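
To see what the exclusion pattern does, the same pipeline can be tried on a small classpath. The sketch below uses a shortened pattern and made-up JAR paths, both assumptions for illustration; sanitize_cp is not part of the app-server script.

```shell
# Hypothetical helper: split a colon-separated classpath ($1) into lines,
# drop entries matching the exclusion pattern, and rejoin them with commas.
# Only two of the alternatives from the real pattern are used here.
sanitize_cp() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -E -v 'jackson.*|log4j.*' | tr '\n' ','
}

# Example (illustrative paths); the result keeps a trailing comma because the
# final newline is also translated:
#   sanitize_cp "/lib/a.jar:/lib/jackson-core-2.9.jar:/lib/b.jar"
```

Adding |jackson.* to the pattern therefore removes every Jackson JAR from the generated list, which is what lets the MapR-provided versions take precedence.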

Create a symlink to the /opt/mapr/lib/log4j-1.2.17.jar in the /opt/ldc/app-server/ext directory using the following command:
```
$ ln -s /opt/mapr/lib/log4j-1.2.17.jar /opt/ldc/app-server/ext
```
Edit the ldc-jaas file with the following steps:
1. Append MapR client login information into the ldc-jaas file by running the following command:
   $ cat /opt/mapr/conf/mapr.login.conf >> /opt/ldcdata/conf/ldcdata-jaas.conf
2. Remove all Kerberos entries from the ldc-jaas file.
3. Remove the Kerberos login credentials entry named Client from the ldc-jaas.conf file.
4. Edit the Client entry to remove .WDKrb5LoginModule as in the following example:
```
Client {
   com.waterlinedata.security.auth.module.WDKrb5LoginModule required
   useKeyTab = true
   useTicketCache = false
   debug = false
   keyTabName = "waterlinesvc.keytab"
   principal = "waterlinesvc@WATERLINEDATA.COM";
};
```
5. Change the principal value in the MAPR_SERVER_KERBEROS definition to the name of your cluster. For example, principal="mapr/<your cluster name>";
6. Change the principal value in the MAPR_WEBSERVER_KERBEROS definition to your fully qualified domain name.
Start the LDC Application Server in setup mode. If security is enabled, you must first create login credentials for the service user, as in the following commands. MapR uses its own authentication system, which is similar to Kerberos.
```
$ sudo su - ldcuser
$ maprlogin password
```
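
After the JAAS file has been edited, a text search can confirm that no reference to WDKrb5LoginModule is left behind. This is a sketch only; no_wd_krb5 is a hypothetical helper, and the file path passed to it should be whichever JAAS file your installation actually uses.

```shell
# Hypothetical helper: $1 is the path to the JAAS configuration file.
# Succeeds when the file is readable and no line names WDKrb5LoginModule.
no_wd_krb5() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    if grep -q 'WDKrb5LoginModule' "$1"; then
        echo "WDKrb5LoginModule still present in $1" >&2
        return 1
    fi
}

# Example: no_wd_krb5 /path/to/jaas.conf && echo "no WDKrb5LoginModule entries left"
```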

The Hadoop and Hive JARs included in the Data Catalog package do not work for the MapR environment. You must link the corresponding JARs on MapR to the /opt/ldc/app-server/ext path as follows:

Make a copy of the /opt/ldc/app-server/ext folder:

<APP-SERVER-HOME>$ cd /opt/ldc/app-server && cp -r ext ext.orig

Replace the Hadoop JARs with those from the MapR environment as follows:

Caution: Exact versions of these files may differ from system to system. Be sure to use the correct version available in your system.

<APP-SERVER-HOME>$ rm ext/hadoop/hadoop-auth-2.x.x.jar
<APP-SERVER-HOME>$ cp <path on mapr>/hadoop-auth-2.7.0.jar ext/hadoop

<APP-SERVER-HOME>$ rm ext/hadoop/hadoop-common-2.x.x.jar
<APP-SERVER-HOME>$ cp <path on mapr>/hadoop-common-2.7.0.jar ext/hadoop

<APP-SERVER-HOME>$ rm ext/hadoop/maprfs-6.1.0-mapr.jar
<APP-SERVER-HOME>$ cp <path on mapr>/maprfs-6.1.0-mapr.jar ext/hadoop

Replace the Hive JARs with those from the MapR environment as follows:

Caution: Exact versions of these files may differ from system to system. Be sure to use the correct version available in your system.

<APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-common-2.3.3-mapr-1901.jar ext/hive
<APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-exec-2.3.3-mapr-1901.jar ext/hive

<APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-jdbc-2.3.3-mapr-1901.jar ext/hive
<APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-metastore-2.3.3-mapr-1901.jar ext/hive

<APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/hive-service-2.3.3-mapr-1901.jar ext/hive
<APP-SERVER-HOME>$ cp /opt/mapr/hive/hive-2.3/lib/libthrift-0.9.3.jar ext/hive

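Because the version suffix in each file name can differ between systems, it may be safer to look a JAR up by its base name than to type the version by hand. The following is a sketch under that assumption; find_jar is a hypothetical helper, and it refuses to pick a file when zero or several candidates match.

```shell
# Hypothetical helper: $1 is a library directory, $2 is a JAR base name such
# as hive-jdbc. Prints the single file named "<base>-<digit>...jar", or fails.
find_jar() {
    count=0
    match=""
    for f in "$1"/"$2"-[0-9]*.jar; do
        # When nothing matches, the pattern stays literal and is skipped here.
        [ -f "$f" ] || continue
        match="$f"
        count=$((count + 1))
    done
    if [ "$count" -ne 1 ]; then
        echo "expected one $2 JAR in $1, found $count" >&2
        return 1
    fi
    echo "$match"
}

# Example: jar=$(find_jar /opt/mapr/hive/hive-2.3/lib hive-jdbc) && cp "$jar" ext/hive
```

Failing on multiple matches is deliberate: if a directory holds more than one variant of a JAR, the choice is left to the administrator.
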
Copy or link the following Hive and Hadoop files to the Data Catalog install directory /opt/ldc/app-server/conf.

<APP-SERVER-HOME>$ ln -s /opt/mapr/hive/hive-2.3/conf.new/hive-site.xml conf/
<APP-SERVER-HOME>$ ln -s /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/hdfs-site.xml conf/
<APP-SERVER-HOME>$ ln -s /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml conf/

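A link whose target path is wrong is created without any error, so it is worth checking that each of the three files actually resolves. This is a minimal sketch; check_conf_files is a hypothetical helper that accepts either a copy or a working link.

```shell
# Hypothetical helper: $1 is the conf directory. Succeeds when all three site
# files exist there, either as regular files or as links to existing files.
check_conf_files() {
    status=0
    for name in hive-site.xml hdfs-site.xml core-site.xml; do
        path="$1/$name"
        if [ -e "$path" ]; then
            echo "ok: $name"
        elif [ -L "$path" ]; then
            # -e follows links, so a link that fails -e has no target.
            echo "dangling link: $name" >&2
            status=1
        else
            echo "missing: $name" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: check_conf_files /opt/ldc/app-server/conf
```
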
Copy the packaged hive-serde JAR from /opt/ldc/app-server/ext to the MapR auxlib directory as follows:
```
<APP-SERVER-HOME>$ cp ext/hive-serde-1.0.1.jar \
        /opt/mapr/hive/hive-2.3/auxlib/
```
Make sure that the SPARK_HOME and HIVE_HOME values are defined correctly in the ~/.bash_profile script of both ldcuser and the root user.
Make sure that the Data Catalog user has spark-submit on its PATH in order to run Data Catalog jobs.
Make sure that the root user has Beeline on its PATH in order to run the installer.
Add the ldcuser user to the HiveServer2 node (if not already available).

The Data Catalog service user should be available on all the MapR nodes.

Restart the Web application in setup mode to load the environment-specific JARs, using the following command:
$ /opt/ldc/app-server/bin/app-server start --setup
Complete the setup as follows:
1. Click Let's get started on the Setup screen. The setup wizard opens.
2. Review the license agreement and select the check box to acknowledge that you have read and agree to the terms. Click I agree to accept the licensing agreement.
3. Continue in the setup wizard, entering the required information at the prompts.
On the last step of the setup wizard, copy the Metadata Server installation command from the Metadata REST server details page for later reference, but do not execute it yet. Then click Next step.

You need this information when installing the Metadata Server. The LDC Application Server installation automatically creates a token for the LDC Metadata Server, which is used for initializing and registering the Metadata Server with the Application Server.

Note: The same Metadata Server token, shown above, can also be obtained from the user interface after restarting the Application Server by selecting metadata-rest-server under Manage Tokens and clicking Install Metadata Rest Server.
The Restart page appears.
Click Restart to apply the changes.
The Welcome page appears.

Next steps

Proceed to Install the LDC Metadata Server.

Install the LDC Metadata Server

The metadata server installation command is automatically generated by the LDC Application Server installer for convenient installation of the LDC Metadata Server.

Perform the following steps to install the LDC Metadata Server:

Procedure

Restart the Application Server.
Execute the following command on the node where you want to install the Metadata Server:
```
./ldc-metadata-server-6.0.0.run -- --init --endpoint ayro:8082 \
--client-id metadata-rest-server \
--token 4236cea0-93ad-416d-9b38-919392ac6059 \
--public-host ayro \
--port 4242
```
Refer to the following list for a description of each argument:
- --init
  Initialize: synchronize the repository configuration from the LDC Application Server.
- --endpoint
  The URL of the LDC Application Server you want to connect to.
- --token
  Authentication token.
- --public-host
  Public host of the LDC Metadata Server to be reported to the LDC Agent when it subsequently registers. "Public" does not necessarily mean the internet-facing public hostname/IP. It only means the hostname/IP that is routable from all the LDC Agents. If all the LDC Agents are part of a private subnet, then enter the private hostname/IP of the LDC Metadata Server host.
- --port
  Port on which to run.
Note: On some MapR environments, port 4242 may already be in use. In this case, you must provide a different port (for example, 4244) in the metadata server installation command.
The LDC Metadata Server installer is verified and extracted.
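
If the default port has to be changed, the command copied from the setup wizard can be adjusted with a text substitution instead of being retyped. The following is a sketch only; swap_port is a hypothetical helper that prints the modified command for review and does not run it.

```shell
# Hypothetical helper: $1 is the copied installation command, $2 is the new
# port. Prints the command with the value after --port replaced.
swap_port() {
    case "$2" in
        ''|*[!0-9]*)
            echo "port must be a number: $2" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$1" | sed "s/--port [0-9][0-9]*/--port $2/"
}

# Example:
#   swap_port "./ldc-metadata-server-6.0.0.run -- --init --endpoint ayro:8082 --port 4242" 4244
```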

Next steps

Proceed to Install LDC Agent.

Install LDC Agent

Follow the steps below to install a new LDC Agent.

Procedure

In Lumada Data Catalog, navigate to Manage Agents.
The Agents page opens.
Click Create Agent.
The Create Agent dialog box opens.
In the Name field, enter a name for the Agent. In the Description field, enter a description. Then click Add.

The Register Agent dialog box opens.
Run the copied command to install the LDC Agent as follows:
```
./ldc-agent-6.0.0.run -- --register --endpoint ayro:8082 --agent-id radf0e60f224ad436e --agent-token c6cd59db-6225-4698-9dd5-ac12f5d5e434
```
Note: Agent installation requires the Hive version in the distribution as an input, which dictates the install-time Hive-specific configuration of the LDC Agent.

Building your Data Catalog

Now that Data Catalog is installed and running, the next step is to connect to the data you want to include in the catalog. For information on how to create a data source, see Managing Data Sources.

Note: When adding a data source on MapR, make sure that the HDFS connection URL reflects maprfs:///.
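
A connection URL can be checked against this prefix before it is entered. This is a minimal sketch; is_maprfs_url is a hypothetical helper, and it tests only the maprfs:/// prefix, not whether the path exists.

```shell
# Hypothetical helper: succeed when $1 starts with maprfs:///
is_maprfs_url() {
    case "$1" in
        maprfs:///*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: is_maprfs_url "maprfs:///" && echo "URL uses the MapR file system scheme"
```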
