HBase setup for Spark
The HBase Input and HBase Output steps can run on Spark with the Adaptive Execution Layer (AEL). These steps can be used with the supported versions of Cloudera Distribution for Hadoop (CDH) and Hortonworks Data Platform (HDP). To read data from or write data to HBase, you must have an HBase target table on the cluster. If one does not exist, you can create it using HBase shell commands.
This article explains how you can set up the Pentaho Server to run these steps.
Set up the application properties file
Perform the following steps to set up the application.properties file:
Procedure
Navigate to the design-tools/data-integration/adaptive-execution/config folder and open the application.properties file with any text editor.
Set the value of the hbaseConfDir property to the location of your hbase-site.xml file.
Set the value of the extraLib property to the location of the vendor-specific JARs. The default value is ./extra.
Save and close the file.
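As an illustration only, the two entries might look like the following after editing. The hbaseConfDir value shown is a hypothetical path, and it assumes the property takes the directory that contains hbase-site.xml; substitute the location used on your cluster. The extraLib value is the default mentioned above.

```properties
# Hypothetical location of hbase-site.xml; replace with your cluster's path.
hbaseConfDir=/etc/hbase/conf
# Location of the vendor-specific JARs (default value).
extraLib=./extra
```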
Set up the vendor-specific JARs
Perform the following steps to set up the vendor-specific JARs:
Procedure
Navigate to the design-tools/data-integration/adaptive-execution/extra directory and delete the three hbase JAR files.
Navigate to the design-tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations directory and locate your CDH or HDP distribution folder.
Locate the lib/pmr directory in your distribution folder.
Copy the six hbase files, along with the metrics-core file, to the design-tools/data-integration/adaptive-execution/extra folder.
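If you prefer to script the procedure above, the following is a minimal Python sketch rather than a supported tool. The function name refresh_hbase_jars is hypothetical, and the file name patterns hbase*.jar and metrics-core*.jar are assumptions about how the JARs are named; check the actual file names in your extra and lib/pmr directories before relying on it.

```python
import shutil
from pathlib import Path


def refresh_hbase_jars(extra_dir, pmr_dir):
    """Swap the HBase JARs in extra_dir for the vendor JARs found in pmr_dir.

    Returns two lists of file names: the JARs removed and the JARs copied.
    """
    extra, pmr = Path(extra_dir), Path(pmr_dir)

    # Delete the existing HBase JARs from the extra directory.
    # Assumption: their file names start with "hbase".
    removed = sorted(p.name for p in extra.glob("hbase*.jar"))
    for name in removed:
        (extra / name).unlink()

    # Copy the vendor HBase JARs and the metrics-core JAR from lib/pmr.
    # Assumption: the file names match these two patterns.
    copied = []
    for pattern in ("hbase*.jar", "metrics-core*.jar"):
        for jar in sorted(pmr.glob(pattern)):
            shutil.copy2(jar, extra / jar.name)
            copied.append(jar.name)

    return removed, copied
```

You would call it with the adaptive-execution/extra directory as the first argument and the lib/pmr directory of your CDH or HDP distribution folder as the second. Compare the returned lists with the three deleted and seven copied files that the procedure expects.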
To complete your setup, you must restart the AEL daemon.