Hitachi Vantara Lumada and Pentaho Documentation

Docker container deployment of Pentaho

This guide walks you through setting up and deploying Pentaho products within complete, cloud-ready Docker containers. You can use your Pentaho Server, based on your licensed Pentaho software, to create a Docker container with data integration, business analytics, and Carte server components. For example, you can set up Pentaho Data Integration (PDI) containers, then quickly deploy those containers in the cloud to run your data transformation workloads to enhance performance and cost-efficiency of your cloud environment. Supported Pentaho components include the Pentaho Server (Business Analytics and Data Integration), the Carte server, and the Kitchen and Pan command line tools.

Before you begin

Before you start creating and deploying your Docker containers of Pentaho products, be sure you have the following items ready:

  • Installation artifacts (ZIP files) of the Pentaho 9.4 products you are deploying through Docker containers. See Pentaho installation for instructions about downloading installation artifacts of Pentaho products.
  • An installed and stable Docker instance. This instance must have docker-compose installed to support the docker-compose.yml. If you are on a Windows operating system, you must have WSL2 installed and active.
  • A user account on Docker Hub (https://hub.docker.com/). You must sign in to this account with the docker login command so that Docker can access registered database containers.
  • The curl command line tool must be installed on the host operating system.
  • If you want to download Oracle databases, you must have login credentials to Oracle’s container repository.
  • Pentaho licenses must be installed on the host, and the environment variable PENTAHO_INSTALLED_LICENSE_PATH must point to the installedLicenses.xml file. See Manage Pentaho licenses for more information.
  • Java 8 or 11 must be installed on the host machine.
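The locally detectable items in the list above can be checked with a short script before you run DockMaker. The following is a minimal sketch, not part of the DockMaker distribution: the function names have_cmd and missing_prereqs are chosen for illustration, and the script does not verify versions, the Docker Hub or Oracle logins, or WSL2.

```shell
#!/bin/sh
# Report which prerequisites appear to be missing on this host.
# have_cmd and missing_prereqs are illustrative names, not DockMaker features.

have_cmd() {
    # True when the named command is found on PATH.
    command -v "$1" >/dev/null 2>&1
}

missing_prereqs() {
    # Print one line per missing item; print nothing when every check passes.
    for cmd in docker curl java; do
        have_cmd "$cmd" || echo "missing command: $cmd"
    done
    # The license variable must point to an existing installedLicenses.xml file.
    if [ -z "${PENTAHO_INSTALLED_LICENSE_PATH:-}" ]; then
        echo "PENTAHO_INSTALLED_LICENSE_PATH is not set"
    elif [ ! -f "$PENTAHO_INSTALLED_LICENSE_PATH" ]; then
        echo "license file not found: $PENTAHO_INSTALLED_LICENSE_PATH"
    fi
}

missing_prereqs
```

An empty result only means that the commands and the license file were found. You still need to confirm the Java version and your registry logins yourself.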
Audience

IT administrators who know where the data is stored, how to connect to it, details about the computing environment, and how to use the command line to issue commands for Microsoft Windows or Linux.

Login credentials

You must be logged on to an account that has privileges to perform the tasks in these sections. Additionally, Linux users need to use sudo privileges or Docker roles for some tasks.

Docker container deployment process

Use the DockMaker command line tool in one of two ways. You can generate setup files in the generatedFiles directory to prepare your Pentaho product, then use Docker to build the image and compose the container yourself. Alternatively, you can have the tool prepare the files, build the image, and compose the related container in a single command.

Download and install the DockMaker command line tool

Perform the following steps to download the dock-maker-9.4.0.0-343-dist.zip file and access the DockMaker Tool Tech Doc document.

Procedure

  1. On the Customer Portal home page, sign in using the Pentaho support user name and password provided in your Pentaho Welcome Packet.

  2. Click Downloads, then click Pentaho 9.4 GA Release in the 9.x list.

  3. Navigate to Utilities and Tools > DockMaker Tool, then download the dock-maker-9.4.0.0-343-dist.zip file.

  4. Unzip the downloaded ZIP file.

  5. Run the install.sh for Linux or the install.bat for Windows to unpack and then install DockMaker.

Using the DockMaker command line tool

You can use the Pentaho DockMaker command line tool to generate setup files in the generatedFiles directory to prepare Pentaho Server, Carte, Kitchen, and Pan installations for Docker, or use the command to prepare the files, build the image, and compose the related Docker container. Use the -X parameter of the DockMaker command to have the tool build the image and compose the container after it creates the setup files.

Note: The Pentaho Data Integration tool (Spoon) is not supported.

The Pentaho version parameter, -V, is required. All others are optional.

The DockMaker command has the following parameters:

Parameters

Description

-A,--additional-plugins <arg>

Specify the acronyms of the plugin products to be installed with the server. More than one plugin can be specified, separated with commas, or leave blank to install no additional plugins. The following plugin types are available:

-D,--database <arg>

Specify the underlying database for the repository. The following values are supported:

  • postgres/9.6, which is the default if this parameter is omitted.
  • postgres/13.5
  • mysql/8.0
  • oracle/latest/ent
  • oracle/latest/ex
--EULA_ACCEPT <arg>

Set to true to accept the end user license agreement (EULA). When set to true, the command runs unattended. If omitted or set to false, you must manually accept the EULA when it is displayed.

-h,--help

Print the text describing these parameters to the screen.

-I,--install-path <arg>

(Optional) Specify the path from root to install Pentaho Server in the image. If omitted, the default value is /opt/Pentaho.

-J,--java-version <version>

Set the desired Java version. As a best practice, set a Java version that is compatible with the image being built.

-K,--kar_file

Set this flag to force the Docker build to include the specified .kar file in the Docker image.

Example: ./DockMaker -V <version_number>/<build_number>/ee -A std -U -K cdpdc71 --EULA_ACCEPT=true

-M,--metastore

Specify the path to a local folder whose contents will be mounted under the /home/pentaho directory on the container. It is intended to hold the Pentaho metastore directories, but can also be used to sync other folders, as needed. This folder is synchronized with the one on the container, so any metastore changes made in the container are replicated on the local filesystem. The folder specified should contain populated .pentaho and .kettle folders. See Metastore volume for details.

-N,--no-cache

Set this flag to force the Docker build to use the --no-cache option.

-P,--patch-version <arg>

Specify the version of the Pentaho Server patch to download, followed by a “/”, followed by the distribution build number, followed by another “/”, followed by “ce” or “ee”. For example, 9.4.0.0/627/ee patches to the enterprise edition of the Pentaho Server for build 627 of version 9.4.0.0.

-p,--port <arg>

Set the Tomcat port number to use for server communication. If omitted, the default value is port 8081 for the Pentaho Server and 8082 for Carte.

--password <password>

Set the admin password for a Pentaho Server build or the Carte user password for a Carte server build.

-T,--product-type

Select the type of product this image represents. The following values are valid:

  • server

    The docker image will contain a fully functional Pentaho Server. It is the default value if the parameter is omitted.

  • pdi

    The docker image will contain a PDI kernel sufficient to run the pan and kitchen commands.

  • carte

    The docker image will contain a fully functional Carte server.

-U,--use-existing-downloads

Set this flag to re-use any existing downloaded artifacts. If omitted, artifacts will always be downloaded. Artifacts are kept in the directory defined by the docker.server.artifactCache property in the DockMaker.properties file. If downloading fails, you can manually place the required artifacts in this directory and specify -U to use them.

--user <arg>

Only used in Carte configurations. Set the username associated with the Carte credentials. If omitted when creating a Carte server, defaults to carte.

-V,--pentaho-version <arg>

Specify the version of the Pentaho Server to download as a base installation, followed by a “/”, followed by the distribution build number, followed by another “/”, followed by “ee”. For example, 9.4.0.0/343/ee installs the enterprise edition of the Pentaho Server for build 343 of version 9.4.0.0.

Note: This parameter is required.

-X,--execute

Set this flag to build the image and run docker compose to bring everything up after the generatedFiles directory is built. If omitted, only the generatedFiles directory is generated. The system displays the docker build and docker compose commands but does not execute them.
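
Because the -V and -P values share the version/build/edition form, a quick format check can catch a mistyped value before DockMaker starts downloading. The following is a minimal sketch. The function name valid_version_spec is hypothetical, and the pattern accepts both ee and ce even though the -V description above mentions only ee.

```shell
#!/bin/sh
# Check that a value has the <version>/<build>/<edition> form described for -V and -P.
# valid_version_spec is an illustrative helper, not a DockMaker feature.

valid_version_spec() {
    # Dotted numeric version, numeric build number, then ee or ce.
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)*/[0-9]+/(ee|ce)$'
}

for spec in 9.4.0.0/343/ee 9.4.0.0/343; do
    if valid_version_spec "$spec"; then
        echo "ok: $spec"
    else
        echo "bad: $spec"
    fi
done
```

The check covers only the shape of the value. It cannot tell whether the given version and build actually exist on the download site.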

You can create the generatedFiles directory, build the Docker image, and compose the container for your Pentaho product in one step by using the DockMaker command with the -X parameter. Alternatively, you can create the generatedFiles directory first and run the docker build and docker compose commands later to construct the image and compose the container. For example, you may need to modify the files in the generatedFiles directory before using the docker build and docker compose commands, such as when using the DockMaker command line tool with a Kerberos secured cluster.

DockMaker command line tool examples

Use the following sample scenarios as examples for working with the DockMaker command line tool:

  • Prepare (create) the generatedFiles directory for the release build of the Enterprise Edition of Pentaho Server 9.4 with the Pentaho Analyzer, Pentaho Dashboard Designer, and Pentaho Interactive Reports plugins to build as a Docker image and compose the container later, while using existing downloads and automatically accepting the end-user license agreement (EULA):

    DockMaker -V 9.4.0.0/343/ee -A paz,pdd,pir -U --EULA_ACCEPT=true

  • Prepare, build, and compose the same release as above:

    DockMaker -V 9.4.0.0/343/ee -A paz,pdd,pir -U --EULA_ACCEPT=true -X

  • Prepare a later patch to the existing Docker image for Pentaho 9.4:

    DockMaker -V 9.4.0.0/343/ee -A -U -P 9.4.0.1/400/ee --EULA_ACCEPT=true

  • Prepare a Pentaho Server with a MySQL 8.0 repository:

    DockMaker -V 9.4.0.0/343/ee -A std -U -D mysql/8.0 --EULA_ACCEPT=true

  • Prepare a Pentaho Server with the latest version of Oracle for the repository:

    DockMaker -V 9.4.0.0/343/ee -A std -U -D oracle/latest/ent --EULA_ACCEPT=true

    Log into Oracle for DockMaker to access the Oracle download.

  • Prepare a PDI that will be accessed with pan and kitchen:

    DockMaker -T pdi -V 9.4.0.0/343/ee --EULA_ACCEPT=true

  • Prepare a Carte server:

    DockMaker -T carte -V 9.4.0.0/343/ee -U --user cluster --password cluster --EULA_ACCEPT=true
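
If you run several of these scenarios regularly, you can assemble the command from a few values and review it before running it. The following is a minimal sketch that only prints the command. The function name build_dockmaker_cmd and its argument order are hypothetical, and the EULA option is spelled as in the parameter table (--EULA_ACCEPT).

```shell
#!/bin/sh
# Assemble a DockMaker command line and print it for review. Nothing is executed.
# build_dockmaker_cmd is an illustrative helper, not a DockMaker feature.

build_dockmaker_cmd() {
    version=$1             # required, for example 9.4.0.0/343/ee
    product=${2:-server}   # server, pdi, or carte
    database=${3:-}        # optional, for example mysql/8.0
    cmd="DockMaker -T $product -V $version"
    if [ -n "$database" ]; then
        cmd="$cmd -D $database"
    fi
    # Re-use downloaded artifacts and accept the EULA so the run is unattended.
    cmd="$cmd -U --EULA_ACCEPT=true"
    printf '%s\n' "$cmd"
}

build_dockmaker_cmd 9.4.0.0/343/ee server mysql/8.0
build_dockmaker_cmd 9.4.0.0/343/ee pdi
```

Printing the command first mirrors how DockMaker itself behaves without -X: you see what would run and can adjust it before anything is built.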

Starting or stopping your Docker container

Whether you use the -X parameter or -T parameter with the DockMaker command or directly use the Docker commands to construct your image and compose a container, how you start or stop your container depends on the Pentaho product it contains.

Starting or stopping a Pentaho Server container

You start the Pentaho Server container when you specify the -X parameter to the DockMaker command. If you do not specify the -X parameter, the generatedFiles folder is created, but you must run the docker build and docker compose commands yourself to build the image and start the containers. The docker-compose.yml file is a standard docker compose file that starts both the Pentaho Server and repository database containers from their associated images. The data in the repository database is stored on a Docker volume. You can sign in to the server by entering http://localhost:8081/pentaho/Login into a browser. The default port number is 8081. You can change it with the -p parameter to the DockMaker command.

Once the Pentaho Server containers are built, use the docker compose command with the up parameter to start the containers, as shown in the following example:

docker compose -f generatedFiles/docker-compose.yml up

Use docker compose with the stop parameter to stop the containers, as shown in the following example:

docker compose -f generatedFiles/docker-compose.yml stop

Use docker compose with the down parameter to stop and delete the Pentaho Server container yet keep the database volume intact, as shown in the following example:

docker compose -f generatedFiles/docker-compose.yml down

Use docker compose with the down and -v parameters to stop and delete both Pentaho Server and repository database containers with all associated volumes, as shown in the following example:

docker compose -f generatedFiles/docker-compose.yml down -v

You can also use the DockMakerDown.bat (Windows) or DockMakerDown.sh (Linux) script files to stop and delete both Pentaho Server and repository database containers with all associated volumes.
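
The docker compose calls above differ only in their final arguments, so a small wrapper can keep the compose file path in one place. The following is a minimal sketch. The names pentaho_compose, PENTAHO_COMPOSE_FILE, and DRY_RUN are hypothetical, and the default path assumes the compose file is generatedFiles/docker-compose.yml, so adjust it to wherever DockMaker wrote the file.

```shell
#!/bin/sh
# Wrap docker compose so the compose file path is written once.
# pentaho_compose, PENTAHO_COMPOSE_FILE, and DRY_RUN are illustrative names.

PENTAHO_COMPOSE_FILE=${PENTAHO_COMPOSE_FILE:-generatedFiles/docker-compose.yml}

pentaho_compose() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "docker compose -f $PENTAHO_COMPOSE_FILE $*"
    else
        docker compose -f "$PENTAHO_COMPOSE_FILE" "$@"
    fi
}
```

For example, pentaho_compose up starts the containers and pentaho_compose down -v removes the containers together with their volumes. Setting DRY_RUN=1 first shows the command without running it, which is a useful habit before down -v because that call deletes the repository data.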

Running a PDI container

You can use the PDI container to run transformations with the PDI pan command and jobs with the PDI kitchen command.

Once the PDI container is built, use the docker compose command with the run parameter to run a transformation or job with the PDI pan or kitchen command, as shown in the following example:

docker compose -f generatedFiles/docker-compose.yml run pdi ./pan.sh /file:/opt/pentaho/data-integration/simpleTrans.ktr

In this example, the PDI pan command runs the simpleTrans.ktr transformation located in the /opt/pentaho/data-integration directory on the container. The container stops once the transformation or job is completed.
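
Pan runs transformations (.ktr files) and Kitchen runs jobs (.kjb files), so the script to call can be derived from the file extension. The following is a minimal sketch that prints the resulting command rather than running it. The function names pdi_script_for and pdi_run_cmd are hypothetical, and the compose file path is assumed to be generatedFiles/docker-compose.yml.

```shell
#!/bin/sh
# Pick pan.sh or kitchen.sh from the file extension and print the compose command.
# pdi_script_for and pdi_run_cmd are illustrative helpers.

pdi_script_for() {
    case "$1" in
        *.ktr) echo "./pan.sh" ;;       # transformations run with pan
        *.kjb) echo "./kitchen.sh" ;;   # jobs run with kitchen
        *) echo "unsupported file type: $1" >&2; return 1 ;;
    esac
}

pdi_run_cmd() {
    # $1 is the path of the .ktr or .kjb file inside the container.
    script=$(pdi_script_for "$1") || return 1
    echo "docker compose -f generatedFiles/docker-compose.yml run pdi $script /file:$1"
}

pdi_run_cmd /opt/pentaho/data-integration/simpleTrans.ktr
```

The path passed to pdi_run_cmd must be valid inside the container, not on the host.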

Starting or stopping a Carte server container

You can use a Carte server container the same as you would use a Pentaho Server container, except you can either use the default login credentials and port or specify them through the DockMaker parameters. If you use the default login credentials, the user is carte, the password is carte, and the port number is 8082.
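
Once a Carte container is up, one way to confirm that it responds is to request its status page with curl. The following is a minimal sketch that assumes the default credentials and port described above and the standard Carte status path /kettle/status/. The function names and the CARTE_USER and CARTE_PASSWORD variables are hypothetical.

```shell
#!/bin/sh
# Build the Carte status URL and, optionally, query it with curl.
# carte_status_url, carte_status, CARTE_USER, and CARTE_PASSWORD are illustrative names.

carte_status_url() {
    host=${1:-localhost}
    port=${2:-8082}      # default Carte port for DockMaker containers
    echo "http://$host:$port/kettle/status/"
}

carte_status() {
    # Uses carte/carte unless CARTE_USER or CARTE_PASSWORD is set.
    curl -s -u "${CARTE_USER:-carte}:${CARTE_PASSWORD:-carte}" "$(carte_status_url "$@")"
}

carte_status_url
```

If you built the container with the --user, --password, or -p parameters, pass the matching values here instead of the defaults.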

Getting a command prompt on a container

On Linux, if you need a command prompt for a running container, such as a Pentaho Server or Carte container, you can use the docker exec command with the containerID listed via the docker container ls command, as shown in the following example:

docker exec -it containerId bash

To get into a stopped PDI container after a command has been executed, use the docker compose command with the bash argument, as shown in the following example:

docker compose run pdi bash

The pdi reference is already defined in your docker-compose.yml file.

Using your Docker containers with clusters

To use your Docker containers with Hadoop clusters or cloud storage, you should include a shared volume, such as a metastore volume, that contains all your cluster definitions. For unsecure clusters, you can bring up a PDI or Carte container and use it as is.

Shared volumes

By definition, containers are isolated from the host machine, but you can access your local file system through the Docker concept of shared volumes. You can use three different types of shared volumes with the DockMaker command.

Override Files volume

When you use the DockMaker command, an override volume is generated. This volume contains all the changes that must be made to the basic artifact files to create the desired configuration. The volume binds the generatedFiles/fileOverride directory on the host with the /docker-entrypoint-init directory on the container. When you start the container, the files present in this directory overwrite the files and directories in the /opt/pentaho/data-integration or /opt/pentaho/pentaho-server directory, depending on which one is in use.

If you need any additional files, drivers, or other items placed in the application’s directory, just add them with the proper path to the fileOverride directory.
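
Adding a file to the override directory amounts to creating the target subdirectory and copying the file into it. The following is a minimal sketch under one assumption that you should verify against your generated files: paths under fileOverride are taken to be relative to the application directory, so lib here would map to the application's lib directory. The function name stage_override is hypothetical.

```shell
#!/bin/sh
# Copy a host file into the override directory under a given relative path.
# stage_override is an illustrative helper. The directory layout is an assumption.

stage_override() {
    src=$1                                            # file on the host, for example a JDBC driver
    rel_dir=$2                                        # directory relative to the application root
    override_root=${3:-generatedFiles/fileOverride}   # override directory on the host
    mkdir -p "$override_root/$rel_dir" &&
        cp "$src" "$override_root/$rel_dir/"
}
```

For example, stage_override ./my-driver.jar lib would place the file at generatedFiles/fileOverride/lib/my-driver.jar, where my-driver.jar stands in for your own file.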

Note: Any changes you make in this volume will be lost if the command tool is executed again, which is why the command line has the option to prepare the container without building it. In many cases, you may need to make more changes to the templates generated by the DockMaker command. If you want to back up the generated files or manually change them, rename the generatedFiles directory or copy the directory to another location. If you take this action, you may need to adjust some paths associated with parameters to the DockMaker command.

Metastore volume

You generate this volume by specifying the -M or --metastore parameter when using the DockMaker command. Specifying the parameter creates a bi-directional bind on a metastore directory. It binds the directory defined by the parameter with the /home/pentaho directory on the container.

Bind to a copy of your metastore
Because linking directly to your own metastore may break the container contract, you can copy your metastore and bind to that copy. Perform the following steps to bind to a copy of your metastore:

Procedure

  1. Create an arbitrary directory on your host file system to serve as the shared folder.

    For example, you can use d:\metastore on Windows.
  2. Copy the .kettle and .pentaho directories from your home directory to the shared folder, the d:/metastore directory for this example.

  3. Copy any other files you need for processing your use case.

    In our example, the d:/metastore directory will then be bound to the /home/pentaho directory for the container.
  4. Use the DockMaker command with -M parameter specified.

    For example, -M d:/metastore.
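
On Linux, or in a WSL2 shell on Windows, the copy steps above can be scripted. The following is a minimal sketch. The function name copy_metastore and the /tmp/metastore path in the usage note are hypothetical.

```shell
#!/bin/sh
# Copy the .kettle and .pentaho directories into a shared folder for use with -M.
# copy_metastore is an illustrative helper, not a DockMaker feature.

copy_metastore() {
    src_home=$1   # directory that holds .kettle and .pentaho, usually your home directory
    shared=$2     # shared folder that will be bound to /home/pentaho
    mkdir -p "$shared" || return 1
    for dir in .kettle .pentaho; do
        if [ -d "$src_home/$dir" ]; then
            cp -R "$src_home/$dir" "$shared/" || return 1
        else
            echo "warning: $src_home/$dir not found, skipping" >&2
        fi
    done
}
```

For example, copy_metastore "$HOME" /tmp/metastore prepares a copy that you can then pass to DockMaker with -M /tmp/metastore. Because the bind is bi-directional, changes made in the container land in the copy and not in your original metastore.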

Database volume

The database volume is only created when you start a Pentaho Server container. This volume contains all the database tables associated with the server repository, quartz scheduler, and log tables. When the database container is started for the first time, the container runs the DDL provided in generatedFiles, which then defines and populates the tables needed.

Use the DockMaker command line tool with a Kerberos secured cluster

Adding a Kerberos secured cluster connection requires additional changes to the generatedFiles directory. You must change the dockerfile to bring in the additional dependencies and install SSL keys. For this example, the cluster is defined to use a username and password rather than a keytab. Using a keytab requires an additional ADD command to add the keytab file.

Perform the following steps to prepare a container to work with a Kerberos secured cluster:

Procedure

  1. Run DockMaker without the -X parameter to prepare the generatedFiles directory but not build the image or compose the container.

  2. Copy the docker build and docker compose commands, which DockMaker displays but does not execute, from the output to another location for later use.

  3. Copy your krb5.conf and cacerts.pem files to the generatedFiles directory.

    Any files copied to the container must be in the generatedFiles context to be available, which is a restriction imposed by Docker.
  4. Edit the generatedFiles/dockerfile file to add the following lines close to the bottom of the file. Make sure they appear above the USER ${PENTAHO_USER} line, because these commands must run as root:

    RUN apt-get install -y krb5-user
    ADD krb5.conf /etc/krb5.conf
    ADD cacerts.pem /tmp/cacerts.pem
    RUN /usr/bin/keytool -import -noprompt -alias clustername -keystore /etc/ssl/certs/java/cacerts -file /tmp/cacerts.pem -storepass changeit;
    

    where clustername is the name of your cluster.

  5. Run the docker build command you previously copied.

  6. Run the docker compose command you previously copied.
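
If you repeat this procedure often, the edit in step 4 can be scripted instead of done by hand. The following is a minimal sketch that inserts the contents of a snippet file immediately above the first line starting with USER ${PENTAHO_USER}. The function name insert_before_user and the snippet file name in the usage note are hypothetical, and the snippet file must not be empty.

```shell
#!/bin/sh
# Insert the lines from a snippet file above the USER ${PENTAHO_USER} line of a dockerfile.
# insert_before_user is an illustrative helper, not a DockMaker feature.

insert_before_user() {
    dockerfile=$1   # for example generatedFiles/dockerfile
    snippet=$2      # non-empty file that holds the lines to add
    tmp="$dockerfile.tmp"
    awk '
        # First file: collect the snippet lines.
        NR == FNR { extra = extra $0 "\n"; next }
        # Second file: emit the snippet once, just before the USER line.
        index($0, "USER ${PENTAHO_USER}") == 1 && !done { printf "%s", extra; done = 1 }
        { print }
    ' "$snippet" "$dockerfile" > "$tmp" && mv "$tmp" "$dockerfile"
}
```

For example, save the four lines from step 4 in a file named kerberos.snippet and run insert_before_user generatedFiles/dockerfile kerberos.snippet. If the dockerfile has no matching USER line, the file is left unchanged, so check the result before you build.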

Next steps

You now have a running instance with Kerberos support. You can update your template dockerfiles so that these lines are always added to the dockerfile when generatedFiles is first created. The template dockerfiles are in the following locations:

  • Server

    containers\pentaho-server\pentaho-server-auto\Dockerfile

  • PDI

    containers\pentaho-data-integration\pdi-client-auto\Dockerfile

  • Carte

    containers\pentaho-data-integration\pdi-client-auto\Dockerfile