Docker container deployment of Pentaho
This guide walks you through setting up and deploying Pentaho products within complete, cloud-ready Docker containers. You can use your Pentaho Server, based on your licensed Pentaho software, to create a Docker container with data integration, business analytics, and Carte server components. For example, you can set up Pentaho Data Integration (PDI) containers, then quickly deploy those containers in the cloud to run your data transformation workloads to enhance performance and cost-efficiency of your cloud environment. Supported Pentaho components include the Pentaho Server (Business Analytics and Data Integration), the Carte server, and the Kitchen and Pan command line tools.
Before you begin
Before you start creating and deploying your Docker containers of Pentaho products, be sure you have the following items ready:
- Installation artifacts (ZIP files) of the Pentaho 9.5 products you are deploying through Docker containers. See Pentaho installation for instructions about downloading installation artifacts of Pentaho products.
- An installed and stable Docker instance. This instance must have docker-compose installed to support the docker-compose.yml. If you are on a Windows operating system, you must have WSL2 installed and active.
- A user account with https://hub.docker.com/. The account login must be configured using the docker login command so that Docker can access registered database containers.
- The curl command line tool must be installed on the host operating system.
- If you want to download Oracle databases, you must have login credentials to Oracle’s container repository.
- Pentaho licenses must be installed on the host, and the environment variable PENTAHO_INSTALLED_LICENSE_PATH must point to the installedLicenses.xml file. See Manage Pentaho licenses for more information.
- Java 8 or 11 must be installed on the host machine.
This guide is intended for IT administrators who know where the data is stored, how to connect to it, the details of the computing environment, and how to issue commands from the command line on Microsoft Windows or Linux.
You must be logged on to an account that has privileges to perform the tasks in these sections. Additionally, Linux users need to use sudo privileges or Docker roles for some tasks.
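The prerequisites above can be checked with a short preflight script before you run DockMaker. The following is a minimal sketch under stated assumptions, not part of DockMaker: missing_tools and license_file_ok are hypothetical helper names, and the script only reports problems without changing anything on the host.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites listed above.
# missing_tools and license_file_ok are hypothetical helper names,
# not part of DockMaker or Docker.

# Print the name of every listed command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Succeed only if the given path ends in installedLicenses.xml
# and that file exists.
license_file_ok() {
    case "$1" in
        *installedLicenses.xml) [ -f "$1" ] ;;
        *) return 1 ;;
    esac
}

# Report only; nothing is installed or modified.
for name in $(missing_tools docker curl java); do
    echo "Missing prerequisite: $name"
done
if ! license_file_ok "${PENTAHO_INSTALLED_LICENSE_PATH:-}"; then
    echo "PENTAHO_INSTALLED_LICENSE_PATH does not point to an installedLicenses.xml file"
fi
```

The script does not verify the Java version, the Docker Hub login, or WSL2; those checks still need to be done by hand.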
Docker container deployment process
Use the DockMaker command line tool to generate setup files in the generatedFiles directory to prepare your Pentaho product, then use Docker to build an image and compose the container. Alternatively, use the command to prepare the files, build the image, and compose the related container in one step.
Download and install Docker command line tool
See the DockMaker Tool Tech Doc document.
Procedure
On the Support Portal home page, sign in using the Pentaho support user name and password provided in your Pentaho Welcome Packet.
Click Downloads, then click Pentaho 9.5 GA Release in the 9.x list.
Navigate to, then download the dock-maker-9.5.0.0-343-dist.zip file.
Unzip the downloaded ZIP file.
Run the install.sh for Linux or the install.bat for Windows to unpack and then install DockMaker.
Using the DockMaker command line tool
You can use the Pentaho DockMaker command line tool to generate setup files in the generatedFiles directory to prepare Pentaho Server, Carte, Kitchen, and Pan installations for Docker, or use the command to prepare the files, build the image, and compose the related Docker container. Use the -X parameter of the DockMaker command to have the tool build the image and compose the container after it creates the setup files.
The Pentaho version parameter, -V, is required. All others are optional.
The DockMaker command has the following parameters:
Parameters | Description |
-A, --additional-plugins <arg> | Specify the acronyms of the plugin products to be installed with the server. To specify more than one plugin, separate the acronyms with commas. Leave blank to install no additional plugins. The following plugin types are available: |
-D, --database <arg> | Specify the underlying database for the repository. The following values are supported: |
--EULA_ACCEPT <arg> | Set to true to accept the end user license agreement (EULA). When set to true, the command runs unattended. If omitted or set to false, you must manually accept the EULA displayed. |
-h, --help | Print the text describing these parameters to the screen. |
-I, --install-path <arg> | (Optional) Specify the path from root to install Pentaho Server in the image. If omitted, the default value is /opt/Pentaho. |
-J, --java-version <version> | Set the desired Java version. As a best practice, set a Java version that is compatible with the image being built. |
-K, --kar_file | Set this flag to force the Docker build to include the specified .kar file in the Docker image. Example: |
-M, --metastore | Specify the path to a local folder whose contents will be mounted under the /home/pentaho directory on the container. It is intended to hold the Pentaho metastore directories, but can also be used to sync other folders, as needed. This folder is synchronized with the one on the container, so any metastore changes made in the container are replicated on the local filesystem. The folder specified should contain populated .pentaho and .kettle folders. See Metastore volume for details. |
-N, --no-cache | Set this flag to force the Docker build to use the --no-cache option. |
-P, --patch-version <arg> | Specify the version of the Pentaho Server patch to download as a base installation, followed by a "/", the distribution build number, another "/", and "ce" or "ee". For example, 9.4.0.8/627/ee patches to the enterprise edition of the Pentaho Server for build 627 of version 9.5.0.0. |
-p, --port <arg> | Set the Tomcat port number to use for server communication. If omitted, the default value is port 8081 for the Pentaho Server and 8082 for Carte. |
--password <password> | Set the admin password for a Pentaho Server build or the Carte user password for a Carte server build. |
-T, --product-type | Select the type of product this image represents. The following values are valid: |
-U, --use-existing-downloads | Set this flag to re-use any existing downloaded artifacts. If omitted, artifacts are always downloaded. Artifacts are kept in the directory defined by the docker.server.artifactCache property in the DockMaker.properties file. If downloading fails, you can manually put the needed artifacts in this directory and set -U to use them. |
--user <arg> | Only used in Carte configurations. Set the username associated with the Carte credentials. If omitted when creating a Carte server, defaults to carte. |
-V, --pentaho-version <arg> | Specify the version of the Pentaho Server to download as a base installation, followed by a "/", the distribution build number, another "/", and "ee". For example, 9.5.0.0/343/ee installs the enterprise edition of the Pentaho Server for build 343 of version 9.5.0.0. Note: This parameter is required. |
-X, --execute | Set this flag to build the image and run docker compose to bring everything up after the generatedFiles directory is built. If omitted, DockMaker generates the generatedFiles directory only. In that case, the system displays the docker build and docker compose commands but does not execute them. |
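The -V and -P values share the same version/build/edition layout, so a wrapper script may want to check a value before it calls DockMaker. The following is a minimal sketch of such a check. valid_version_arg is a hypothetical helper, not a DockMaker feature, and it accepts both "ce" and "ee" because the -P description lists both, even though the -V description mentions only "ee".

```shell
#!/bin/sh
# Sketch: validate a "version/build/edition" string such as 9.5.0.0/343/ee
# before passing it to the -V or -P parameter.
# valid_version_arg is a hypothetical helper, not part of DockMaker.
valid_version_arg() {
    case "$1" in
        */*/*/*) return 1 ;;        # more than three fields
        */*/ce|*/*/ee) ;;           # edition must be ce or ee
        *) return 1 ;;
    esac
    version=${1%%/*}                # text before the first "/"
    rest=${1#*/}                    # text after the first "/"
    build=${rest%%/*}               # text between the two "/"
    # The version may contain only digits and dots.
    case "$version" in ''|*[!0-9.]*) return 1 ;; esac
    # The build number may contain only digits.
    case "$build" in ''|*[!0-9]*) return 1 ;; esac
}

if valid_version_arg "9.5.0.0/343/ee"; then
    echo "valid: 9.5.0.0/343/ee"
fi
```

The check covers only the shape of the string. It cannot tell whether the named build actually exists for download.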
You can create the generatedFiles directory, build the Docker image, and compose the container for your Pentaho product using the DockMaker command with the -X parameter. Alternatively, you may want to create the generatedFiles directory for the Pentaho product first, then use the docker build and docker compose commands later to construct the image and compose the container. For example, you may need to modify the files in the generatedFiles directory before using the docker build and docker compose commands, such as when using the Docker command tool with a Kerberos secured cluster.
DockMaker command line tool examples
Use the following scenario samples as examples for working with the Docker command line tool:
- Prepare (create) the generatedFiles directory for the release build of the Enterprise Edition of Pentaho Server 9.5 with the Pentaho Analyzer, Pentaho Dashboard Designer, and Pentaho Interactive Reports plugins to build as a Docker image and compose the container later, while using existing downloads and automatically accepting the end-user license agreement (EULA):
DockMaker -V 9.5.0.0/343/ee -A paz,pdd,pir -U --EULA-ACCEPT=true
- Prepare, build, and compose the same release as above:
DockMaker -V 9.5.0.0/343/ee -A paz,pdd,pir -U --EULA-ACCEPT=true -X
- Prepare a later patch to the existing Docker image for Pentaho 9.5:
DockMaker -V 9.5.0.0/343/ee -A -U -P 9.4.0.1/400/ee --EULA-ACCEPT=true
- Prepare a Pentaho Server with a MySQL 8.0 repository:
DockMaker -V 9.5.0.0/343/ee -A std -U -D mysql/8.0 --EULA-ACCEPT=true
- Prepare a Pentaho Server with the latest version of Oracle for the repository:
DockMaker -V 9.5.0.0/343/ee -A std -U -D oracle/latest/ent --EULA-ACCEPT=true
Log into Oracle for DockMaker to access the Oracle download.
- Prepare a PDI that will be accessed with pan and kitchen:
DockMaker -T pdi -V 9.5.0.0/343/ee --EULA-ACCEPT=true
- Prepare a Carte server:
DockMaker -T carte -V 9.5.0.0/343/ee -U --user cluster --password cluster --EULA-ACCEPT=true
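If you run several of these variations regularly, a small script can assemble the command line from a few inputs. The following is a minimal sketch that only prints the command so you can review it before running it. dockmaker_cmd is a hypothetical helper, it uses only the -V, -U, -T, and -D parameters from the table above, and it leaves the EULA parameter out, so DockMaker would prompt for acceptance when the printed command is run.

```shell
#!/bin/sh
# Sketch: print (do not run) a DockMaker command line built from the
# documented -V, -U, -T, and -D parameters.
# dockmaker_cmd is a hypothetical helper, not part of DockMaker.
# An empty product type leaves -T off, as the Pentaho Server examples do.
dockmaker_cmd() {
    version=$1
    product=$2
    database=$3
    cmd="DockMaker -V $version -U"
    if [ -n "$product" ]; then cmd="$cmd -T $product"; fi
    if [ -n "$database" ]; then cmd="$cmd -D $database"; fi
    printf '%s\n' "$cmd"
}

# Pentaho Server with a MySQL 8.0 repository.
dockmaker_cmd 9.5.0.0/343/ee "" mysql/8.0
# Carte server with the default credentials.
dockmaker_cmd 9.5.0.0/343/ee carte ""
```

Printing instead of executing keeps the sketch safe to run on a machine where DockMaker is not installed.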
Command property and registry files
See Docker command tool property and registry files for configuration settings for the DockMaker command.
Starting or stopping your Docker container
Whether you use the -X parameter or -T parameter with the DockMaker command or directly use the Docker commands to construct your image and compose a container, how you start or stop your container depends on the Pentaho product it contains.
Starting or stopping a Pentaho Server container
You start the Pentaho Server container when you specify the -X parameter to the DockMaker command. If you do not specify the -X parameter, the generatedFiles folder is created, but you must use the docker build and docker compose commands to generate the docker-compose.yml file. The docker-compose.yml file is a standard docker compose file that starts both the Pentaho Server and repository database containers from their associated images. The data in the repository database is stored on a docker volume. You can sign into the server by entering http://localhost:8081/pentaho/Login into a browser. The default port number is 8081. You can change it with the -p parameter to the DockMaker command.
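The server can take a while to come up after the container starts, so a script that depends on it may need to wait for the login page. The following is a minimal sketch that polls the login URL with curl. pentaho_login_url and wait_for_pentaho are hypothetical helpers, and the sketch assumes that the login page answers with HTTP 200 once the server is ready, which you should confirm for your installation.

```shell
#!/bin/sh
# Sketch: wait until the Pentaho Server login page responds.
# pentaho_login_url and wait_for_pentaho are hypothetical helpers.

# Build the login URL for a port (default 8081, as set by DockMaker).
pentaho_login_url() {
    printf 'http://localhost:%s/pentaho/Login\n' "${1:-8081}"
}

# Poll the login page every 10 seconds until it returns HTTP 200
# (an assumption about a ready server) or the attempts run out.
# Arguments: port, number of attempts (default 30).
wait_for_pentaho() {
    url=$(pentaho_login_url "$1")
    attempts=${2:-30}
    while [ "$attempts" -gt 0 ]; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || true
        [ "$code" = "200" ] && return 0
        attempts=$((attempts - 1))
        sleep 10
    done
    return 1
}

echo "Login page: $(pentaho_login_url 8081)"
```

For example, `wait_for_pentaho 8081 30 && echo "server is up"` waits up to about five minutes before giving up.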
Once the Pentaho Server containers are built, use the docker compose command with the up parameter to start the containers, as shown in the following example:
docker compose -f generatedFiles/docker-compose.yml up
Use docker compose with the stop parameter to stop the containers, as shown in the following example:
docker compose -f generatedFiles/docker-compose.yml stop
Use docker compose with the down parameter to stop and delete the Pentaho Server container yet keep the database volume intact, as shown in the following example:
docker compose -f generatedFiles/docker-compose.yml down
Use docker compose with the down and -v parameters to stop and delete both Pentaho Server and repository database containers with all associated volumes, as shown in the following example:
docker compose -f generatedFiles/docker-compose.yml down -v
You can also use the DockMakerDown.bat (Windows) or DockMakerDown.sh (Linux) script files to stop and delete both Pentaho Server and repository database containers with all associated volumes.
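The four docker compose commands above differ only in their final arguments, so they can be wrapped in one function. The following is a minimal sketch. pentaho_compose and the PENTAHO_COMPOSE_FILE variable are hypothetical names, and the sketch assumes the compose file is generatedFiles/docker-compose.yml relative to the current directory.

```shell
#!/bin/sh
# Sketch: one entry point for the docker compose commands described above.
# pentaho_compose and PENTAHO_COMPOSE_FILE are hypothetical names.
# Assumes the compose file is generatedFiles/docker-compose.yml.
PENTAHO_COMPOSE_FILE=${PENTAHO_COMPOSE_FILE:-generatedFiles/docker-compose.yml}

pentaho_compose() {
    case "$1" in
        # Start the server and repository database containers.
        start)  docker compose -f "$PENTAHO_COMPOSE_FILE" up ;;
        # Stop the containers but keep them.
        stop)   docker compose -f "$PENTAHO_COMPOSE_FILE" stop ;;
        # Delete the containers, keep the database volume.
        remove) docker compose -f "$PENTAHO_COMPOSE_FILE" down ;;
        # Delete the containers and all associated volumes.
        purge)  docker compose -f "$PENTAHO_COMPOSE_FILE" down -v ;;
        *)
            echo "usage: pentaho_compose start|stop|remove|purge" >&2
            return 2
            ;;
    esac
}
```

The purge action is destructive because it removes the repository data, which is why it is kept separate from remove.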
Running a PDI container
You can use the PDI container to run transformations with the PDI pan command and jobs with the PDI kitchen command.
Once the PDI container is built, use the docker compose command with the run parameter to run a transformation or job with the PDI pan or kitchen command, as shown in the following example:
docker compose -f generatedFiles/docker-compose.yml run pdi ./pan.sh /file:/opt/pentaho/data-integration/simpleTrans.ktr
In this example, the PDI pan command runs the simpleTrans.ktr transformation located in the /opt/pentaho/data-integration directory. The container stops once the transformation or job is completed.
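Because pan runs transformations (.ktr files) and kitchen runs jobs (.kjb files), a wrapper can pick the tool from the file extension. The following is a minimal sketch. pdi_tool_for and run_pdi_file are hypothetical helpers, and the sketch assumes that kitchen.sh sits next to pan.sh in the container and accepts the same /file: syntax, and that the compose file is generatedFiles/docker-compose.yml.

```shell
#!/bin/sh
# Sketch: choose pan or kitchen from the file extension, then run the file
# in the PDI container. pdi_tool_for and run_pdi_file are hypothetical.
# Assumes kitchen.sh accepts the same /file: syntax shown for pan.sh.

# Print the script to use for a transformation (.ktr) or job (.kjb).
pdi_tool_for() {
    case "$1" in
        *.ktr) echo ./pan.sh ;;
        *.kjb) echo ./kitchen.sh ;;
        *)
            echo "unsupported file type: $1" >&2
            return 1
            ;;
    esac
}

# Run a file that already exists inside the container.
# Argument: path of the file inside the container.
run_pdi_file() {
    tool=$(pdi_tool_for "$1") || return 1
    docker compose -f generatedFiles/docker-compose.yml run pdi "$tool" "/file:$1"
}
```

For example, `run_pdi_file /opt/pentaho/data-integration/simpleTrans.ktr` produces the same command as the example above.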
Starting or stopping a Carte server container
You can use a Carte server container the same as you would use a Pentaho Server container, except you can either use the default login credentials and port or specify them through the DockMaker parameters. If you use the default login credentials, the user is carte, the password is carte, and the port number is 8082.
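To confirm that a Carte container is answering, you can request its status page with curl. The following is a minimal sketch. carte_status_url and carte_status are hypothetical helpers, and the /kettle/status/ path is the standard Carte status page, which this guide does not itself mention, so treat it as an assumption to verify against your Carte version.

```shell
#!/bin/sh
# Sketch: query the status page of a Carte server container.
# carte_status_url and carte_status are hypothetical helpers.
# Assumption: the server exposes the usual Carte /kettle/status/ page.

# Build the status URL for a port (default 8082).
carte_status_url() {
    printf 'http://localhost:%s/kettle/status/\n' "${1:-8082}"
}

# Arguments: port, user, password (defaults 8082, carte, carte).
carte_status() {
    curl -s -u "${2:-carte}:${3:-carte}" "$(carte_status_url "$1")"
}
```

For example, `carte_status 8082 cluster cluster` queries a server created with the --user cluster and --password cluster parameters.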
Getting a command prompt on a container
On Linux, if you need a command prompt for a running container, such as a Pentaho Server or Carte container, you can use the docker exec command with the container ID listed by the docker container ls command, as shown in the following example:
docker exec -it containerId bash
To get into a stopped PDI container after a command has been executed, use the docker compose command with the bash argument, as shown in the following example:
docker compose run pdi bash
The pdi reference is already defined in your docker-compose.yml file.
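Looking up the container ID and opening the shell can be combined into one step. The following is a minimal sketch. first_container_id and shell_into are hypothetical helpers, and the name you filter on is an assumption, because container names depend on your docker-compose file.

```shell
#!/bin/sh
# Sketch: open a shell in the first running container whose name matches
# a filter. first_container_id and shell_into are hypothetical helpers.

# Print the ID of the first running container whose name matches $1.
first_container_id() {
    docker container ls --filter "name=$1" --format '{{.ID}}' | head -n 1
}

# Open an interactive bash prompt in that container.
shell_into() {
    id=$(first_container_id "$1")
    if [ -z "$id" ]; then
        echo "no running container matches: $1" >&2
        return 1
    fi
    docker exec -it "$id" bash
}
```

For example, `shell_into pentaho` opens a prompt in the first running container with "pentaho" in its name, if one exists.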
Using your Docker containers with clusters
To use your Docker containers with Hadoop clusters or cloud storage, you should include a shared volume, such as a metastore volume, that contains all your cluster definitions. For unsecure clusters, you can bring up a PDI or Carte container and use it as is.
Shared volumes
By definition, containers are isolated from the host machine, but you can access your local file system through the Docker concept of shared volumes. You can use three different types of shared volumes with the DockMaker command.
Override Files volume
When you use the DockMaker command, an override volume is generated. This volume contains all the changes that must be made to the basic artifact files to create the desired configuration. The volume binds the generatedFiles/fileOverride directory on the host with the /docker-entrypoint-init directory on the container. When you start the container, the files present in this directory overwrite the files and directories in the /opt/pentaho/data-integration or /opt/pentaho/pentaho-server directory, depending on which one is in use.
If you need any additional files, drivers, or other items placed in the application’s directory, add them with the proper path to the fileOverride directory.
If you want to back up the generated files or manually change them, rename the generatedFiles directory or copy the directory to another location. If you take this action, you may need to adjust some paths associated with parameters to the DockMaker command.
Metastore volume
You generate this volume by specifying the -M or --metastore parameter when using the DockMaker command. Specifying the parameter creates a bi-directional bind on a metastore directory. It binds the directory defined by the parameter with the /home/pentaho directory on the container.
Bind to a copy of your metastore
Procedure
Create an arbitrary directory on your host file system to serve as the shared folder.
For example, you can use d:\metastore on Windows.
Copy the .kettle and .pentaho directories from your home directory to the shared folder, the d:/metastore directory for this example.
Copy any other files you need for processing your use case.
In our example, the d:/metastore directory will then be bound to the /home/pentaho directory for the container.
Use the DockMaker command with the -M parameter specified.
For example, -M d:/metastore.
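On Linux, the first steps of this procedure can be scripted. The following is a minimal sketch. prepare_metastore is a hypothetical helper that copies the .kettle and .pentaho directories from a source directory into a shared folder, and the /data/metastore path in the usage note is only an example.

```shell
#!/bin/sh
# Sketch: copy the metastore directories into a shared folder that can be
# passed to the -M parameter. prepare_metastore is a hypothetical helper.
# Arguments: source directory (usually your home directory), shared folder.
prepare_metastore() {
    src=$1
    dest=$2
    # Create the shared folder if it does not exist yet.
    mkdir -p "$dest" || return 1
    for dir in .kettle .pentaho; do
        if [ -d "$src/$dir" ]; then
            # Copy the directory and its contents into the shared folder.
            cp -R "$src/$dir" "$dest/" || return 1
        else
            echo "warning: $src/$dir not found, skipping" >&2
        fi
    done
}
```

For example, `prepare_metastore "$HOME" /data/metastore` fills the shared folder, which you would then pass to DockMaker as -M /data/metastore.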
Database volume
The database volume is only created when you start a Pentaho Server container. This volume contains all the database tables associated with the server repository, quartz scheduler, and log tables. When the database container is started for the first time, the container runs the DDL provided in generatedFiles, which then defines and populates the tables needed.
Use the Docker command tool with a Kerberos secured cluster
Perform the following steps to prepare a container to work with a Kerberos secured cluster:
Procedure
Run DockMaker without the -X parameter to prepare the generatedFiles directory but not build the image or compose the container.
Copy the resulting unexecuted docker build and docker compose commands from the output to another location for later use.
Copy your krb5.conf and cacerts.pem files to the generatedFiles directory.
Any files copied to the container must be in the generatedFiles context to be available, which is a restriction imposed by Docker.
Edit the generatedFiles/dockerfile file to add the following lines close to the bottom of the file, but make sure they appear above the USER ${PENTAHO_USER} line because these commands must run as root:
RUN apt-get install -y krb5-user
ADD krb5.conf /etc/krb5.conf
ADD cacerts.pem /tmp/cacerts.pem
RUN /usr/bin/keytool -import -noprompt -alias clustername -keystore /etc/ssl/certs/java/cacerts -file /tmp/cacerts.pem -storepass changeit;
where clustername is the name of your cluster.
Run the docker build command you previously copied.
Run the docker compose command you previously copied.
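The dockerfile edit in this procedure can be automated so that the lines always land above the USER ${PENTAHO_USER} line. The following is a minimal sketch that uses awk. add_kerberos_lines is a hypothetical helper, and the sketch assumes the dockerfile contains that USER line exactly as shown at the start of a line. If the line is missing, the file is left unchanged and the function fails.

```shell
#!/bin/sh
# Sketch: insert the Kerberos lines above the USER ${PENTAHO_USER} line.
# add_kerberos_lines is a hypothetical helper, not part of DockMaker.
# Arguments: path to the dockerfile, name of your cluster.
add_kerberos_lines() {
    dockerfile=$1
    cluster=$2
    patched=$(mktemp) || return 1
    if awk -v cluster="$cluster" '
        # Emit the extra lines once, just before the USER line.
        index($0, "USER ${PENTAHO_USER}") == 1 && !inserted {
            print "RUN apt-get install -y krb5-user"
            print "ADD krb5.conf /etc/krb5.conf"
            print "ADD cacerts.pem /tmp/cacerts.pem"
            print "RUN /usr/bin/keytool -import -noprompt -alias " cluster " -keystore /etc/ssl/certs/java/cacerts -file /tmp/cacerts.pem -storepass changeit;"
            inserted = 1
        }
        { print }
        # Fail if the USER line was never found.
        END { if (!inserted) exit 1 }
    ' "$dockerfile" > "$patched"; then
        # Overwrite in place so the file keeps its permissions.
        cat "$patched" > "$dockerfile"
        rm -f "$patched"
    else
        rm -f "$patched"
        return 1
    fi
}
```

For example, `add_kerberos_lines generatedFiles/dockerfile mycluster` patches the generated file before you run the copied docker build command.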
Next steps
You now have a running instance with Kerberos support. You can update your template dockerfile files to make sure these lines are always added to the dockerfile when generatedFiles is first created. The template dockerfiles are in the following locations:
- Server: containers\pentaho-server\pentaho-server-auto\Dockerfile
- PDI: containers\pentaho-data-integration\pdi-client-auto\Dockerfile
- Carte: containers\pentaho-data-integration\pdi-client-auto\Dockerfile
Using DockMaker with service packs
The DockMaker command line tool does not support the service pack (SP) update installers. You must use the full SP artifacts. To use these artifacts, first download them directly from the DockMaker Full Artifacts directory within each service pack release article. Then, copy them to the artifactCache directory or to another directory as defined by the docker.server.artifactCache property in the DockMaker.properties file. The following shows example artifact file downloads that are available for the 9.3 release:
Server:
- pentaho-server-ee-9.3.0.4-733-dist.zip
- paz-plugin-ee-9.3.0.4-733-dist if paz is in the command line
- pdd-plugin-ee-9.3.0.4-733-dist if pdd is in the command line
- pir-plugin-ee-9.3.0.4-733-dist if pir is in the command line
- Any .kar files that you have specified in the command line, for example, pentaho-hadoop-shims-ee-cdpdc71-kar-9.3.0.4-733-dist
PDI client, Carte:
- pdi-ee-client-9.3.0.4-733-dist
- Hadoop add-ons for Pentaho 9.5 or greater
- Any .kar files specified in the command line, for example, pentaho-hadoop-shims-ee-cdpdc71-kar-9.3.0.4-733-dist
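Before you run DockMaker against a service pack, you can check that the expected server artifacts are present in the cache directory. The following is a minimal sketch. server_artifacts and missing_artifacts are hypothetical helpers, the names follow the pattern of the 9.3 examples above, and the check matches on the name prefix because the list above does not show a file extension for every artifact. It does not cover .kar files or the PDI client artifacts.

```shell
#!/bin/sh
# Sketch: list the server artifacts expected for a service pack and report
# which ones are missing from the artifact cache directory.
# server_artifacts and missing_artifacts are hypothetical helpers.

# Arguments: release such as 9.3.0.4-733, comma-separated plugin acronyms.
server_artifacts() {
    release=$1
    plugins=$2
    echo "pentaho-server-ee-$release-dist"
    for plugin in $(echo "$plugins" | tr ',' ' '); do
        echo "$plugin-plugin-ee-$release-dist"
    done
}

# Arguments: cache directory, release, comma-separated plugin acronyms.
# Prints the name of each expected artifact with no matching file.
missing_artifacts() {
    cache_dir=$1
    for name in $(server_artifacts "$2" "$3"); do
        ls "$cache_dir/$name"* >/dev/null 2>&1 || echo "$name"
    done
}

server_artifacts 9.3.0.4-733 paz,pdd,pir
```

For example, `missing_artifacts ./artifactCache 9.3.0.4-733 paz,pdd` prints nothing when all three files are in place.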