Installing the Platform or PDI Server on GCP
These instructions provide the steps necessary to deploy Docker images of the Platform or PDI Server on GCP.
Prerequisites for installing the Platform or PDI Server on GCP
The following software must be installed on your workstation before installing the Platform or PDI Server:
- The PDI client is needed to connect to the Carte Server for testing.
- A stable version of Docker must be installed on your workstation. See Docker documentation.
- The Kubernetes command-line tool, kubectl, must be installed.
- (Optional) Use a Kubernetes management tool, such as Lens, to manage your Kubernetes cluster.
- (Optional) Use the kubectl bash-completion package.
- The gcloud CLI must be installed and authenticated.
- The following software versions are required:
  Application    Supported version
  Python         v3.x
Process overview for installing the Platform or PDI Server on GCP
Use the following steps to deploy the Platform Server or PDI Server on the GCP cloud platform:
- Download and extract Pentaho for GCP.
- Create a Docker registry in GCP.
- Push the Pentaho Docker image to GCP.
- Create and populate a Google Cloud Storage bucket.
- Create a Google Cloud SQL PostgreSQL instance.
- Set up a GKE cluster on Google Cloud.
- Deploy the Platform or PDI Server on GCP.
Download and extract Platform or PDI Server for GCP
Download and extract the package files that contain everything needed to install Pentaho.
Procedure
Navigate to the Support Portal and download to your workstation the GCP version of the Docker image, along with the corresponding license file, for the applications you want to install.
Note: Make note of the image name for later.
Extract the image into your local Docker registry.
The image package file (<package-name>.tar.gz) contains the following:
- image: Directory containing all the Pentaho source images.
- sql-scripts: Directory containing SQL scripts for various operations.
- yaml: Directory containing YAML configuration files and various utility files.
- README.md: File containing a link to detailed information about what we are providing for this release.
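For example, on a Linux or macOS workstation, the package can be extracted with tar (the package name is a placeholder):
tar -xzf <package-name>.tar.gz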
Create a Docker registry in GCP
Before pushing the Pentaho image to GCP, you need to create a Docker registry in GCP.
Procedure
Create a Docker registry in GCP.
For instructions, see Store Docker container images in Artifact Registry.
Connect to the Docker registry using the following command:
gcloud auth configure-docker <YOUR_REGION>-docker.pkg.dev
To verify that the registry has been added correctly, run this command:
cat ~/.docker/config.json
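If the registry was added correctly, the output includes a credential helper entry for the registry host, similar to the following sketch (the region shown is an example):
{
  "credHelpers": {
    "us-east1-docker.pkg.dev": "gcloud"
  }
}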
Record the name of the registry that you have created in the Worksheet for GCP hyperscaler.
Load and push the Pentaho Docker image to the GCP registry
Perform the following steps to load and push the Pentaho Docker image to GCP:
Procedure
Navigate to the image directory containing the Pentaho tar.gz files.
Select and load the tar.gz file into the local registry by running the following command:
docker load -i <pentaho-image>.tar.gz
Record the name of the source image that was loaded into the registry by using the following command:
docker images
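The output lists the loaded image. The repository name and tag in this sketch are hypothetical:
REPOSITORY        TAG    IMAGE ID       CREATED        SIZE
pentaho-server    10.2   0a1b2c3d4e5f   2 months ago   4.2GB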
Tag the source image so it can be pushed to the cloud platform by using the following command:
docker tag <source-image>:<tag> <target-repository>:<tag>
Push the image to the GCP registry using the following command:
docker push <IMAGE_NAME>
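For example, assuming a hypothetical project my-project, region us-east1, repository pentaho, and source image pentaho-server:10.2, the tag and push steps would look like this:
docker tag pentaho-server:10.2 us-east1-docker.pkg.dev/my-project/pentaho/pentaho-server:10.2
docker push us-east1-docker.pkg.dev/my-project/pentaho/pentaho-server:10.2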
Verify that the image has been properly loaded by using the Google Cloud Console.
Create a Google Cloud Storage bucket
Create a Google Cloud Storage bucket and place your configuration files in any directory path in the bucket.
In these instructions, the following path is used as an example: gs://pentaho-project/my-bucket.
Perform the following steps to create and populate a Google Cloud Storage bucket:
Procedure
Create a Cloud Storage bucket as explained in the GCP documentation.
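For example, a bucket matching the sample path above could be created with the gsutil tool (the region is a placeholder):
gsutil mb -l <YOUR_REGION> gs://pentaho-project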
Add the Kettle transformation (KTR) and job (KJB) files that you want to use to the bucket.
If any of your jobs or transformations use VFS connections to Google Cloud Storage buckets, perform the following steps:
Upload a copy of your GCS credentials file to the Google Storage bucket.
For example: gs://pentaho-project/my-bucket/<credentials-file>.json
Update any VFS connections that use this credentials file to point to the following path: /home/pentaho/data-integration/data/<credentials-file>.json
Copy your local .pentaho/metastore folder to the Google Storage bucket.
The .pentaho/ folder is located in the user home directory by default.
Note: You must edit your GCS VFS connections before copying the .pentaho/ folder. If you need to change the VFS connections, upload the GCS credentials file and update any associated GCS connections again.
Copy any license files (*.lic) needed for the product(s) you will be using to the location specified by PROJECT_GCP_LOCATION.
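As a sketch, the uploads in this section can be performed with the gsutil tool; the transformation and job file names below are placeholders:
# Upload KTR and KJB files:
gsutil cp my-transformation.ktr my-job.kjb gs://pentaho-project/my-bucket/
# Upload the GCS credentials file used by VFS connections:
gsutil cp <credentials-file>.json gs://pentaho-project/my-bucket/
# Copy the local metastore folder to the bucket:
gsutil -m cp -r ~/.pentaho/metastore gs://pentaho-project/my-bucket/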
Upload files into the Google Cloud Storage bucket
After the Google Cloud Storage bucket is created, manually create any needed directories.
Procedure
Navigate to the Google Cloud console.
Use the console to create directories and upload the relevant files to those directories as explained in the following tables:
Directory actions:

- /root: All the files in the Cloud Storage bucket are copied to the Pentaho Server's /home/pentaho/.kettle directory. If you need to copy a file to the Pentaho Server's /home/pentaho/.kettle directory, drop the file in the root directory of the Cloud Storage bucket.
- licenses: The licenses directory contains the Pentaho license files. However, the Server Secret Generation Tool documented in Deploy the Platform or PDI Server on GCP automatically retrieves the needed license file from the proper location, as long as you download the license file with the image distribution as described in Download and extract Platform or PDI Server for GCP. Without the license file, the server will ask for the license file the first time you connect to Pentaho. You can provide the file, but it will not be persisted, and the server will ask for it every time you reboot. This file is located in the local .pentaho directory.
- custom-lib: If Pentaho needs custom JAR libraries, add the custom-lib directory to your Cloud Storage bucket and place the libraries there. Files in this directory are copied to Pentaho's lib directory.
- jdbc-drivers: If your Pentaho installation needs JDBC drivers, add the jdbc-drivers directory to your Cloud Storage bucket and place the drivers in this directory. Files in this directory are copied to Pentaho's lib directory.
- plugins: If your Pentaho installation needs additional plugins installed, add the plugins directory to your Cloud Storage bucket. Files in this directory are copied to Pentaho's plugins directory. The plugins should be organized in their own directories, as expected by Pentaho.
- drivers: If your Pentaho installation needs big data drivers installed, add the drivers directory to your Cloud Storage bucket and place the big data drivers in this directory. Files placed in this directory are copied to Pentaho's drivers directory.
- metastore: Pentaho can execute jobs and transformations. Some of these require additional information that is usually stored in the Pentaho metastore. If you need to provide your Pentaho metastore to Pentaho, copy your local metastore directory to the root of the Cloud Storage bucket. From there, the metastore directory is copied to the proper location within the Docker image.
- server-structured-override: The server-structured-override directory is the last resort if you want to make changes to any other files in the image at runtime. All files and directories within the server-structured-override directory are copied to the pentaho-server directory exactly as they are laid out in the server-structured-override directory. If the same files exist in the pentaho-server directory, they are overwritten.
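For orientation, a bucket populated with several of these directories might look like the following sketch; everything below the example bucket path is illustrative, not required:
gs://pentaho-project/my-bucket/
  custom-lib/
  jdbc-drivers/
  plugins/
  drivers/
  metastore/
  licenses/
  server-structured-override/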
File actions:

- context.xml: The Pentaho configuration YAML file included with the image in the templates project directory is used to install this product. You must set the RDS host and RDS port parameters when you install Pentaho. Upon installation, the parameters in the configuration YAML are used to generate a custom context.xml file for the Pentaho installation so it can connect to the database-specific repository. If these are the only changes required in your context.xml, you do not need to provide a context.xml in your Cloud Storage bucket. If you need to configure additional parameters in your context.xml, you must provide the custom.xml file in your Cloud Storage bucket.
- content-config.properties: The content-config.properties file is used by the Pentaho Docker image to provide instructions on which Cloud Storage files to copy over and where to place them. The instructions are populated as multiple lines in the following format:
  ${KETTLE_HOME_DIR}/<some-dir-or-file>=${SERVER_DIR}/<some-dir>
  A template for this file can be found in the templates project directory. The template has an entry where the file context.xml is copied to the required location within the Docker image:
  ${KETTLE_HOME_DIR}/context.xml=${SERVER_DIR}/tomcat/webapps/pentaho/META-INF/context.xml
- content-config.sh: Use this bash script to configure files, change file and directory ownership, move files around, install missing apps, and so on. You can add it to the Cloud Storage bucket. It is executed in the Docker image after the other files are processed. A minimal sketch of such a script is shown after this list.
- metastore.zip: Pentaho can execute jobs and transformations. Some of these require additional information that is usually stored in the Pentaho metastore. If you need to provide your Pentaho metastore to Pentaho, zip the content of your local .pentaho directory with the name metastore.zip and add it to the root of the Cloud Storage bucket. The metastore.zip file is extracted to the proper location within the Docker image.

Note: The VFS connections cannot be copied to the hyperscaler server from PDI the same way as the named connections can be copied. You must connect to Pentaho on the hyperscaler and create the new VFS connection.

Run the scripts in the sql-scripts folder in the distribution in the numbered order.
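As referenced in the content-config.sh entry above, the following is a minimal sketch of such a script; the file path, user, and group are illustrative assumptions, not part of the distribution:
#!/bin/bash
# Hypothetical post-processing executed inside the Docker image after the
# other files are copied: make an uploaded driver readable by the server user.
chown pentaho:pentaho /home/pentaho/pentaho-server/tomcat/lib/my-driver.jar
chmod 644 /home/pentaho/pentaho-server/tomcat/lib/my-driver.jar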
Create a Google Cloud SQL PostgreSQL instance
Create a Google Cloud SQL PostgreSQL instance and perform the necessary configuration.
Procedure
Go to Connect to Cloud SQL for PostgreSQL from Cloud Shell and follow the instructions to create the PostgreSQL instance.
Run the following configuration scripts in order while connected:
- create_quartz_postgresql.sql
- create_repository_postgresql.sql
- create_jcr_postgresql.sql
- pentaho_mart_postgresql.sql
- pentaho_logging_postgresql.sql
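For example, assuming you are connected from Cloud Shell with the psql client, the scripts can be run in order as in this sketch (the instance address and user are placeholders):
for f in create_quartz_postgresql.sql create_repository_postgresql.sql create_jcr_postgresql.sql pentaho_mart_postgresql.sql pentaho_logging_postgresql.sql; do
  psql -h <INSTANCE_IP> -U postgres -f "$f"
done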
Set up a GKE cluster on GCP
Use the following steps to configure a GKE cluster on Google Cloud Platform (GCP).
Procedure
Create a GKE cluster on GCP.
Configure the cluster to meet your requirements. For a simple example, see Create a GKE cluster.
Authenticate with the cluster using the following command:
gcloud container clusters get-credentials <CLUSTER_NAME> --region=<YOUR_REGION>
Check the connection to the cluster, for example by running kubectl get nodes.
(Optional) Use a Kubernetes management tool, such as Lens, to check that the cluster is connected.
Deploy the Platform or PDI Server on GCP
Prepare to launch your service by performing some setup steps and then deploy.
Procedure
Add the Pentaho license by running one of the following scripts in the distribution.
- Windows: Run start.bat
- Linux: Run start.sh
Complete the configuration page of the Server Secret Generation Tool by adding the license files and using the values you recorded in the Worksheet for GCP hyperscaler.
Click Generate Yaml.
Verify that the service is running by executing the following command:
kubectl get pods --namespace pentaho-server
After a few moments, you should see a pod come up. (It can take a minute to pull the image the first time you start.)
Connect to the Platform or PDI Server from the Spoon client, or visit the exposed load balancer IP address to connect to the Pentaho User Console.
The default port number is 8080 but can be different if the networking setup requires it.
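To find the exposed address, one option is to list the services in the namespace and read the load balancer's external IP; the namespace matches the one used above:
kubectl get svc --namespace pentaho-server
Then browse to http://<EXTERNAL_IP>:8080/pentaho, adjusting the port if your networking setup differs.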
Worksheet for GCP hyperscaler
To access the common worksheet for the GCP hyperscaler, go to Worksheet for GCP hyperscaler.