
Installing the Platform or PDI Server on GCP

These instructions provide the steps necessary to deploy Docker images of the Platform or PDI Server on GCP.

Prerequisites for installing the Platform or PDI Server on GCP

The following software must be installed on your workstation before installing the Platform or PDI Server:

  • PDI (Pentaho Data Integration) is needed to connect to the Carte Server for testing.
  • A stable version of Docker must be installed on your workstation. See the Docker documentation.
  • The Kubernetes command-line tool, kubectl, must be installed.
  • (Optional) Use a Kubernetes management tool such as Lens to manage your Kubernetes cluster.
  • (Optional) Use the kubectl bash-completion package.
  • The gcloud CLI must be installed and authenticated.
  • The following software versions are required:
    Application        Supported version
    Python             v3.x
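
To confirm these prerequisites from a terminal, you can check the installed versions. This is a minimal sketch; exact output varies by platform and version:

    # Verify that the required tools are installed and on the PATH
    docker --version
    kubectl version --client
    gcloud --version
    python3 --version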

Process overview for installing the Platform or PDI Server on GCP

Use the following steps to deploy the Platform Server or PDI Server on the GCP cloud platform:

  1. Download and extract Pentaho for GCP.
  2. Create a Docker registry in GCP.
  3. Push the Pentaho Docker image to GCP.
  4. Create and populate a Google Cloud Storage bucket.
  5. Create a Google Cloud SQL PostgreSQL instance.
  6. Set up a GKE cluster on Google Cloud.
  7. Deploy the Platform or PDI Server on GCP.

Download and extract Platform or PDI Server for GCP

Download and extract the package that contains the files needed to install Pentaho.

Procedure

  1. Navigate to the Support Portal and download the GCP version of the Docker image with the corresponding license file for the applications you want to install on your workstation.

    Note: Make note of the image name for later.
  2. Extract the package file on your workstation (a minimal example follows this procedure).

    The image package file (<package-name>.tar.gz) contains the following:
    Name           Content description
    image          Directory containing all the Pentaho source images.
    sql-scripts    Directory containing SQL scripts for various operations.
    yaml           Directory containing YAML configuration files and various utility files.
    README.md      File containing a link to detailed information about what is provided in this release.
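
For reference, unpacking the distribution archive on a Linux or macOS workstation might look like the following; the package name is a placeholder:

    # Extract the package; this produces the image, sql-scripts, and yaml
    # directories plus README.md described above
    tar -xzf <package-name>.tar.gz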

Create a Docker registry in GCP

Before pushing the Pentaho image to GCP, you need to create a Docker registry in GCP.

Procedure

  1. Create a Docker registry in GCP.

    For instructions, see Store Docker container images in Artifact Registry. A minimal command-line example follows this procedure.
  2. Connect to the Docker registry using the following command:

    gcloud auth configure-docker <YOUR_REGION>-docker.pkg.dev
  3. To verify that the registry has been added correctly, run this command:

    cat ~/.docker/config.json
  4. Record the name of the registry that you have created in the Worksheet for GCP hyperscaler.
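
As noted in step 1, creating a Docker-format repository in Artifact Registry from the command line might look like the following sketch; the repository name and region are placeholders:

    # Create a Docker-format Artifact Registry repository
    gcloud artifacts repositories create <REPO_NAME> \
        --repository-format=docker \
        --location=<YOUR_REGION>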

Load and push the Pentaho Docker image to the GCP registry

Perform the following steps to load and push the Pentaho Docker image to GCP:

Procedure

  1. Navigate to the image directory containing the Pentaho tar.gz files.

  2. Select and load the tar.gz file into the local registry by running the following command:

    docker load -i <pentaho-image>.tar.gz
  3. Record the name of the source image that was loaded into the registry by using the following command:

    docker images
  4. Tag the source image so it can be pushed to the cloud platform by using the following command:

    docker tag <source-image>:<tag> <target-repository>:<tag>
  5. Push the image to the GCP registry using the following command:

    docker push <IMAGE_NAME>
  6. Verify that the image has been properly loaded using the Google Cloud Console.
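
Putting these steps together, a typical session might look like the following sketch. Artifact Registry image paths follow the <region>-docker.pkg.dev/<project>/<repository>/<image>:<tag> convention; all names and tags here are placeholders:

    # Load the Pentaho image into the local Docker registry
    docker load -i <pentaho-image>.tar.gz
    # Tag it with the full Artifact Registry path
    docker tag <source-image>:<tag> <YOUR_REGION>-docker.pkg.dev/<PROJECT_ID>/<REPO_NAME>/<source-image>:<tag>
    # Push it to the GCP registry
    docker push <YOUR_REGION>-docker.pkg.dev/<PROJECT_ID>/<REPO_NAME>/<source-image>:<tag>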

Create a Google Cloud Storage bucket

Create a Google Cloud Storage bucket and place your configuration files in any directory path in the bucket.

In these instructions, the following path is used as an example: gs://pentaho-project/my-bucket.

Perform the following steps to create and populate a Google Cloud Storage bucket:

Procedure

  1. Create a Cloud Storage bucket as explained in the GCP documentation. (A command-line sketch follows this procedure.)

  2. Add the Kettle transformation (KTR) and job (KJB) files that you want to use to the bucket.

  3. If any of your jobs or transformations use VFS connections to the Google Storage buckets, perform the following steps:

    1. Upload a copy of your GCS credentials file to the Google Storage bucket.

      For example: gs://pentaho-project/my-bucket/<credentials-file>.json
    2. Update any VFS connections that use this credentials file to point to the following path: /home/pentaho/data-integration/data/<credentials-file>.json

  4. Copy your local .pentaho/metastore folder to the Google Storage bucket.

    The .pentaho/ folder is located in the user home directory by default.
    Note: You must edit your GCS VFS connections before copying the .pentaho/ folder. If you need to change the VFS connections later, upload the GCS credentials file and update any associated VFS connections again.
  5. Copy any license files (*.lic) needed for the product(s) you will be using to the location specified by PROJECT_GCP_LOCATION.
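
As referenced in step 1, the bucket can also be created and populated with the gsutil command-line tool instead of the console. A minimal sketch using the example path from above; file names are placeholders, and destination paths depend on your configuration:

    # Create the bucket (bucket names must be globally unique)
    gsutil mb gs://pentaho-project
    # Upload transformation, job, metastore, and license files
    gsutil cp <your-transformation>.ktr gs://pentaho-project/my-bucket/
    gsutil cp -r ~/.pentaho/metastore gs://pentaho-project/my-bucket/
    gsutil cp <your-license>.lic gs://pentaho-project/my-bucket/licenses/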

Upload files into the Google Cloud Storage bucket

After the Google Cloud Storage bucket is created, manually create any needed directories.

Procedure

  1. Navigate to the Google Cloud console.

  2. Use the console to create directories and upload the relevant files to those directories as explained in the following tables:

    Directory actions:

    / (root)

    All the files in the Cloud Storage bucket are copied to the Pentaho Server's /home/pentaho/.kettle directory.

    If you need to copy a file to the Pentaho Server's /home/pentaho/.kettle directory, drop the file in the root directory of the Cloud Storage bucket.

    licenses

    The licenses directory contains the Pentaho license files. However, the Server Secret Generation Tool documented in Deploy the Platform or PDI Server on GCP automatically retrieves the needed license file from the proper location, as long as you download the license file with the image distribution as described in Download and extract Platform or PDI Server for GCP.

    Without the license file, the server asks for the license file the first time you connect to Pentaho. You can provide the file, but it is not persisted, and the server asks for it again every time you reboot.

    The license file is located in the local .pentaho directory.

    custom-lib

    If Pentaho needs custom JAR libraries, add the custom-lib directory to your Cloud Storage bucket and place the libraries there.

    Files in this directory are copied to Pentaho’s lib directory.

    jdbc-drivers

    If your Pentaho installation needs JDBC drivers, add the jdbc-drivers directory to your Cloud Storage bucket and place the drivers in this directory.

    Files in this directory are copied to Pentaho’s lib directory.

    plugins

    If your Pentaho installation needs additional plugins installed, add the plugins directory to your Cloud Storage bucket.

    Files in this directory are copied to Pentaho’s plugins directory. The plugins should be organized in their own directories as expected by Pentaho.

    drivers

    If your Pentaho installation needs big data drivers installed, add the drivers directory to your Cloud Storage bucket and place the big data drivers in this directory.

    Files placed in this directory are copied to Pentaho’s drivers directory.

    metastore

    Pentaho can execute jobs and transformations. Some of these require additional information that is usually stored in the Pentaho metastore.

    If you need to provide your Pentaho metastore to Pentaho, copy your local metastore directory to the root of the Cloud Storage bucket. From there, the metastore directory is copied to the proper location within the Docker image.

    server-structured-override

    The server-structured-override directory is the last resort if you want to make changes to any other files in the image at runtime.

    All files and directories within the server-structured-override directory are copied into the pentaho-server directory, preserving their relative structure.

    If the same files exist in the pentaho-server directory, they will be overwritten.

    File actions:

    context.xml

    The Pentaho configuration YAML file included with the image in the templates project directory is used to install this product. You must set the RDS host and RDS port parameters when you install Pentaho. Upon installation, the parameters in the configuration YAML are used to generate a custom context.xml file for the Pentaho installation so it can connect to the database-specific repository.

    If these are the only changes required in your context.xml, you do not need to provide a context.xml in your Cloud Storage bucket. If you need to configure additional parameters, you must provide your customized context.xml file in your Cloud Storage bucket.

    content-config.properties

    The content-config.properties file is used by the Pentaho Docker image to specify which Cloud Storage files to copy over and where to place them.

    The instructions are populated as multiple lines in the following format:

    ${KETTLE_HOME_DIR}/<some-dir-or-file>=${SERVER_DIR}/<some-dir>

    A template for this file can be found in the templates project directory.

    The template has an entry where the file context.xml is copied to the required location within the Docker image:

    ${KETTLE_HOME_DIR}/context.xml=${SERVER_DIR}/tomcat/webapps/pentaho/META-INF/context.xml

    content-config.sh

    Use this bash script to configure files, change file and directory ownership, move files around, install missing applications, and so on.

    You can add it to the Cloud Storage bucket.

    It is executed in the Docker image after the other files are processed. (A minimal sketch of such a script follows this procedure.)

    metastore.zip

    Pentaho can execute jobs and transformations. Some of these require additional information that is usually stored in the Pentaho metastore.

    If you need to provide your Pentaho metastore to Pentaho, zip the content of your local .pentaho directory with the name metastore.zip and add it to the root of the Cloud Storage bucket. The metastore.zip file is extracted to the proper location within the Docker image.

    Note: VFS connections cannot be copied from PDI to the hyperscaler server the same way as named connections. You must connect to Pentaho on the hyperscaler and create the new VFS connection there.
  3. Run the scripts in the distribution's sql-scripts folder in numbered order.
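
As referenced in the content-config.sh entry above, a minimal sketch of such a script is shown below. Every path and file name in it is hypothetical; it only illustrates the kinds of adjustments the script can make after the other Cloud Storage files are processed:

    #!/bin/bash
    # content-config.sh (hypothetical example)
    # Fix ownership of a copied custom library
    chown pentaho:pentaho /home/pentaho/server/pentaho-server/lib/<custom-lib>.jar
    # Move a configuration file into place
    mv /home/pentaho/.kettle/<extra-config>.properties /home/pentaho/server/pentaho-server/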

Create a Google Cloud SQL PostgreSQL instance

Create a Google Cloud SQL PostgreSQL instance and perform the necessary configuration.

See Connect to Cloud SQL for PostgreSQL from Cloud Shell.

Procedure

  1. Go to Connect to Cloud SQL for PostgreSQL from Cloud Shell and follow the instructions to create the PostgreSQL instance.

  2. Run the following configuration scripts in order while connected (see the example after this list):

    1. create_quartz_postgresql.sql
    2. create_repository_postgresql.sql
    3. create_jcr_postgresql.sql
    4. pentaho_mart_postgresql.sql
    5. pentaho_logging_postgresql.sql
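
As noted above, one way to run the scripts while connected is through gcloud and psql. A minimal sketch, assuming the scripts are in your current directory; the instance name is a placeholder, and the psql client must be installed locally:

    # Open a psql session against the Cloud SQL instance
    gcloud sql connect <INSTANCE_NAME> --user=postgres

    -- Then, inside psql, run the scripts in order:
    \i create_quartz_postgresql.sql
    \i create_repository_postgresql.sql
    \i create_jcr_postgresql.sql
    \i pentaho_mart_postgresql.sql
    \i pentaho_logging_postgresql.sql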

Set up a GKE cluster on GCP

Use the following steps to configure a GKE cluster on Google Cloud Platform (GCP).

Procedure

  1. Create a GKE cluster on GCP.

    Configure the cluster to meet your requirements. For a simple example, see Create a GKE cluster, or use the command-line sketch that follows this procedure.
  2. Authenticate with the cluster using the following command:

    gcloud container clusters get-credentials <CLUSTER_NAME> --region=<YOUR_REGION>
  3. Check the connection to the cluster, for example by listing its nodes:

    kubectl get nodes
  4. (Optional) Use a Kubernetes management tool such as Lens to confirm that you are connected.
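
As referenced in step 1, a basic evaluation cluster can be created from the command line. A minimal sketch; the name and region are placeholders, and you should size the cluster for your workload:

    # Create a small regional GKE cluster
    gcloud container clusters create <CLUSTER_NAME> \
        --region=<YOUR_REGION> \
        --num-nodes=1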

Deploy the Platform or PDI Server on GCP

Prepare to launch your service by performing some setup steps and then deploy.

Procedure

  1. Add the Pentaho license by running one of the following scripts in the distribution.

    • Windows: Run start.bat
    • Linux: Run start.sh
    This opens the Server Secret Generation Tool.
  2. Complete the configuration page of the Server Secret Generation Tool by adding the license files and using the values you recorded in the Worksheet for GCP hyperscaler.

  3. Click Generate Yaml.

  4. Verify that the service is running by executing the following command:

    kubectl get pods --namespace pentaho-server
    After a few moments, you should see a pod come up. (It can take a minute to pull the image the first time you start.)
  5. Connect to the Platform or PDI Server from the Spoon client, or browse to the exposed load balancer IP address to open the Pentaho User Console (see the example below for finding the address).

    The default port number is 8080 but can be different if the networking setup requires it.
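
To find the load balancer address mentioned in step 5, you can list the services in the namespace used earlier; the address appears in the EXTERNAL-IP column once GCP has provisioned it:

    # Find the external IP of the exposed Pentaho service
    kubectl get services --namespace pentaho-server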

Worksheet for GCP hyperscaler

To access the common worksheet for the GCP hyperscaler, go to Worksheet for GCP hyperscaler.