Hitachi Vantara Lumada and Pentaho Documentation

Running PDI-CLI on GCP

You can use PDI-CLI images on GCP to run transformations with the Pan command and jobs with the Kitchen command.

Prerequisites for installing PDI-CLI on GCP

The following software must be installed on your workstation before installing PDI-CLI:

  • PDI, which is needed to connect to the Carte server for testing.
  • A stable version of Docker must be installed on your workstation. See Docker documentation.

Process overview for running PDI-CLI on GCP

Use the following steps to deploy PDI-CLI on the GCP cloud platform:

  1. Download and extract PDI-CLI for GCP.
  2. Create a Docker registry in GCP.
  3. Push the PDI-CLI Docker image to the GCP registry.
  4. Create and populate a Google Cloud Storage bucket.
  5. Submit a GCP batch job to run a transformation.

Download and extract Pentaho for GCP

Download and open the package files that contain the files needed to install Pentaho.

Procedure

  1. Navigate to the Support Portal and download the GCP version of the Docker image with the corresponding license file for the applications you want to install on your workstation.

    Note: Make note of the image name for later.
  2. Extract the image into your local Docker registry.

    The image package file (<package-name>.tar.gz) contains the following:

    Name       Content description
    image      Directory containing all the Pentaho source images.
    yaml       Directory containing YAML configuration files and various utility files.
    README.md  File containing a link to detailed information about what we are providing for this release.
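Before loading the image, you can inspect the package contents without extracting it. This is a sketch; <package-name> is the placeholder for the package you downloaded from the Support Portal:

```shell
# List the contents of the downloaded package (image/, yaml/, README.md)
# without extracting it. <package-name> is a placeholder.
tar -tzf <package-name>.tar.gz
```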

Create a Docker registry in GCP

Before pushing the Pentaho image to GCP, you need to create a Docker registry in GCP.

Procedure

  1. Create a Docker registry in GCP.

    For instructions, see Store Docker container images in Artifact Registry.
  2. Connect to the Docker registry using the following command:

    gcloud auth configure-docker <YOUR_REGION>-docker.pkg.dev
  3. To verify that the registry has been added correctly, run this command:

    cat ~/.docker/config.json
  4. Record the name of the registry that you have created in the Worksheet for GCP hyperscaler.
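The steps above can be sketched as a short gcloud session. The repository name, description, and region below are illustrative assumptions, not values required by Pentaho:

```shell
# Create a Docker-format repository in Artifact Registry
# (example names: region us-central1, repository pentaho-repo).
gcloud artifacts repositories create pentaho-repo \
    --repository-format=docker \
    --location=us-central1 \
    --description="Pentaho PDI-CLI images"

# Configure the local Docker client to authenticate to the registry.
gcloud auth configure-docker us-central1-docker.pkg.dev

# Verify that the registry's credential helper was registered.
cat ~/.docker/config.json
```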

Load and push the Pentaho Docker image to the GCP registry

Perform the following steps to load and push the Pentaho Docker image to GCP:

Procedure

  1. Navigate to the image directory containing the Pentaho tar.gz files.

  2. Select and load the tar.gz file into the local registry by running the following command:

    docker load -i <pentaho-image>.tar.gz
  3. Record the name of the source image that was loaded into the registry by using the following command:

    docker images
  4. Tag the source image so it can be pushed to the cloud platform by using the following command:

    docker tag <source-image>:<tag> <target-repository>:<tag>
  5. Push the image to the GCP registry using the following command:

    docker push <IMAGE_NAME>
  6. Verify that the image has been properly loaded using the Google Cloud Console.
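Taken together, steps 1 through 5 might look like the following session. The image, project, and repository names are hypothetical examples; substitute the names you recorded from your own package and registry:

```shell
# Load the image archive into the local Docker image store.
docker load -i pentaho-pdi-cli.tar.gz

# Confirm the loaded image name and tag.
docker images

# Tag the image with the Artifact Registry path
# (example: project my-project, region us-central1, repo pentaho-repo).
docker tag pentaho/pdi-cli:10.0 \
    us-central1-docker.pkg.dev/my-project/pentaho-repo/pdi-cli:10.0

# Push the tagged image to the GCP registry.
docker push us-central1-docker.pkg.dev/my-project/pentaho-repo/pdi-cli:10.0
```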

Create a Google Cloud Storage bucket

Create a Google Cloud Storage bucket and place your configuration files in any directory path in the bucket.

In these instructions, the following path is used as an example: gs://pentaho-project/my-bucket.

Perform the following steps to create and populate a Google Cloud Storage bucket:

Procedure

  1. Create a Cloud Storage bucket as explained in the GCP documentation.

  2. Add the Kettle transformation (KTR) and job (KJB) files that you want to use to the bucket.

  3. If any of your jobs or transformations use VFS connections to the Google Storage buckets, perform the following steps:

    1. Upload a copy of your GCS credentials file to the Google Storage bucket.

      For example: gs://pentaho-project/my-bucket/<credentials-file>.json
    2. Update any VFS connections that use this credentials file to point to the following path: /home/pentaho/data-integration/data/<credentials-file>.json

  4. Copy your local .pentaho/metastore folder to the Google Storage bucket.

    The .pentaho/ folder is located in the user home directory by default.
    Note: You must edit your GCS VFS connections before copying the .pentaho/ folder. If you later change the VFS connections, upload the GCS credentials file and update any associated GCS connections again.
  5. Copy any license files (*.lic) needed for the product(s) you will be using to the location specified by PROJECT_GCP_LOCATION.
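The bucket setup above can be sketched with the gsutil CLI. The gs://pentaho-project/my-bucket path matches the example used in this section; the transformation, job, and credentials file names are placeholders:

```shell
# Create the bucket (gsutil mb takes the bucket name; my-bucket is a
# folder path inside it, following this section's example).
gsutil mb gs://pentaho-project

# Copy the KTR and KJB files you want to run into the bucket path.
gsutil cp my-transformation.ktr my-job.kjb gs://pentaho-project/my-bucket/

# Upload the GCS credentials file used by your VFS connections.
gsutil cp <credentials-file>.json gs://pentaho-project/my-bucket/

# Copy the local metastore and any license files.
gsutil cp -r ~/.pentaho/metastore gs://pentaho-project/my-bucket/
gsutil cp *.lic gs://pentaho-project/my-bucket/
```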

Submit a GCP batch job to run a transformation

Procedure

  1. Create a batch job in Batch for Google Cloud.

    For instructions on running a GCP batch job, see Job creation and execution overview.
  2. Set the following environment variables in the Docker container:

    Variable               Description
    METASTORE_LOCATION     gs://pentaho-project/my-bucket/
    PROJECT_GCP_LOCATION   gs://pentaho-project/my-bucket/
    PROJECT_STARTUP_JOB    my-transformation.ktr (must be uploaded to your bucket; see Create a Google Cloud Storage bucket)
    CREDENTIAL_FILE        <credentials-file>.json (the credentials file described in Create a Google Cloud Storage bucket)
    PARAMETERS (optional)  Any additional parameters to be passed to Kitchen or Pan.
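One way to submit such a job from the command line is with gcloud batch and a JSON job definition. Everything below (job name, region, image URI, bucket path) is an illustrative assumption under this section's example names, not a definitive configuration:

```shell
# job.json: a minimal Batch job definition (sketch) that runs the
# PDI-CLI container with the environment variables described above.
cat > job.json <<'EOF'
{
  "taskGroups": [{
    "taskSpec": {
      "runnables": [{
        "container": {
          "imageUri": "us-central1-docker.pkg.dev/my-project/pentaho-repo/pdi-cli:10.0"
        },
        "environment": {
          "variables": {
            "METASTORE_LOCATION": "gs://pentaho-project/my-bucket/",
            "PROJECT_GCP_LOCATION": "gs://pentaho-project/my-bucket/",
            "PROJECT_STARTUP_JOB": "my-transformation.ktr",
            "CREDENTIAL_FILE": "<credentials-file>.json"
          }
        }
      }]
    }
  }]
}
EOF

# Submit the job to Batch (example job name and region).
gcloud batch jobs submit pdi-cli-run \
    --location=us-central1 \
    --config=job.json
```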

Worksheet for GCP hyperscaler

Use the following worksheet for important information needed during installation and configuration of Pentaho.

Variable               Record your setting
GCP_REGISTRY_URI
DATABASE_HOSTNAME
DATABASE_PORT
STORAGE_BUCKET_NAME
GKE_CLUSTER_NAME
GKE_NODE_IP
GKE_NODE_PORT