Hitachi Vantara Lumada and Pentaho Documentation

What's new in Pentaho 9.1

The Pentaho 9.1 Enterprise Edition delivers a variety of features and enhancements, including access to Google Dataproc and Lumada Data Catalog in PDI, along with the new Pentaho Upgrade Installer. Pentaho 9.1 also improves the Pentaho platform with S3 enhancements, including MinIO support, and with minor platform and Business Analytics enhancements.

Access to Google Dataproc from PDI

You can now access and process data from a Google Dataproc cluster in PDI. Google Dataproc is a cloud-native, managed Spark and Hadoop service with built-in integration with other Google Cloud Platform services, such as BigQuery and Cloud Storage. With PDI and Google Dataproc, you can migrate from an on-premises environment to Google Cloud.

You can use PDI's Google Dataproc driver and named connection feature to access data on your Google Dataproc cluster as you would on other Hadoop clusters, such as Cloudera and Amazon EMR. For instructions, see Set up the Pentaho Server to connect to a Hadoop cluster.

Catalog steps in PDI

Lumada Data Catalog lets data engineers, data scientists, and business users accelerate metadata discovery and data categorization, and permits data stewards to manage sensitive data. Data Catalog collects metadata for various types of data assets and points to each asset's location in storage. Data assets registered in Data Catalog are known as data resources.

You can use the following four new PDI steps to work with Data Catalog metadata and data resources within your PDI transformations:

  • Read Metadata

    Searches Data Catalog’s existing metadata for specific data resources, including their storage locations.

  • Write Metadata

    Revises the Data Catalog tags associated with an existing data resource.

  • Catalog Input

    Reads CSV text files or Parquet data from a Data Catalog data resource stored in a Hadoop or S3 ecosystem, and outputs the data payload as rows for use in a transformation.

  • Catalog Output

    Encodes data as CSV text files or in Parquet format, using the schema defined in PDI, to create a new data resource or to replace or update an existing data resource in Data Catalog.
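To illustrate the row-oriented model that Catalog Input and Catalog Output work with, here is a minimal conceptual sketch in plain Python. It is not PDI code and does not call any Data Catalog API: the schema list and the `read_rows` and `write_rows` helpers are hypothetical stand-ins for PDI field definitions and step behavior, and only the CSV case is shown (Parquet is not covered).

```python
import csv
import io

# Hypothetical schema: (field name, converter) pairs, standing in for the
# field definitions a PDI transformation would carry. Not a PDI API.
SCHEMA = [("id", int), ("name", str), ("score", float)]


def read_rows(csv_text, schema):
    """Parse CSV text (with a header line) into typed row dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        # Convert each field with the converter declared in the schema.
        rows.append({name: convert(record[name]) for name, convert in schema})
    return rows


def write_rows(rows, schema):
    """Encode row dicts back to CSV text, in schema field order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[name for name, _ in schema], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    source = "id,name,score\n1,alpha,0.5\n2,beta,1.25\n"
    rows = read_rows(source, SCHEMA)
    print(rows)
    print(write_rows(rows, SCHEMA), end="")
```

The same schema drives both directions, which mirrors how Catalog Output encodes data using the schema defined in PDI rather than inferring one from the data.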

Pentaho Upgrade Installer

The new Pentaho Upgrade Installer is an easy-to-use tool that automatically upgrades your Pentaho products to the new release. With this simplified process, you can upgrade your Pentaho products on a server or a workstation directly from version 9.0 to version 9.1. For instructions on the new upgrade process, see Pentaho upgrade.

S3 enhancements including MinIO support

In Pentaho 9.1, PDI features a simplified connection path and permissions entries for Amazon S3, while continuing to support your existing S3 transformations and jobs. PDI also adds MinIO support for users who need to connect to S3 storage outside of the Amazon S3 environment. For more information, see Connecting to Virtual File Systems.

Minor platform enhancements
Minor Business Analytics enhancements

Pentaho 9.1 includes the following minor Business Analytics improvements: