Hitachi Vantara Lumada and Pentaho Documentation

New Features in Pentaho Data Integration 5.2


Learn about R Script Executor step improvements, new DI Server administration features, enhanced Kerberos support for HDP and Cloudera, and Upgrade Utility changes.

Pentaho Data Integration 5.2 delivers many exciting and powerful features that help you quickly and securely access, blend, transform, and explore data.

New Streamlined Data Refinery Feature

The Streamlined Data Refinery (SDR) is a simplified, ad hoc ETL refinery composed of a series of PDI jobs that take raw data, augment and blend it through a request form, and then publish it to the BA Server for report designers to use in Analyzer.

R Script Executor Step Improvements

The R Script Executor, Weka Forecasting, and Weka Scoring steps form the core of the Data Science Pack and transform PDI into a powerful predictive analytics tool. The R Script Executor step lets you incorporate R scripts in your transformation so that you can include R-based statistical programming in your data flow. In PDI 5.2, you can "plug and play" R scripts without extra customization: you can pass incoming field metadata through to the output field metadata, use a more intuitive user interface to run scripts row by row or in batches, and test scripts.
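
To illustrate why the choice between row and batch execution can matter, here is a minimal Python sketch. It is not the step's implementation or API: a Python function stands in for the R script, and the names `run_by_rows`, `run_by_batches`, and `center_value` are hypothetical. The stand-in script keeps every incoming field and adds one output field, loosely mirroring metadata pass-through.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Script = Callable[[List[Row]], List[Row]]


def run_by_rows(rows: List[Row], script: Script) -> List[Row]:
    """Call the script once per incoming row (hypothetical row mode)."""
    output: List[Row] = []
    for row in rows:
        output.extend(script([row]))
    return output


def run_by_batches(rows: List[Row], script: Script, batch_size: int) -> List[Row]:
    """Call the script once per slice of `batch_size` rows (hypothetical batch mode)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    output: List[Row] = []
    for start in range(0, len(rows), batch_size):
        output.extend(script(rows[start:start + batch_size]))
    return output


def center_value(batch: List[Row]) -> List[Row]:
    """Stand-in for an R script: subtract the batch mean from 'value'."""
    mean = sum(row["value"] for row in batch) / len(batch)
    # {**row, ...} passes every incoming field through to the output row.
    return [{**row, "centered": row["value"] - mean} for row in batch]


rows = [{"id": 1, "value": 2.0}, {"id": 2, "value": 4.0}, {"id": 3, "value": 6.0}]
print([r["centered"] for r in run_by_rows(rows, center_value)])        # [0.0, 0.0, 0.0]
print([r["centered"] for r in run_by_batches(rows, center_value, 3)])  # [-2.0, 0.0, 2.0]
```

Because `center_value` subtracts the mean of whatever it receives, it returns 0.0 for every row when called one row at a time, but meaningful deviations when called on a whole batch. Scripts that aggregate across rows are the ones most likely to behave differently between the two modes.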

New DI Server Administration Features

Porting content from one environment to another and performing general DI Repository maintenance are easier with the new Purge Utility. The Purge Utility permanently purges the repository of versions of shared objects, such as database connection information, jobs, and transformations. You can also turn DI Repository versioning and comment capturing on and off.
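
As a rough illustration of what permanently purging old versions means, the sketch below models a repository as a hypothetical in-memory map from object names to version histories and drops all but the newest `keep` versions of each object. The `keep` policy, the `VersionedObject` class, and the `purge_old_versions` function are assumptions for illustration only; they are not the Purge Utility's options or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionedObject:
    """Hypothetical shared object (for example, a database connection) with its history."""
    name: str
    versions: List[str] = field(default_factory=list)  # ordered oldest to newest


def purge_old_versions(repo: Dict[str, VersionedObject], keep: int) -> int:
    """Delete all but the newest `keep` versions of each object; return how many were removed."""
    if keep < 1:
        raise ValueError("keep must be at least 1 so the current version survives")
    removed = 0
    for obj in repo.values():
        excess = len(obj.versions) - keep
        if excess > 0:
            del obj.versions[:excess]  # drop the oldest entries in place; not recoverable
            removed += excess
    return removed


repo = {
    "sales_db": VersionedObject("sales_db", ["v1", "v2", "v3"]),
    "load_job": VersionedObject("load_job", ["v1"]),
}
print(purge_old_versions(repo, keep=1))  # 2
print(repo["sales_db"].versions)         # ['v3']
```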

Kerberos Security Support for CDH 5.1 and HDP 2.1

If you already use Kerberos to authenticate access to a CDH 5.1 (Cloudera's Distribution including Apache Hadoop) or HDP 2.1 (Hortonworks Data Platform) cluster, you can, with a little extra configuration, also use Kerberos to authenticate Pentaho DI users who need to access those clusters.

New Marketplace Plugins

Pentaho Marketplace continues to grow with contributions from the community. It is a home for community-developed plugins and a place where you can contribute your own work, learn from and benefit from the work of others, and connect with other community members. New contributions include:

  • LookupTimeDimensionStep: Looks up and creates an entry in a data warehouse time dimension table and returns the ID.
  • Probabilistic Row Distributions: Contains a collection of Row Distribution plugins for PDI that use probabilistic methods for determining the distribution of rows.
  • PDI Groovy Console: Adds a Groovy console to the Help menu, with helper methods and classes for interacting with the PDI environment.
  • Gremlin Script Step: Provides a Gremlin script step for graph pipeline processing.
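
The lookup-or-create pattern behind a step such as LookupTimeDimensionStep can be sketched as follows. This is a generic, hypothetical illustration, not the plugin's code: an in-memory dictionary stands in for the time dimension table, rows are keyed by calendar day (an assumed granularity), and surrogate IDs are assigned sequentially.

```python
from datetime import date
from typing import Dict, Tuple


class TimeDimension:
    """Hypothetical in-memory stand-in for a data warehouse time dimension table."""

    def __init__(self) -> None:
        self._ids: Dict[date, int] = {}  # natural key (day) -> surrogate ID
        self.rows: Dict[int, Tuple[date, int, int, int]] = {}  # ID -> (day, year, month, day of month)
        self._next_id = 1

    def lookup_or_create(self, day: date) -> int:
        """Return the ID for `day`, inserting a new dimension row first if none exists."""
        existing = self._ids.get(day)
        if existing is not None:
            return existing
        new_id = self._next_id
        self._next_id += 1
        self._ids[day] = new_id
        self.rows[new_id] = (day, day.year, day.month, day.day)
        return new_id


dim = TimeDimension()
first = dim.lookup_or_create(date(2014, 10, 1))
second = dim.lookup_or_create(date(2014, 10, 2))
again = dim.lookup_or_create(date(2014, 10, 1))
print(first, second, again)  # 1 2 1
print(len(dim.rows))         # 2
```

Looking up before inserting means a repeated date reuses its existing ID, so fact rows for the same day all point at a single dimension entry.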

Improved Upgrade Experience

Upgrading PDI is easier because it is no longer a manual process. You can now upgrade from 5.1.x to 5.2 by using the same upgrade utility that is used for patch releases.

New Documentation

There is now only one upgrade guide instead of two. 

Minor Functionality Changes

To learn more about minor functionality changes that might impact your upgrade or migration experience, see the PDI 5.1 to 5.2 Functionality Changes article.