Hitachi Vantara Lumada and Pentaho Documentation

New Features in Pentaho Data Integration 5.1


Learn about the Data Science Pack, YARN support, security enhancements, new steps, JBoss support, and new Marketplace plugins.

Pentaho Data Integration 5.1 delivers many exciting and powerful features that help you quickly and securely access, blend, transform, and explore data.

Data Science Pack with Weka and R

The R Script Executor, Weka Forecasting, and Weka Scoring steps form the core of the Data Science Pack and transform PDI into a powerful predictive analytics tool. The R Script Executor step, which is new for 5.1, lets you include R scripts in your transformations and jobs. You can customize random seed sampling, limit the batch and reservoir size, adjust the logging level, and more. You can also load the script from a file at runtime, which gives you more flexibility in transformation design.

YARN Hadoop Distribution Support

PDI supports YARN capabilities, including enhanced scalability, compatibility with MapReduce, improved cluster use, and greater support for agile development processes. YARN also supports non-MapReduce workloads.

Cloudera and MapR Hadoop Distribution Support

Use Pentaho's innovative Big Data Adaptive Layer to connect to more Hadoop Distributions, including Cloudera 5 and MapR 3.1. These certified and tested YARN-based distributions allow you to use PDI to build scalable solutions that are optimized for performance. Pentaho supports over 20 different Hadoop Distribution versions from vendors such as Apache, Intel, and MapR.  Pentaho also supports Cloudera distributions and is a certified Cloudera partner.

YARN for Carte Kettle Clusters

The Start a YARN Kettle Cluster and Stop a YARN Kettle Cluster entries make it possible to execute Carte transformations in parallel using YARN. Carte clusters are implemented using the resources and data nodes of a Hadoop cluster, which optimizes resources and speeds processing.

Updated Support for Cassandra and MongoDB

PDI 5.1 provides support for newer versions of Cassandra and MongoDB.

Security Enhancements

PDI security has been enhanced to support more standard security protocols and specifications.

AES Password Support

Use Advanced Encryption Standard (AES) to encrypt passwords for databases, slave servers, web service connections, and more. AES uses a symmetric key algorithm to secure your data.  
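
To illustrate what "symmetric key" means here, the following is a minimal sketch that uses only the standard Java javax.crypto API: the same secret key both encrypts and decrypts a password. It is not PDI's implementation. The class name AesPasswordSketch, the choice of AES-GCM mode, the 128-bit key size, and the Base64 storage format are all assumptions made for the example.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of PDI.
class AesPasswordSketch {
    private static final int IV_BYTES = 12;   // assumed: 96-bit IV, the usual size for GCM
    private static final int TAG_BITS = 128;  // assumed: 128-bit authentication tag

    // Generates a random 128-bit AES key (key size is an assumption).
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the password and returns Base64(IV || ciphertext).
    static String encrypt(String password, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv); // fresh IV for every encryption
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(password.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + encrypted.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(encrypted, 0, out, iv.length, encrypted.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt() using the same key, which is what makes the scheme symmetric.
    static String decrypt(String stored, SecretKey key) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String stored = encrypt("db-password", key);
        System.out.println("stored form:  " + stored);
        System.out.println("round trip ok: " + decrypt(stored, key).equals("db-password"));
    }
}
```

Because the key is shared between encryption and decryption, anyone who can read the key can recover the passwords, so the key needs to be protected at least as carefully as the passwords themselves.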

New Execute Permission

You can now choose, by user role, whether to grant permission to execute transformations and jobs. This provides more finely tuned access controls for different groups and can be useful for auditing, deployment, or quality assurance purposes.

Kerberos Security Support

If you already use Kerberos to authenticate access to a data source, then with a little extra configuration you can also use Kerberos to authenticate DI users who need to access your data.

Impersonation Support

If your transformation or job must run on a MapR cluster or access its resources, you can use impersonation to specify that another Hadoop user will run transformations or jobs on behalf of the default admin account.  Impersonation is useful because it leverages another Hadoop user’s existing authentication and authorization settings.

Teradata and Vertica Bulkloaders

There are two new bulk loader steps: Teradata Insert/Upsert TPT Bulkloader and Vertica Bulkloader. Newer versions of Teradata and Vertica are also supported.

JBoss Platform Support

Deploy PDI on your existing JBoss web application server or on a new one. You can also choose whether to house the DI Repository on a PostgreSQL, MySQL, or Oracle database.

New Marketplace Plugins

Pentaho Marketplace continues to grow with many more of your contributions.  As a testament to the power of community, Pentaho Marketplace is a home for your plugins and a place where you can contribute, learn, benefit from, and connect to others. New contributions include:

  • Vizor: A real-time monitoring and debugging tool for transformations that run in the Hadoop cluster. Vizor helps you troubleshoot your transformations and jobs more easily.
  • Riak Consumer and Producer:  Links with Maven to provide dependency management. 
  • Load Text From File:  Uses Apache Tika to extract text from files in many different formats, such as PDF and XLS.
  • Top / Bottom / First / Last filter: Filters rows based on a field's values or row numbers.  
  • Map (key/value pair) type:  Provides a ValueMeta plugin for key/value pairs that are backed by a java.util.Map.
  • PDI Tableau Data Extract Output: Uses Pentaho's ETL capabilities to generate a Tableau Data Extract (TDE) file.
  • PDI NuoDB: Provides a PDI database dialect for the NuoDB NewSQL database that works in the cloud.
  • Neo4j JDBC Connector:  Provides a PDI database dialect for the Neo4j graph database.
  • HTML to XML: Uses JTidy to convert HTML into XML or XHTML.
  • Apache Kafka Consumer and Producer: Reads and sends binary messages to and from Apache Kafka message queues.
  • LegStar z/OS File Reader: Reads raw z/OS records from a file and transforms them to PDI rows.
  • Compare Fields: Compares two sets of fields in a row and directs the row to the appropriate destination step based on the result. This step detects identical, changed, added, and removed rows.

There are many more new plugins, such as IC AMQP, IC Bloom filter, JaRE (Java Rule Engine), JDBC Metadata, Memcached Input/Output, Google Spreadsheet Input/Output, and Sentiment Analysis.
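
As a rough illustration of the kind of routing the Compare Fields entry in the list above describes, the sketch below classifies a row by comparing a "reference" set of field values with a "compare" set. This is not the plugin's code. The class and method names are hypothetical, and the rule used for "added" and "removed" (one side consisting entirely of null values) is an assumption made for the example.

```java
import java.util.Arrays;

// Hypothetical illustration; not the Compare Fields plugin's implementation.
class CompareFieldsSketch {
    enum Result { IDENTICAL, CHANGED, ADDED, REMOVED }

    // True when every field value in the set is null.
    static boolean allNull(Object[] fields) {
        for (Object field : fields) {
            if (field != null) {
                return false;
            }
        }
        return true;
    }

    // Assumed rule: an all-null reference side means the row was added,
    // an all-null compare side means it was removed; otherwise the two
    // sets are compared value by value.
    static Result classify(Object[] reference, Object[] compare) {
        boolean referenceEmpty = allNull(reference);
        boolean compareEmpty = allNull(compare);
        if (referenceEmpty && !compareEmpty) {
            return Result.ADDED;
        }
        if (!referenceEmpty && compareEmpty) {
            return Result.REMOVED;
        }
        return Arrays.equals(reference, compare) ? Result.IDENTICAL : Result.CHANGED;
    }

    public static void main(String[] args) {
        // Each classification would select a different destination step.
        System.out.println(classify(new Object[] {1, "a"}, new Object[] {1, "a"}));
        System.out.println(classify(new Object[] {1, "a"}, new Object[] {1, "b"}));
        System.out.println(classify(new Object[] {null, null}, new Object[] {1, "a"}));
        System.out.println(classify(new Object[] {1, "a"}, new Object[] {null, null}));
    }
}
```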

Documentation Changes

Our documentation has been moved to a new platform that is easier to use, search, and maintain.  There are three new guides. Some documentation has been co-located so that it is easier to find. 

Minor Functionality Changes

To learn more about minor functionality changes that might impact your upgrade experience, see the PDI 5.0 to 5.1 Functionality Change article.