Hitachi Vantara Lumada and Pentaho Documentation

Evaluate and Learn Pentaho Data Integration

As you explore Pentaho Data Integration, you will be introduced to the major components, watch videos, work through hands-on examples, and read about the different features. Go at your own pace. Feel free to dig into the documentation or to contact Pentaho sales support if you have questions.

The PDI evaluation track is divided into several parts.

PDI Basics

This section familiarizes you with PDI and introduces you to basic terminology and concepts. Then, you learn how to start and configure Spoon and take a spin through the interface.

Table 1. PDI Basics Checklist
Task / Objectives
What is PDI and How Does it Fit Into the Pentaho Business Analytics Platform?
  • Understand PDI's role in the Pentaho Business Analytics platform.
PDI Components
  • Understand the different components of the Pentaho Data Integration architecture.
  • Learn the primary functions of the Data Integration Server.
  • Understand the relationship between PDI and Kettle.

Get Acquainted with Spoon

Spoon is the PDI design tool. In this section you will set up Spoon, take a tour of the Spoon interface, and learn about the different Spoon perspectives.

Table 2. Get Acquainted with Spoon Checklist
Task / Objectives
Install and Configure PDI Components
  • Learn about Pentaho's hardware and software requirements.
  • Install PDI Using the Wizard
  • Configure the DI Server and Spoon
Introduction to Spoon
  • Learn how to start the DI Server and Spoon.
  • Understand the different parts of the Spoon interface.
PDI Basic Terminology
  • Become familiar with transformations, steps, hops, and jobs.
  • Understand the relationship between transformations, steps, and hops.
  • Know what a job is, and what a job is composed of.
  • Be able to apply what you have learned about terminology to the Spoon interface.
Get a Different Perspective
  • Be able to identify different perspectives.
  • Learn how to access different perspectives.
  • Be able to identify the basic types of tasks that you can perform with the different perspectives.
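The terminology above maps directly onto the XML that Spoon saves: a transformation is stored as a .ktr file containing step definitions and the hops that connect them, while a job is stored as a .kjb file of job entries. A minimal sketch of a .ktr file follows; the step names and the two-step layout are illustrative, not taken from a real tutorial file:

```xml
<!-- Minimal sketch of a saved transformation (.ktr); names are hypothetical. -->
<transformation>
  <info>
    <name>read_and_filter</name>
  </info>
  <!-- Each <step> element defines one transformation step. -->
  <step>
    <name>Read CSV</name>
    <type>CsvInput</type>
  </step>
  <step>
    <name>Filter rows</name>
    <type>FilterRows</type>
  </step>
  <!-- Hops connect steps and define the direction rows flow. -->
  <order>
    <hop>
      <from>Read CSV</from>
      <to>Filter rows</to>
      <enabled>Y</enabled>
    </hop>
  </order>
</transformation>
```

In practice you build transformations visually in Spoon rather than by editing this XML, but seeing the file makes the step/hop relationship concrete.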

Build Transformations and Jobs

Now that your environment is set up and you are familiar with Spoon, you are ready to build transformations and jobs. Work through the tutorial in this section.

Table 3. Build Transformations and Jobs Checklist
Task / Objectives
Create a Connection to the DI Repository
  • Learn how to create a connection to the DI Repository.
Create Your First Transformation
  • Work through the exercise on Creating a Transformation that involves a flat file. Click through the links that are on the bottom of the page to complete the exercise.
  • Learn how to retrieve data from a flat file using an Input step.
  • Apply filters and create a hop.
  • Load data into a relational database and learn how to test database connections.
  • Follow an example on how to resolve missing information.
  • Run the transformation.
Create a Job
  • Be able to articulate why you would create a job.
  • Create a job for a transformation.
Schedule a Job
  • Schedule a job.
Learn more about commonly-used steps and job entries.
  • There are over 330 available job entries and transformation steps; there is a step for virtually anything you want to do. When you are learning PDI, however, it is helpful to start by reviewing the most commonly used steps and entries.
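Besides scheduling through the DI Server, jobs are often scheduled at the operating-system level by calling Kitchen (covered later in this track) from cron. The sketch below assumes an install under /opt/pentaho/data-integration and a hypothetical job file and log path; adjust all three to your environment:

```shell
# Hypothetical crontab entry: run the load_sales job every night at 02:00.
# The install path, job file, and log file are placeholders.
0 2 * * * /opt/pentaho/data-integration/kitchen.sh -file=/home/etl/load_sales.kjb -level=Basic >> /var/log/pdi/load_sales.log 2>&1
```

The -level option controls logging verbosity; redirecting both stdout and stderr to a log file keeps a record of each scheduled run.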

Explore Big Data and Streamlined Data Refinery

In this section, you will learn how to use transformation steps to connect to a variety of Big Data sources, including Hadoop, NoSQL stores such as MongoDB, and analytical databases. You can then work through the detailed, step-by-step tutorials and browse the out-of-the-box steps that Spoon provides, and learn how to work with Streamlined Data Refinery. Then, you will have an opportunity to move beyond the basics and learn how to edit transformations and metadata models.

Table 4. Explore Big Data and Streamlined Data Refinery Checklist
Task / Objectives
What is Big Data?
  • Gain an overview of Big Data and PDI.
  • Check out Cassandra, Splunk, MongoDB, and Hadoop Big Data resources.
Learn about Streamlined Data Refinery
  • Learn how Streamlined Data Refinery works.
Introduction to DI Big Data Steps and Transformations
  • Review available Big Data transformation steps.
  • Review Big Data job steps.
  • Review the YARN steps.
Configure a Hadoop Distribution
  • Configure a Hadoop Distribution.
  • Pentaho's Big Data adaptive layer supports over 20 different versions of popular Hadoop distributions such as Apache, Cloudera, Hortonworks, MapR, and EMR.
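In classic PDI, selecting the active Hadoop distribution typically means pointing the Big Data plugin at the matching "shim". A sketch of the relevant properties file follows; the shim ID shown is an example, so use the folder name that ships for your distribution:

```properties
# data-integration/plugins/pentaho-big-data-plugin/plugin.properties
# Set the active Hadoop configuration (shim) to match your cluster.
# "cdh61" is an example ID; use the directory name found under
# the plugin's hadoop-configurations folder for your distribution.
active.hadoop.configuration=cdh61
```

After changing this property, restart Spoon and the DI Server so the new shim is loaded.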
PDI, Hadoop, Cassandra, MongoDB, and MapR Tutorials
  • Work through Big Data Tutorials.
  • Explore how to load, transform, extract, and report on data in Hadoop and MapR clusters.
  • Learn how to write and read data to and from Cassandra, and how to create reports.
  • Learn how to create MongoDB reports and how to read and write data to and from MongoDB.
Beyond the Basics: Edit Transformations and Metadata Models
  • Edit data transformations
  • Edit metadata models
Blend Big Data
  • Learn about the concept of blending data.
  • Learn about Pentaho's just-in-time approach to blending Big Data.

About Kitchen, Pan, and Carte

These lessons provide an overview of Kitchen and Pan, the command-line tools for executing jobs and transformations modeled in Spoon. You will also learn about Carte, a web server that enables remote execution of jobs and transformations.

Table 5. About Kitchen, Pan, and Carte Checklist
Task / Objectives
What are Kitchen and Pan?
  • Gain an overview of Kitchen and Pan.
Intro to Kitchen, Pan, and Carte
  • Learn about the capabilities of Kitchen, Pan, and Carte.
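As a quick reference, the command lines below sketch typical invocations of the three tools. The install path, file paths, repository name, and credentials are all placeholders; the script only assembles and prints the commands so you can copy and adapt them, rather than requiring a PDI install to run:

```shell
# Assumed install location; adjust to your environment.
PDI_HOME=/opt/pentaho/data-integration

# Pan runs a transformation, here one saved as a .ktr file.
PAN_CMD="$PDI_HOME/pan.sh -file=/home/etl/my_transform.ktr -level=Basic"

# Kitchen runs a job, here one stored in a DI repository.
KITCHEN_CMD="$PDI_HOME/kitchen.sh -rep=my_repo -user=admin -pass=password -dir=/ -job=load_sales"

# Carte starts a lightweight web server for remote execution.
CARTE_CMD="$PDI_HOME/carte.sh localhost 8081"

# Print the assembled commands; run them directly once paths are adjusted.
echo "$PAN_CMD"
echo "$KITCHEN_CMD"
echo "$CARTE_CMD"
```

On Windows installs the equivalent scripts are Pan.bat, Kitchen.bat, and Carte.bat with the same options.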

Learn More

Now that you have completed an initial evaluation of PDI, dig a little deeper with the next steps below.

Next Steps

  • Contact Pentaho sales support to learn more about how Pentaho can be customized to meet your needs. The flexibility of PDI means that you can explore, process, transform, export, and visualize data in a variety of ways.
  • Check out the DI Development workflow.