Create DI Solutions
Overview
Learn how PDI provides the Extraction, Transformation, and Loading (ETL) engine that facilitates capturing the right data, cleansing it, and storing it in a uniform and consistent format.
User Guide
- Introduction
- Provides an overview of the guide.
- Pentaho Data Integration Architecture
- Provides information about PDI's components.
- Create DI Repository Connections
- Explains how to connect and disconnect from the DI Repository.
- Terminology and Basic Concepts
- Presents basic terminology and concepts that will help you understand how to use PDI.
- Create Transformations
- Explains how to create, save, and run a transformation. Also explains how to build a job.
- Executing Transformations
- Explains how to initialize slave servers, execute jobs and transformations, and perform an impact analysis.
- Working with the DI Repository
- Explains how to delete a repository, manage content, and use version control in the DI Repository.
- Deleting a Repository
- Managing Content in the DI Repository Explorer
- Setting Folder-Level Permissions
- Access Control List (ACL) Permissions
- Exporting Content from Solutions Repositories with Command-Line Tools
- Working with Version Control
- Examining Version History
- Restoring a Previously Saved Version of a Job or Transformation
- Reusing Transformation Flows with Mapping Steps
- Explains how to reuse repeated steps.
- Arguments, Parameters, and Variables
- Provides information on the three paradigms for storing user input: arguments, parameters, and variables. A short variable-definition sketch appears in the Examples after this outline.
- Rapid Analysis Schema Prototyping
- Explains how to use Agile BI to rapidly prototype schemas.
- Using the SQL Editor
- Provides information about the SQL Editor.
- Using the Database Explorer
- Explains how to explore configured database connections.
- Unsupported Databases
- Provides information on unsupported databases.
- Performance Monitoring and Logging
- Provides information on how to set up logging and performance monitoring in Spoon.
- Working with Big Data and Hadoop in PDI
- Explains how to work with Big Data and Hadoop in PDI.
- Pentaho MapReduce Workflow
- PDI Hadoop Job Workflow
- Hadoop to PDI Data Type Conversion
- Hadoop Hive-Specific SQL Limitations
- Big Data Tutorials
- Hadoop Tutorials
- Loading Data into a Hadoop Cluster
- Prerequisites
- Sample Data
- Using a Job Entry to Load Data into Hadoop's Distributed File System (HDFS)
- Using a Job Entry to Load Data into Hive
- Using a Transformation Step to Load Data into HBase
- Transforming Data within a Hadoop Cluster
- Extracting Data from a Hadoop Cluster
- Reporting on Data within a Hadoop Cluster
- MapR Tutorials
- Loading Data into a MapR Cluster
- Transforming Data within a MapR Cluster
- Extracting Data from a MapR Cluster
- Reporting on Data within a MapR Cluster
- Cassandra Tutorials
- MongoDB Tutorials
- Implement Data Services with the Thin Kettle JDBC Driver
- Explains how to use the Thin Kettle JDBC Driver. A connection sketch appears in the Examples after this outline.
- Transactional Databases and Job Rollback
- Provides information on how to roll back jobs when transformations or jobs fail.
- Interacting With Web Services
- Provides information on how to interact with web services.
- Scheduling and Scripting PDI Content
- Explains how to schedule and script PDI content. A Kitchen invocation sketch appears in the Examples after this outline.
- Scheduling Transformations and Jobs From Spoon
- Command-Line Scripting Through Pan and Kitchen
- Pan Options and Syntax
- Pan Status Codes
- Kitchen Options and Syntax
- Kitchen Status Codes
- Importing KJB or KTR Files From a Zip Archive
- Connecting to a DI Solution Repository with Command-Line Tools
- Exporting Content from Solutions Repositories with Command-Line Tools
- About PDI Marketplace
- Provides information about the PDI Marketplace, which contains community-contributed steps and entries.
- Troubleshooting
- Provides troubleshooting information for PDI.
- Changing the Pentaho Data Integration Home Directory Location (.kettle folder)
- Changing the Kettle Home Directory within the Pentaho BI Platform
- Kitchen can't read KJBs from a Zip export
- Generating a DI Repository Configuration Without Running Spoon
- Connecting to a DI Solution Repository with Command-Line Tools
- Unable to Get List of Repositories Exception
- Executing Jobs and Transformations from the Repository on the Carte Server
- Database Locks When Reading and Updating From A Single Table
- Reading and Updating Table Rows Within a Transformation
- Force PDI to use DATE instead of TIMESTAMP in Parameterized SQL Queries
- PDI Does Not Recognize Changes Made To a Table
- Using ODBC
- Sqoop Import into Hive Fails
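Examples

As a quick illustration of the variables paradigm described under Arguments, Parameters, and Variables: a variable can be defined once in the kettle.properties file (stored in the .kettle folder) and then referenced by name in any step or job entry field that supports variable substitution. The variable name and value below are made up for illustration.

```properties
# $HOME/.kettle/kettle.properties
DB_HOSTNAME=dbserver.example.com
```

A field that supports substitution would then reference the value as ${DB_HOSTNAME}.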
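The Thin Kettle JDBC Driver lets client code query a transformation's output as a data service using standard SQL over JDBC. Below is a minimal connection sketch in Java; the host, port, credentials, and service name (my_service) are placeholders for your own environment, and the driver class name should be verified against your PDI version.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThinKettleQuery {
    public static void main(String[] args) throws Exception {
        // Driver class name as used by PDI data services; verify for your PDI version.
        Class.forName("org.pentaho.di.trans.dataservice.jdbc.ThinDriver");

        // Placeholder host, port, webapp name, and credentials -- adjust to your setup.
        String url = "jdbc:pdi://localhost:9080/kettle?webappname=pentaho-di";
        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement();
             // "my_service" is a hypothetical data service defined on a transformation.
             ResultSet rs = stmt.executeQuery("SELECT * FROM my_service")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // print the first column of each row
            }
        }
    }
}
```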
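For Command-Line Scripting Through Pan and Kitchen, a job is typically launched by calling kitchen.sh (or Kitchen.bat on Windows) and checking its exit status; -file and -level are standard Kitchen options. The sketch below wraps that call in Java so a scheduler or wrapper program can act on the status code. The install path and job file path are hypothetical.

```java
import java.io.IOException;

public class RunKitchenJob {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical paths -- point these at your PDI install and job file.
        ProcessBuilder pb = new ProcessBuilder(
                "/opt/pentaho/data-integration/kitchen.sh",
                "-file=/jobs/load_warehouse.kjb",
                "-level=Basic");           // logging level, per the Kitchen options
        pb.inheritIO();                    // stream Kitchen's log output to this console

        int status = pb.start().waitFor();

        // Kitchen signals success or failure through its exit code: 0 means the job
        // finished without errors; non-zero values are listed under Kitchen Status Codes.
        if (status != 0) {
            System.err.println("Kitchen exited with status " + status);
        }
    }
}
```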