Hitachi Vantara Lumada and Pentaho Documentation

Turn a Transformation into a Pentaho Data Service


Explains how to turn a transformation into a JDBC Pentaho Data Service.

Sometimes, building and maintaining a data warehouse is impractical or costly, especially when you need to quickly blend and visualize fast-moving or rapidly evolving data sets. Instead of building a data warehouse, you can turn a transformation into a Pentaho Data Service that helps you quickly analyze and visualize results from a virtual table. For example, if you want to compare your product prices with your competitors', you could create a Pentaho Data Service that blends prices from your in-house data sources with competitor prices from the web. Then, you can use Analyzer to slice, dice, and visualize the results.

A Pentaho Data Service is a transformation that has been published to the DI Server so it can be queried as a virtual database table. This powerful feature turns an ordinary transformation into a JDBC data source that can be queried with simple SQL statements. You can query a Pentaho Data Service from any JDBC-compliant tool, such as Pentaho Report Designer or Interactive Reporting, as well as compatible non-Pentaho tools like R Studio or SQuirreL. The Pentaho Data Service can be the heart of a data blending solution. Use this feature to:

  • Connect, combine, and transform data from multiple sources.
  • Query data directly from any transformation.
  • Access architected blends from many JDBC-compliant tools.
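Because a published data service behaves like an ordinary JDBC table, querying it looks the same as querying any relational database. The sketch below is a minimal illustration, assuming the Pentaho thin-driver URL format (`jdbc:pdi://<host>:<port>/kettle`) and its client jars on the classpath; the host name, credentials, and the `product_prices` data service name are hypothetical placeholders, not values from this article.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DataServiceQuery {

    /** Builds a thin-driver JDBC URL for a Pentaho Data Service. */
    static String buildUrl(String host, int port, String webAppName) {
        return "jdbc:pdi://" + host + ":" + port + "/kettle?webappname=" + webAppName;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical DI Server host and credentials; the Pentaho thin-driver
        // client jars must be on the classpath for the jdbc:pdi URL to resolve.
        String url = buildUrl("di-server.example.com", 9080, "pentaho-di");
        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement();
             // "product_prices" is a hypothetical data service (virtual table) name.
             ResultSet rs = stmt.executeQuery("SELECT * FROM product_prices")) {
            while (rs.next()) {
                System.out.println(rs.getString("product") + "\t" + rs.getDouble("price"));
            }
        }
    }
}
```

The same connection URL works from any JDBC-aware tool, so a report designer or SQL client can browse the data service exactly as it would a database table.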

The Pentaho Data Service is also useful when you have a transformation that yields a large data set and your users want to see slices of that data on demand. For instance, you could create a data service published to the DI Server that queries large research datasets in HDFS or a database. You could then grant access to a smaller group of researchers so they can query the data that is returned by that transformation through the data service. Applying filters to large result sets can take a long time, so you can optimize the Pentaho Data Service to push the filter down to the data source rather than applying it in memory on the DI Server. This means researchers can get a fresh, customized slice of data much more quickly than they would with a traditional filter. Researchers can then use Pentaho Interactive Reporting or a tool of their choice, such as R Studio, to further analyze and visualize the results.
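From the researcher's side, requesting a slice is just an SQL query with a WHERE clause; the push-down optimization happens on the DI Server, so the client code does not change. The sketch below is a hypothetical example of such a filtered query; the host, credentials, `research_results` data service name, and column names are illustrative assumptions only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FilteredSliceQuery {

    /** Builds a filtered slice query. When a push-down optimization is
     *  configured on the data service, the DI Server forwards this WHERE
     *  condition to the underlying source instead of filtering in memory. */
    static String sliceSql(int studyYear) {
        return "SELECT subject_id, result FROM research_results WHERE study_year = " + studyYear;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical DI Server host and credentials.
        String url = "jdbc:pdi://di-server.example.com:9080/kettle?webappname=pentaho-di";
        try (Connection conn = DriverManager.getConnection(url, "researcher", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sliceSql(2014))) {
            while (rs.next()) {
                System.out.println(rs.getInt("subject_id") + "\t" + rs.getString("result"));
            }
        }
    }
}
```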

The push-down optimization feature is available for data sources accessed through the Table Input and MongoDB Input steps. This includes relational database management systems such as PostgreSQL, MySQL, Oracle, and Microsoft SQL Server.

Some limitations apply to the SQL passed to a Pentaho Data Service; see the Pentaho Data Service SQL Support article for more details. For a complete list of supported traditional data sources, see the Components Reference article.

Read these articles to learn how to create and connect to a Pentaho Data Service.