
Adaptive Execution Layer

Pentaho uses the Adaptive Execution Layer (AEL) for running transformations with Spark. AEL adapts the steps of a transformation you develop in PDI to the native operator functions in Spark. This adaptation is necessary because the Spark engine runs big data transformations in the Hadoop cluster differently than the Pentaho engine does. For example, the Spark engine may not require some fields in a PDI step, or it may require a specific value for an option. Null values must also be adjusted because Spark processes null values differently than the Pentaho engine.

Some PDI steps commonly used in big data transformations are specifically coded to the Spark APIs for improved performance when using the Spark engine. To see whether the step you want to use has been optimized for distributed processing with Spark, refer to the documentation for that step. You can also view the list of Recommended PDI steps to use with Spark on AEL.

To decide whether the Spark engine or the Pentaho engine is the best choice for your transformation, you must know which cluster resources are available to you and how large your data sets are.

Set Up AEL

AEL must be configured before you can select the Spark engine in the run configuration of your transformation. For details, refer your Pentaho or IT administrator to Setting Up the Adaptive Execution Layer.

Use AEL

Once AEL is configured, you can select the Spark engine in the run configuration for your transformation. See Run Configurations for more details.
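
As a rough illustration, a Spark run configuration names the engine and points PDI at the AEL daemon. The field labels and the address shown below are examples only; the actual dialog depends on your PDI version, and your administrator supplies the real daemon host and port:

    Engine:          Spark
    Spark host URL:  ws://<ael-daemon-host>:53000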

Vendor-specific setups for Spark

The following PDI big data steps require vendor-specific setups or specific vendor versions when you run them on Spark:

Advanced topics

The following topics extend your knowledge of the Adaptive Execution Layer beyond basic setup and use:

  • Specify Additional Spark Properties

    You can define additional Spark properties within the application.properties file or as run modification parameters within a transformation (see the example after this list).

  • Configuring AEL with Spark in a Secure Cluster

    If your AEL daemon server and your cluster machines are in a secure environment, such as a data center, you may need to secure only the connection between the PDI client and the AEL daemon server.

  • AEL logging

    Pentaho provides logging for transformations and jobs that are executed on the Adaptive Execution Layer.

  • Spark Tuning

    Spark tuning parameters are available for PDI steps where they are functionally applicable.
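
As an illustration of the Specify Additional Spark Properties topic above, the sketch below shows the two places such properties can be declared. The keys shown are standard Spark settings, but the exact syntax accepted by the AEL daemon and the parameter naming convention in PDI are assumptions here; confirm them in that topic before use:

    # Example entries in the AEL daemon's application.properties (assumed syntax)
    spark.executor.memory=4g
    spark.executor.cores=2

    # Alternatively, define the same keys as transformation parameters in PDI,
    # for example a parameter named spark.executor.memory with the value 4g
    # (assumed naming convention).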

Troubleshooting

See our list of common problems and resolutions.
