Hitachi Vantara Lumada and Pentaho Documentation

Set Up the Adaptive Execution Layer (AEL)

Pentaho uses the Adaptive Execution Layer (AEL) for running transformations in different engines. AEL adapts steps from a transformation developed in PDI to native operators in the engine you select for your environment, such as Spark in a Hadoop cluster. The AEL daemon builds a transformation definition in Spark, which moves execution directly to the cluster.

Your installation of Pentaho 8.1 includes the AEL daemon which you can set up for production to run on your clusters. After you configure the AEL daemon, the PDI client communicates with both your Spark cluster and the AEL daemon, which lives on a node of your cluster to launch and run transformations. 

Before you can select the Spark engine through run configurations, you will need to configure AEL for your system and your workflow. Depending on your deployment, you may need to perform additional configuration tasks, such as setting up AEL in a secure cluster.

The Adaptive Execution Layer (AEL) supports most standard PDI steps; however, there are some steps that are not supported. For example, Metadata Injection (MDI) is not currently supported for steps running on AEL.

Before You Begin

You must meet the following requirements for using the AEL daemon and operating the Spark engine for transformations:

The dependency on Zookeeper has been removed from Pentaho 8.0. If you installed AEL for Pentaho 7.1, you must delete the adaptive-execution folder and follow the Pentaho 8.0 Installation instructions to use AEL with Pentaho 8.0.

Pentaho 8.1 Installation

When you install the Pentaho Server, the AEL daemon is installed in the folder data-integration/adaptive-execution. This folder will be referred to as 'PDI_AEL_DAEMON_HOME'.

Spark Client

The Spark client is required for the operation of the AEL daemon. The recommended versions of the Apache Spark client are 2.0, 2.1, and 2.2. Perform the following steps to install the Spark client.

  1. Download the Spark client, spark-2.1.0-bin-hadoop2.7.tgz, from the Apache Spark downloads page.
  2. Extract it to a folder on the cluster where the daemon can access it. This folder will be referred to as the variable 'SPARK_HOME'.

Pentaho Spark Application

The Pentaho Spark application is built upon PDI's Kettle engine, which allows transformations to run unaltered within a Hadoop cluster. Some third-party plugins, such as those plugins available in the Pentaho Marketplace, may not be included by default within the Pentaho Spark application. To address this issue, we include the Spark Application builder tool so you can customize the Pentaho Spark application by adding or removing components to fit your needs. 

After running the Spark application builder tool, copy and unzip the resulting file to an edge node in your Hadoop cluster. The unpacked contents consist of the data-integration folder and the pdi-spark-executor.zip file, which includes only the libraries the Spark nodes themselves need to execute a transformation when the AEL daemon is configured to run in YARN mode. Because the pdi-spark-executor.zip file must be accessible by all nodes in the cluster, it must be copied into HDFS. Spark distributes this .zip file to the other nodes and then automatically extracts it.

Perform the following steps to run the Spark application build tool and manage the resulting files.

  1. Ensure that you have configured your PDI client with all the plugins that you will use.
  2. Navigate to the design-tools/data-integration folder and locate the spark-app-builder.bat (Windows) or spark-app-builder.sh (Linux) script.
  3. Execute the Spark application builder tool script. A console window displays, and the pdi-spark-executor.zip file is created in the data-integration folder (unless otherwise specified by the -outputLocation parameter described below).

    The following parameters can be used when running the script to build the Pentaho Spark application:

    Parameter Action
    -h or --help Displays the help.
    -e or --exclude-plugins Specifies plugins from the data-integration/plugins folder to exclude from the assembly.
    -o or --outputLocation Specifies the output location.


  4. The resulting file contains a data-integration folder and a pdi-spark-executor.zip file. Copy the data-integration folder to the edge node where you want to run the AEL daemon.
  5. Copy the pdi-spark-executor.zip file to the HDFS location where you will run Spark. This location will be referred to as 'HDFS_SPARK_EXECUTOR_LOCATION'.

For the cluster nodes to use the functionality provided by PDI plugins when executing a transformation, they must be installed into the PDI client prior to generating the Pentaho Spark application. If you install other plugins later, you must regenerate the Pentaho Spark application.

Configuring the AEL Daemon for Local Mode

Configuring the AEL daemon to run in Spark local mode is not supported for production use, but it can be useful for development and debugging.

You can configure the AEL daemon to run in Spark local mode for development or demonstration purposes. This lets you build and test a Spark application on your desktop with sample data, then reconfigure the application to run on your clusters. To configure the AEL daemon for local mode, complete the following steps:

  1. Navigate to the .../data-integration/adaptive-execution/config directory and open the application.properties file.
  2. Set the following properties for your environment:
  • Set the sparkHome property to the Spark 2 filepath on your local machine.
  • Set the sparkApp property to the data-integration directory.
  • Set the hadoopConfDir property to the directory containing the *site.xml files.
  3. Save and close the file.
  4. Navigate to the data-integration/adaptive-execution folder and run the pdi-daemon startup script from the command line interface.
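Put together, the local-mode settings above might look like the following sketch of the application.properties file. All paths here are illustrative placeholders, not values from this document; substitute your own locations.

```
# Local-mode sketch -- replace every path with your own
sparkHome=/home/user/spark-2.1.0-bin-hadoop2.7
sparkApp=/home/user/pentaho/design-tools/data-integration
hadoopConfDir=/etc/hadoop/conf
```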

Configuring the AEL Daemon in YARN Mode

Typically, the AEL daemon is run in YARN mode for production purposes. In YARN mode, the driver application launches and delegates work to the YARN cluster. The pdi-spark-executor application must be installed on each of the YARN nodes.

The script is only supported in UNIX-based environments.

To configure the AEL daemon for a YARN production environment, complete the following steps.

  1. Navigate to the .../data-integration/adaptive-execution/config directory and open the application.properties file.
  2. Set the following properties for your environment:
    Property Value
    websocketURL The fully qualified domain name of the node where the AEL daemon is installed. For example, websocketURL=ws://localhost:${ael.unencrypted.port}
    sparkHome The path to the Spark client folder on your cluster.
    sparkApp The data-integration directory.
    hadoopConfDir The directory containing the *site.xml files. This property tells Spark which Hadoop/YARN cluster to use. You can download the directory containing the *site.xml files using your cluster management tool, or set the hadoopConfDir property to its location on the cluster.
    hadoopUser The user ID the Spark application will use, if you are not using security.
  3. Save and close the file.
  4. Copy the pdi-spark-executor.zip file to your HDFS cluster, as in the example below.
    $ hdfs dfs -put pdi-spark-executor.zip /opt/pentaho/
  5. Run the pdi-daemon startup script from the command line interface.
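Under these settings, a minimal YARN-mode application.properties might look like the following sketch. The host name, paths, and user here are illustrative placeholders, not values from this document.

```
# YARN-mode sketch -- substitute your own host, paths, and user
websocketURL=ws://ael-node.example.com:${ael.unencrypted.port}
sparkHome=/opt/spark-2.1.0-bin-hadoop2.7
sparkApp=/opt/pentaho/data-integration
hadoopConfDir=/etc/hadoop/conf
hadoopUser=devuser
```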

You can manually start the AEL daemon by running the pdi-daemon startup script. By default, this startup script is installed in the data-integration/adaptive-execution folder, which is referred to as the variable 'PDI_AEL_DAEMON_HOME'.

Perform the following steps to manually start the AEL daemon.

  1. Navigate to the data-integration/adaptive-execution directory.
  2. Run the pdi-daemon script.

The script supports the following commands:

Command Action
(no command) Starts the daemon as a foreground process.
start Starts the daemon as a background process. Logs are written to the PDI_AEL_DAEMON_HOME/daemon.log file.
stop Stops the daemon.
status Reports the status of the daemon.

Configure Event Logging

Spark events can be captured in an event log that can be viewed with the Spark History Server. The Spark History Server is a web browser-based user interface to the event log. You can view either running or completed Spark transformations using the Spark History Server. Before you can use the Spark History Server, you must configure AEL to log the events.

Perform the following tasks to configure AEL to log events:

  1. Navigate to the data-integration/adaptive-execution/config directory and open the application.properties file.
  2. Set the sparkEventLogEnabled property to true. If this field is missing or set to false, Spark does not log events.
  3. Set the sparkEventLogDir property to a directory where you want to store the log. This can be either a file system directory (for example, file:///users/home/spark-events) or an HDFS directory (for example, hdfs:///users/home/spark-events).
  4. Set the spark.history.fs.logDirectory property to point to the same event log directory you configured in the previous step.
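Taken together, the event-logging properties from the steps above might look like the following sketch. The HDFS path is an illustrative placeholder.

```
# Event-logging sketch -- point both directories at the same location
sparkEventLogEnabled=true
sparkEventLogDir=hdfs:///users/home/spark-events
spark.history.fs.logDirectory=hdfs:///users/home/spark-events
```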

You can now view PDI transformations using the Spark History Server.

Refer to your Hadoop distribution's documentation for more information on running the Spark History Server.

Vendor-Supplied Clients

Additional configuration steps may be required when using AEL with a vendor’s version of the Spark client.


Cloudera

If your Cloudera Spark client does not contain the Hadoop libraries, you must add the Hadoop libraries to the classpath using the SPARK_DIST_CLASSPATH environment variable. To do this, run the following command:

export SPARK_DIST_CLASSPATH=$(hadoop classpath)


Hortonworks

The Hortonworks Data Platform (HDP) version on the edge node where your Pentaho Server resides must match the version used on your cluster; otherwise, the AEL daemon and the PDI client will stop working. To prevent this, export the HDP_VERSION environment variable before starting the AEL daemon.


You can check the HDP version on your cluster with the following command:

hdp-select status hadoop-client
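The two steps can be combined in a short shell sketch. Note the assumptions: `hdp-select status hadoop-client` is taken to print a line of the form `hadoop-client - <version>`, so the version is the third whitespace-separated field; the version `2.6.4.0-91` below is illustrative, not taken from this document.

```shell
# Derive HDP_VERSION from the hdp-select output.
# 'hdp-select status hadoop-client' prints a line such as:
#   hadoop-client - 2.6.4.0-91
# The version string is the third whitespace-separated field.
hdp_status="hadoop-client - 2.6.4.0-91"   # stand-in for: $(hdp-select status hadoop-client)
export HDP_VERSION=$(echo "$hdp_status" | awk '{print $3}')
echo "$HDP_VERSION"
```

On a real edge node you would replace the stand-in variable with the actual `hdp-select` call before starting the AEL daemon.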


MapR

To configure the AEL daemon to run in a MapR Spark 2.1 production environment, complete the following steps.

  1. Navigate to the .../data-integration/adaptive-execution/config directory and open the application.properties file.
  2. Set the following properties for your MapR Spark 2.1 environment:
    Property Value
    hadoopConfDir This property identifies the Hadoop cluster that Spark will use. Because MapR identifies the Hadoop cluster by default, set the property value to empty, as shown here: hadoopConfDir=
    hadoop.login This property identifies the security environment that the Hadoop cluster will use. If you enable security, the value of the MAPR_ECOSYSTEM_LOGIN_OPTS environment variable will include the 'hybrid JVM' option for the hadoop.login property. Set the property value to 'hybrid' to specify a mixed security environment using Kerberos and internal MapR security technologies, as shown here: hadoop.login=hybrid
    (JAAS configuration file property) This property identifies the configuration file to use when you enable security. The MapR distribution for Hadoop uses the Java Authentication and Authorization Service (JAAS) to control security features; the /opt/mapr/conf/mapr.login.conf file specifies configuration parameters for JAAS. Set the property value to /opt/mapr/conf/mapr.login.conf.
  3. Save and close the file.
  4. Before running the daemon, add the Hadoop libraries to the classpath by running the following command from the command prompt (terminal window) on the cluster:
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

 You can now test your AEL configuration by creating a run configuration using the Spark engine. Refer to Run Configurations for more details. 

Amazon EMR

When running AEL on Amazon EMR, LZO compression and Oracle JDK 8 are required components. 

LZO Support

LZO is a compression format supported by Amazon EMR. It is required for running AEL on EMR. To configure for LZO compression, you will need to add several properties.

  1. Install the Linux LZO compression library from the command line.
  2. Navigate to the .../data-integration/adaptive-execution/config/ directory and open the application.properties file.
  3. Add the following properties:
  • spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
  • spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
  4. Append -Djava.library.path=/usr/lib/hadoop-lzo/lib/native to the end of each of the following properties:
  • sparkExecutorExtraJavaOptions
  • sparkDriverExtraJavaOptions
  5. Save and close the file.
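After these steps, the LZO-related section of the configuration file might look like the following sketch. The placeholder <existing options> stands for whatever Java options were already present on those lines in your installation.

```
# LZO sketch -- keep any options already present before the appended path
spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
sparkExecutorExtraJavaOptions=<existing options> -Djava.library.path=/usr/lib/hadoop-lzo/lib/native
sparkDriverExtraJavaOptions=<existing options> -Djava.library.path=/usr/lib/hadoop-lzo/lib/native
```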

Oracle JDK 8

Amazon EMR uses OpenJDK 8, while Pentaho AEL is supported on Oracle JDK 8 only. You must therefore install Oracle JDK 8 for your EMR instances to run AEL in a supported configuration. To access a sample script for installing Oracle JDK 8, see the link below.

The content contained within this link is an example and is provided with no warranty. Hitachi Vantara shall not be liable in the event of incidental or consequential damages in connection with or arising out of, the furnishing, performance, or use of the content provided here. The user is responsible for accepting the license agreement with Oracle.

Advanced Topics

The following topics help to extend your knowledge of the Adaptive Execution Layer beyond basic setup and use:

  • Specify Additional Spark Properties
    You can define additional Spark properties within the application.properties file or as run modification parameters within a transformation.
  • Configuring AEL with Spark in a Secure Cluster
    If your AEL daemon server and your cluster machines are in a secure environment like a data center, you may only want to configure a secure connection between the PDI client and the AEL daemon server.


See our list of common problems and resolutions.