Hitachi Vantara Lumada and Pentaho Documentation

Troubleshooting AEL

Follow the suggestions in these topics to help resolve common issues when running transformations with the Adaptive Execution Layer (AEL).

Steps Cannot Run in Parallel

If you use the Spark engine to run a transformation that contains a step that cannot run in parallel, the transformation generates errors in the log.

Some steps cannot run in parallel (on multiple nodes in a cluster) and produce unexpected results when they do. However, these steps can run as a coalesced dataset on a single node in the cluster. To run a step as a coalesced dataset, add its step ID as a property value in the Spark engine configuration file.

Get the Step ID

Each PDI step has a step ID, a globally unique identifier for the step. Use either of the following methods to get the ID of a step:

Method 1: Retrieve the ID from the log

You can retrieve a step ID through the PDI client as follows:

  1. In the PDI client, create a new transformation and add the step to it. For example, to find the ID of the Select values step, add that step to the new transformation.
  2. Set the log level to debug.
  3. Execute the transformation using the Spark engine. The step ID displays in the Logging tab of the Execution Results pane. For example, the log displays "Selected the SelectValues step to run in parallel as a GenericSparkOperation", where SelectValues is the step ID.
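If you save the debug log to a file, a short script can pull the step IDs out of it instead of scanning the Logging tab by eye. The following Python sketch assumes that the log message wording matches the example above ("Selected the ... step to run in parallel as a ...") and that step IDs contain no spaces; both are assumptions, and the function name step_ids_from_log is hypothetical, not part of PDI.

```python
import re

# Assumed message format, taken from the example log line; it may differ
# between PDI versions.
_STEP_ID_PATTERN = re.compile(
    r"Selected the (?P<step_id>\S+) step to run in parallel as a \S+"
)


def step_ids_from_log(log_text: str) -> list[str]:
    """Return the step IDs found in debug log text, in order of first appearance."""
    seen: list[str] = []
    for match in _STEP_ID_PATTERN.finditer(log_text):
        step_id = match.group("step_id")
        if step_id not in seen:  # report each ID once
            seen.append(step_id)
    return seen


if __name__ == "__main__":
    sample = (
        "... Selected the SelectValues step to run in parallel "
        "as a GenericSparkOperation\n"
    )
    print(step_ids_from_log(sample))  # ['SelectValues']
```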

Method 2: Retrieve the ID from the PDI plugin registry

If you are a developer, you can retrieve the step ID from the PDI plugin registry as described in Building Transformations Dynamically.

If you have created your own PDI transformation step plugin, the step ID is one of the annotation attributes you supplied when defining the plugin.

Add the Step ID to the Configuration File

The configuration file, org.pentaho.pdi.engine.spark.cfg, contains the forceCoalesceSteps property. The property value is a pipe-delimited list of the IDs of all steps that should run with a coalesced dataset. Pentaho supplies a default list, to which you can add the IDs of other steps that generate errors.

Perform the following steps to add another step ID to the configuration file:

  1. Navigate to the data-integration/system/karaf/etc folder and open the org.pentaho.pdi.engine.spark.cfg file.
  2. Append your step ID to the forceCoalesceSteps property value list, using a pipe character separator between the step IDs.
  3. Save and close the file.
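The edit in steps 1 through 3 can also be scripted. The following Python sketch is a minimal, hypothetical helper (append_coalesce_step and update_cfg_file are not part of PDI). It assumes that forceCoalesceSteps is written on a single key=value line with no backslash line continuations; check your file before relying on it, and keep a backup copy.

```python
from pathlib import Path

PROPERTY = "forceCoalesceSteps"


def append_coalesce_step(cfg_text: str, step_id: str) -> str:
    """Return cfg_text with step_id added to the forceCoalesceSteps list.

    Assumes the property sits on one "key=value" line (no backslash
    continuations) and that IDs are separated by "|". If the ID is already
    listed, the text is returned unchanged.
    """
    lines = cfg_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        key, sep, value = line.partition("=")
        # Commented-out lines ("#forceCoalesceSteps=...") do not match.
        if sep and key.strip() == PROPERTY:
            ending = line[len(line.rstrip("\r\n")):]  # preserve the line ending
            ids = [s.strip() for s in value.split("|") if s.strip()]
            if step_id in ids:
                return cfg_text
            ids.append(step_id)
            lines[i] = f"{PROPERTY}={'|'.join(ids)}{ending}"
            return "".join(lines)
    raise KeyError(f"{PROPERTY} not found in configuration text")


def update_cfg_file(path: Path, step_id: str) -> None:
    """Hypothetical helper: rewrite the .cfg file in place."""
    text = path.read_text(encoding="utf-8")
    path.write_text(append_coalesce_step(text, step_id), encoding="utf-8")
```

For example, update_cfg_file(Path("data-integration/system/karaf/etc/org.pentaho.pdi.engine.spark.cfg"), "MyStepId") would append the hypothetical ID MyStepId. Running it a second time leaves the file unchanged, because IDs that are already listed are skipped.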

Table Input Step Fails

If you run a transformation that uses the Table Input step against a large database, the step does not complete. Use one of the following methods to resolve the issue:

Method 1: Load the data to HDFS before running the transformation

  1. Run a separate transformation with the Pentaho engine to move the data to the HDFS cluster.
  2. Then, in the transformation you run with the Spark engine, use the HDFS Input step to read the data.

Method 2: Increase the driver-side memory configuration

  1. Navigate to the etc/ folder and open the org.pentaho.pdi.engine.daemon.cfg file.
  2. Increase the value of the sparkDriverMemory parameter, then save and close the file.
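A scripted version of step 2 can guard against accidentally lowering the value. This Python sketch assumes that sparkDriverMemory takes a JVM-style size string with a unit suffix (for example 2g or 512m), as Spark's own spark.driver.memory setting does; confirm the format in your org.pentaho.pdi.engine.daemon.cfg file first. The helper names to_kib and raise_driver_memory are hypothetical.

```python
import re

# Multipliers to KiB for JVM-style suffixes (assumed format, e.g. "2g").
_TO_KIB = {"k": 1, "m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def to_kib(size: str) -> int:
    """Convert a size such as "512m" or "2g" to KiB; unitless values are rejected."""
    match = re.fullmatch(r"\s*(\d+)([kmgt])\s*", size, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized memory size: {size!r}")
    return int(match.group(1)) * _TO_KIB[match.group(2).lower()]


def raise_driver_memory(cfg_text: str, new_size: str,
                        key: str = "sparkDriverMemory") -> str:
    """Return cfg_text with key set to new_size, refusing to lower the value."""
    lines = cfg_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            current = value.strip()
            if to_kib(new_size) <= to_kib(current):
                raise ValueError(f"{new_size!r} is not larger than {current!r}")
            ending = line[len(line.rstrip("\r\n")):]  # preserve the line ending
            lines[i] = f"{key}={new_size}{ending}"
            return "".join(lines)
    raise KeyError(f"{key} not found in configuration text")
```

Comparing sizes in a common unit means that, for example, raising 4g to 2048m is rejected even though 2048 looks like the larger number.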

User ID Below Minimum Allowed

If you use the Spark engine in a secured cluster and receive an error about the minimum user ID, the ID of the proxy user is below the minimum user ID that the cluster requires. See the Cloudera documentation for details.

To resolve the error, change the ID of the proxy user to a value at or above the minimum user ID specified for the cluster.
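To check a proxy user before submitting work, you can compare its numeric ID with the cluster's minimum. In this Python sketch the minimum (1000 in the usage example) is an assumption: read the actual value from your cluster configuration, for example the min.user.id setting used by YARN's Linux container executor. Run the lookup on the cluster hosts, because the same user name can map to different IDs on different machines. The helper names and the account name pentaho are hypothetical.

```python
import pwd  # Unix-only standard library module


def uid_of(username: str) -> int:
    """Return the numeric user ID of a local account on this host."""
    return pwd.getpwnam(username).pw_uid


def is_below_minimum(uid: int, min_uid: int) -> bool:
    """True if a cluster enforcing min_uid would reject this user ID."""
    return uid < min_uid
```

For example, is_below_minimum(uid_of("pentaho"), 1000) returns True when the account's ID is lower than the assumed minimum of 1000 and the job would be rejected.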