Set up the Adaptive Execution Layer (AEL)
Pentaho uses the Adaptive Execution Layer (AEL) to run transformations on the Spark distributed compute engine. AEL adapts the steps of a transformation developed in PDI to Spark-native operators. The AEL daemon builds a transformation definition in Spark, which moves execution directly to the cluster.
Your installation of Pentaho includes the AEL daemon, which you can set up for production to run on your clusters. After you configure the AEL daemon, the PDI client communicates with both your Spark cluster and the AEL daemon, which resides on a node of your cluster, to launch and run transformations.
Before you can select the Spark engine through run configurations, you will need to configure AEL for your system and your workflow. Depending on your deployment, you may need to perform additional configuration tasks, such as setting up AEL in a secure cluster.
AEL runs PDI transformations in a Spark-centric manner, which is documented for each step that uses the Spark engine.
Before you begin
You must meet the following requirements for using the AEL daemon and operating the Spark engine for transformations:
- Pentaho 9.3 or later installation. See Pentaho installation.
- One of the following Hadoop distributions and versions:
- Cloudera 6.1 or later
- Hortonworks 3.0 and 3.1
- Amazon EMR 5.21 and 5.24
- Microsoft Azure HDInsight 4.0
- Spark client 2.3 or later.
- Pentaho Spark application 9.3.
- If you are configuring AEL for use with Cloudera, Hortonworks, or Amazon EMR, review Vendor-Supplied Clients.
Pentaho installation
When you install the Pentaho Server, the AEL daemon is installed in the folder data-integration/adaptive-execution. This folder will be referred to as PDI_AEL_DAEMON_HOME.
Spark client
The Spark client is required for the operation of the AEL daemon. There are two ways to use a Spark client with AEL:
- Install a new instance of the Spark client in a location of your choice.
- Use a Spark client instance already installed on an existing cluster.
Install a new instance of the Spark client
Procedure
Download the Spark client from the following location to the machine where you will run the AEL daemon: http://spark.apache.org/downloads.html
As a best practice, use Apache Spark client 2.3 or 2.4. Version 2.3 is used in the following examples.
For example, download spark-2.3.0-bin-hadoop2.7.tgz if you are using Spark 2.3 on Hadoop 2.7.
Extract the downloaded TGZ file to a designated folder where the Spark client will reside.
For AEL installation, the folder you designate is the target folder for the sparkHome= parameter. For example, this extraction command:
tar zxf /your_path/spark-2.3.0-bin-hadoop2.7.tgz
results in the following path:
/your_path/spark-2.3.0-bin-hadoop2.7/
where /your_path is the designated folder.
Copy the resulting path into the sparkHome= parameter in the application.properties file, as shown below:
sparkHome=/your_path/spark-x.x.x-bin-hadoopx.x/
where:
your_path: is the folder where you extracted the TGZ file.
spark-x.x.x-bin-hadoopx.x: is the version of the Spark client you are using.
For example, if your folder is called spark230:
sparkHome=/spark230/spark-2.3.0-bin-hadoop2.7/
Use a Spark client already installed on a cluster
To use a Spark client that already resides on a cluster, specify the cluster path in the sparkHome= parameter in the application.properties file. For example:
sparkHome=/cluster_path/spark-2.4.5-bin-hadoop2.7/
where cluster_path is your specific path.
The Spark client is started as part of the AEL execution and does not require any manual startup. The following examples show common cluster configurations.
Cluster Configuration | Example Entry |
CDH 6.1 | sparkHome=/opt/cloudera/parcels/CDH-6.1.1-1.cdh6.1.1.p0.875250/lib/spark/ |
CDH 6.2 | sparkHome=/opt/cloudera/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373/lib/spark/ |
EMR 5.24 | sparkHome=/usr/lib/spark/ |
GDP 1.4.2.1 | sparkHome=/usr/lib/spark/ |
HDP 3.1 | sparkHome=/usr/hdp/current/spark2-client |
Pentaho Spark application
After running the Spark application builder tool, copy and unzip the resulting pdi-spark-driver.zip file to an edge node in your Hadoop cluster. The unpacked contents consist of the data-integration folder and the pdi-spark-executor.zip file, which contains only the libraries the Spark nodes need to execute a transformation when the AEL daemon is configured to run in YARN mode. Because the pdi-spark-executor.zip file must be accessible to all nodes in the cluster, copy it into HDFS. Spark distributes this ZIP file to the other nodes and then automatically extracts it.
Perform the following steps to run the Spark application build tool and manage the resulting files:
Procedure
Ensure that you have configured your PDI client with all the plugins that you will use.
Navigate to the design-tools/data-integration folder and locate the spark-app-builder.bat (Windows) or the spark-app-builder.sh (Linux).
Execute the Spark application builder tool script.
A console window will display and the pdi-spark-driver.zip file will be created in the data-integration folder (unless otherwise specified by the -outputLocation parameter described below).
The following parameters can be used when running the script to build the pdi-spark-driver.zip.
Parameter | Action |
-h or --help | Displays the help. |
-e or --exclude-plugins | Specifies plugins from the data-integration/plugins folder to exclude from the assembly. |
-o or --outputLocation | Specifies the output location. |
The pdi-spark-driver.zip file contains a data-integration folder and the pdi-spark-executor.zip file.
Copy the data-integration folder to the edge node where you want to run the AEL daemon.
Copy the pdi-spark-executor.zip file to the location in HDFS where you will run Spark.
This location will be referred to as HDFS_SPARK_EXECUTOR_LOCATION.
Next steps
Configure the AEL daemon for local mode
To configure the AEL daemon for local mode, complete the following steps:
Procedure
Navigate to the data-integration/adaptive-execution/config directory and open the application.properties file.
Set the following properties for your environment:
Set the sparkHome property to the Spark 2 filepath on your local machine.
Set the sparkApp property to the data-integration directory.
Set the hadoopConfDir property to the directory containing the *site.xml files.
Save and close the file.
Navigate to the data-integration/adaptive-execution folder and run the daemon.sh command from the command line interface.
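Taken together, a minimal application.properties for local mode might look like the following sketch. All three paths are placeholders for illustration, not values from your installation:

```properties
# Local Spark client on this machine (placeholder path)
sparkHome=/your_path/spark-2.3.0-bin-hadoop2.7/
# PDI data-integration directory (placeholder path)
sparkApp=/home/pentaho/data-integration
# Directory containing the *site.xml files (placeholder path)
hadoopConfDir=/etc/hadoop/conf
```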
Configure the AEL daemon for YARN mode
The daemon.sh script is only supported in UNIX-based environments.
To configure the AEL daemon for a YARN production environment, complete the following steps.
Procedure
Navigate to the data-integration/adaptive-execution/config directory and open the application.properties file.
Set the following properties for your environment:
Property | Value |
websocketURL | The fully qualified domain name of the node where the AEL daemon is installed. For example, running hostname -f on the daemon node returns a name such as hito31-n2.cs1cloud.internal. An example value is websocketURL=ws://localhost:${ael.unencrypted.port}. |
sparkHome | The path to the Spark client folder on your cluster. |
sparkApp | The data-integration directory. |
hadoopConfDir | The directory containing the *site.xml files. This property value tells Spark which Hadoop/YARN cluster to use. You can download the directory containing the *site.xml files using the cluster management tool, or you can set the hadoopConfDir property to the location in the cluster. |
hadoopUser | The user ID the Spark application will use. This user must have permission to access the files in the Hadoop file system. |
hbaseConfDir | The directory containing the hbase-site.xml file. This property value tells Spark how HBase is configured for your cluster. You can download the directory containing the *site.xml files using the cluster management tool, or you can set the hbaseConfDir property to the location in the cluster. |
sparkMaster | yarn |
sparkDeployMode | client. Note: The YARN-cluster deployment mode is not supported by AEL. |
assemblyZip | hdfs:$HDFS_SPARK_EXECUTOR_LOCATION |
Save and close the file.
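As a sketch, the YARN-mode entries in application.properties might then look like the following. Every value here is a placeholder for illustration; substitute the host name, paths, and user ID for your cluster:

```properties
# Fully qualified name of the AEL daemon node (placeholder host)
websocketURL=ws://ael-node.example.internal:${ael.unencrypted.port}
# Spark client on the cluster (placeholder path)
sparkHome=/usr/hdp/current/spark2-client
# PDI data-integration directory on the edge node (placeholder path)
sparkApp=/home/pentaho/data-integration
# Cluster configuration directories (placeholder paths)
hadoopConfDir=/etc/hadoop/conf
hbaseConfDir=/etc/hbase/conf
# User with access to the Hadoop file system (placeholder ID)
hadoopUser=devuser
sparkMaster=yarn
sparkDeployMode=client
# HDFS location of pdi-spark-executor.zip (placeholder path)
assemblyZip=hdfs:/opt/pentaho/pdi-spark-executor.zip
```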
Copy the pdi-spark-executor.zip file to your HDFS cluster, as shown in the following example:
$ hdfs dfs -put pdi-spark-executor.zip /opt/pentaho/pdi-spark-executor.zip
Perform the following steps to start the AEL daemon.
You can start the AEL daemon by running the daemon.sh script. By default, this startup script is installed in the data-integration/adaptive-execution folder, which is referred to as the variable PDI_AEL_DAEMON_HOME.
Navigate to the data-integration/adaptive-execution directory.
Run the daemon.sh script.
The daemon.sh script supports the following commands:
Command | Action |
daemon.sh | Starts the daemon as a foreground process. |
daemon.sh start | Starts the daemon as a background process. Logs are written to the PDI_AEL_DAEMON_HOME/daemon.log file. |
daemon.sh stop | Stops the daemon. |
daemon.sh status | Reports the status of the daemon. |
Configure event logging
Perform the following tasks to configure AEL to log events:
Procedure
Have your cluster administrator enable the Spark History Server on your cluster and give you the location of the Spark event log directory.
Navigate to the data-integration/adaptive-execution/config directory and open the application.properties file.
Set the sparkEventLogEnabled property to true.
If this property is missing or set to false, Spark does not log events.
Set the sparkEventLogDir property to a directory where you want to store the log.
This location can either be a file system directory (for example, file:///users/home/spark-events) or an HDFS directory (for example, hdfs:/users/home/spark-events).
Set the spark.history.fs.logDirectory property to point to the same event log directory you configured in the previous step.
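Assembled, the event-logging entries in application.properties might look like the following sketch; the HDFS directory is a placeholder for the location your cluster administrator provides:

```properties
# Enable Spark event logging
sparkEventLogEnabled=true
# Event log directory (placeholder HDFS path)
sparkEventLogDir=hdfs:/users/home/spark-events
# The history server must read the same directory
spark.history.fs.logDirectory=hdfs:/users/home/spark-events
```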
Results
Next steps
- https://spark.apache.org/docs/latest/monitoring.html
- https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/configuring-spark/content/configuring_the_spark_history_server_kerberos.html
- https://www.cloudera.com/documentation/enterprise/6-1-x/topics/operation_spark_applications.html
- https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-application-history.html
- https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-history-server.html
Vendor-supplied clients
Additional configuration steps may be required when using AEL with a vendor’s version of the Spark client.
Cloudera
If your Cloudera Spark client does not contain the Hadoop libraries, you must add the Hadoop libraries to the classpath using the SPARK_DIST_CLASSPATH environment variable, as shown in the following example command:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
Hortonworks
You can use multiple vendor versions of the Hortonworks Data Platform (HDP) with the PDI client. To use the vendor’s version of the Spark client with Hive Warehouse Connector (HWC) on HDP 3.x platforms, you must configure the AEL daemon for the Hive Warehouse Connector.
To use HBase with AEL and HDP, you must add copies of HBase JAR files to your PDI distribution.
Use HBase with AEL and HDP
Perform the following steps to add the HBase JAR files:
Procedure
Copy the following files for your version of HDP from the /usr/hdp/current/hbase/lib/ directory of your cluster.
hbase-client-<x.x.x>.jar
hbase-common-<x.x.x>.jar
hbase-hadoop-compat-<x.x.x>.jar
hbase-mapreduce-<x.x.x>.jar
hbase-protocol-<x.x.x>.jar
hbase-protocol-shaded-<x.x.x>.jar
hbase-server-<x.x.x>.jar
hbase-shaded-miscellaneous-<x.x.x>.jar
hbase-shaded-netty-<x.x.x>.jar
hbase-shaded-protobuf-<x.x.x>.jar
Follow the instructions in Set up the vendor-specified JARs to install the files.
Amazon EMR
If you plan to use AEL with Amazon EMR, note the following conditions:
- To use Amazon EMR with AEL, you must install the Linux LZO compression library. See LZO support for more information.
- To use Amazon EMR with AEL and Hive, you must Configure the AEL daemon for a Hive service.
- To use the HBase Input and HBase Output steps with AEL and Amazon EMR, see Using HBase steps with Amazon EMR 5.21.
- Because of limitations in Amazon EMR 4.0 and later, Impala is not supported on Spark. Note: Impala is not available as a download on the EMR Cluster configuration menu.
LZO support
Procedure
Follow the instructions available here to install the Linux LZO compression library from the command line: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_command-line-installation/content/install_compression_libraries.html
Navigate to the data-integration/adaptive-execution/config/ directory and open the application.properties file.
Add the following properties:
- spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
- spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
Edit the following properties to append -Djava.library.path=/usr/lib/hadoop-lzo/lib/native at the end of each line:
- sparkExecutorExtraJavaOptions
- sparkDriverExtraJavaOptions
Save and close the file.
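After these edits, the LZO-related lines in application.properties might look like the following sketch; <existing options> stands for whatever values the two Java-options properties already contained:

```properties
spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
sparkExecutorExtraJavaOptions=<existing options> -Djava.library.path=/usr/lib/hadoop-lzo/lib/native
sparkDriverExtraJavaOptions=<existing options> -Djava.library.path=/usr/lib/hadoop-lzo/lib/native
```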
Use HBase with AEL and Amazon EMR
Perform the following steps to add the HBase libraries:
Procedure
Stop the AEL daemon.
From a command prompt (terminal window) on the cluster, run the following command:
export SPARK_DIST_CLASSPATH=$(hbase classpath)
Start the AEL daemon.
Hive
To achieve the best performance using Hive, ensure that you have optimized your AEL environment as described in About Spark tuning in PDI. After tuning Spark, you can make additional improvements to Hive performance with the following tuning techniques:
- Set up Hive partitioning on the tables for more efficient queries, and use bucketing for manageable dataset parts. For more information, see hive-partitioning-vs-bucketing.
- Use the hive.auto.convert.join parameter to reduce query times.
- Use the mapred.compress.map.output parameter to save cluster space.
- Enable parallel execution to improve cluster utilization.
- For better pipeline and cache usage, enable vectorization to batch process rows and perform operations on column vectors.
- Configure the Live Long and Process (LLAP) queue capacity to maximize the YARN resources for LLAP without wasting cluster space.
For more information about these methods, see hive-best-practices. Refer to your vendor-specific documentation for implementation.
Lastly, you might consider the differences between Amazon's Elastic MapReduce (EMR) and other Hive environments, specifically how storage formats are handled. EMR uses the Parquet storage format, instead of ORC with compression, to provide better performance. EMR, however, does not support LLAP. For more information on other exceptions, see emr-hive-differences.
The following sections show you how to use Spark on AEL with Hive. Pentaho supports Hive access from Spark for Amazon's Elastic MapReduce 5.24 and Hortonworks Data Platform 3.x.
Supported data types
Pre-existing Hive tables that use the Varchar data type are converted to strings when you select the Truncate Table option in the Table Output step. Pentaho limits support for binary types and does not recommend using Hive binary types with AEL.
The following table lists the supported Hive data types:
Spark data type | Hive data type | Pentaho support | Pentaho data type |
ByteType | TinyInt | Supported | Integer |
ShortType | SmallInt | Supported | Integer |
IntegerType | Integer | Supported | Integer |
LongType | BigInt | Supported | BigNumber |
FloatType | Float | Supported | Number |
DoubleType | Double | Supported | Number |
DecimalType | Decimal | Supported | BigNumber |
StringType | String, Char, Varchar | Supported, except Varchar with length | String |
BinaryType | Binary | Not supported | N/A |
BooleanType | Boolean | Supported | Boolean |
TimestampType | Timestamp | Not supported; converted to String | String |
DateType | Date | Supported | Date |
ArrayType | Array | Not Supported | N/A |
StructType | Struct | Not Supported | N/A |
Configure the AEL daemon for a Hive service
You must configure the application.properties file of the AEL daemon if you want to:
- Use Hive tables on a secure supported HDP cluster.
- Use Hive managed and unmanaged tables in an ORC or Parquet format on your Amazon EMR cluster.
- Use Hive managed and unmanaged tables in an ORC or Parquet format on your Google Dataproc cluster.
To configure the properties file, perform the following steps.
Procedure
Navigate to the data-integration/adaptive-execution/config directory and open the application.properties file with any text editor.
Set the values for your environment as shown in the following table.
Parameter | Value |
enableHiveConnection | Enables AEL access to Hive tables. Set this value to true. |
spark.driver.extraClassPath | Specifies the path to the directory containing the hive-site.xml file on the driver node. It loads the hive-site.xml file as a resource in the driver. This resource defines the Hive endpoints and security settings required by AEL to access the Hive subsystem. |
spark.executor.extraClassPath | Specifies the path to the directory containing the hive-site.xml file on the executor nodes. It loads the hive-site.xml file as a resource on each executor. This resource defines the Hive endpoints and security settings required by AEL to access the Hive subsystem. |
The following lines show sample values for these parameters:
# AEL Spark Hive Property Settings
enableHiveConnection=true
enableHiveWarehouseConnector=false
spark.driver.extraClassPath=/etc/spark/conf.dist/
spark.executor.extraClassPath=/etc/spark/conf.dist/
Save and close the file.
Restart the AEL daemon.
Results
Configuring the AEL daemon for the Hive Warehouse Connector on your Hortonworks cluster
You can use PDI with the Hive Warehouse Connector (HWC) to access Hive managed tables in an ORC format or large unmanaged tables in Hive on secure supported HDP clusters. You can set the access controls and the LLAP queue by configuring the application.properties file of the AEL daemon.
Before you begin
Before you begin, you will need to perform the following tasks.
- Download and install Apache Ambari from https://ambari.apache.org/ to obtain the Hive connection information.
- Determine LLAP sizing and setup needed for your Hive LLAP daemon. See https://community.cloudera.com/t5/Community-Articles/Hive-LLAP-deep-dive/ta-p/248893 and https://community.cloudera.com/t5/Community-Articles/LLAP-sizing-and-setup/ta-p/247425 for instructions.
- Set up the Hive LLAP queue on your HDP cluster. See https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/performance-tuning/content/hive_setting_up_llap.html for instructions.
Configure the AEL daemon for the Hive Warehouse Connector
Perform the following steps.
Procedure
Navigate to the data-integration/adaptive-execution/config directory and open the application.properties file with any text editor.
Set the values for your environment as shown in the following table.
Parameter | Value |
enableHiveConnection | Enables AEL access to Hive tables. Set this value to true. |
hiveMetastoreUris | Identifies the location of the Hive metastore. Set this value to thrift://<fully qualified hostname>:9083. |
spark.sql.hive.hiveserver2.jdbc.url | Identifies the location of the interactive service. Use the value found at Ambari Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL. |
spark.datasource.hive.warehouse.metastoreUri | Identifies the location of the Hive metastore. Use the value found at Ambari Services > Hive > CONFIGS > ADVANCED > General > hive.metastore.uris. |
spark.datasource.hive.warehouse.load.staging.dir | Determines the HDFS temporary directory used for batch writing to Hive. Set this value to /tmp. Note: Ensure that your HWC users have permissions for this directory. |
spark.hadoop.hive.llap.daemon.service.hosts | Specifies the name of the LLAP queue. Use the value found at Ambari Services > Hive > CONFIGS > ADVANCED > Advanced hive-interactive-site > hive.llap.daemon.service.hosts. |
spark.hadoop.hive.zookeeper.quorum | Provides the Hive endpoint to access the Hive tables. Use the value found at Ambari Services > Hive > CONFIGS > ADVANCED > Advanced hive-site > hive.zookeeper.quorum. |
spark.driver.extraClassPath | Specifies the path to the directory containing the hive-site.xml file on the driver node. It causes the hive-site.xml file to be loaded as a resource in the driver. This resource defines the Hive endpoints and security settings required by AEL to access the Hive subsystem. |
spark.executor.extraClassPath | Specifies the path to the directory containing the hive-site.xml file on the executor nodes. It causes the hive-site.xml file to be loaded as a resource on each executor. This resource defines the Hive endpoints and security settings required by AEL to access the Hive subsystem. |
The following lines show sample values for these parameters:
# AEL Spark Hive Property Settings
enableHiveConnection=true
spark.driver.extraClassPath=/usr/hdp/current/spark2-client/conf/
spark.executor.extraClassPath=/usr/hdp/current/spark2-client/conf/
spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://hito31-n3.cs1cloud.internal:2181,hito31-n2.cs1cloud.internal:2181,hito31-n1.cs1cloud.internal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
spark.datasource.hive.warehouse.metastoreUri=thrift://hito31-n2.cs1cloud.internal:9083
spark.datasource.hive.warehouse.load.staging.dir=/user/devuser/tmp
spark.hadoop.hive.llap.daemon.service.hosts=@llap0
spark.hadoop.hive.zookeeper.quorum=hito31-n3.cs1cloud.internal:2181,hito31-n2.cs1cloud.internal:2181,hito31-n1.cs1cloud.internal:2181
Save and close the file.
Create a symbolic link to the HWC JAR file in the /data-integration/adaptive-execution/extra directory. For example, if you are in the extra directory, the following command creates this link:
ln -s /usr/hdp/current/hivewarehouseconnector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar /<user_name>/data-integration/adaptive-execution/extra/
Restart the AEL daemon.
Results
Google Cloud Storage
This configuration task is intended for Pentaho administrators and Hadoop cluster administrators who want to set up access to Google Cloud Storage (GCS) for PDI transformations running on Spark.
This task assumes that you have obtained the settings for your site's Google Cloud Storage (GCS) configuration from your Hadoop cluster administrator.
Perform the following steps to set up Hadoop cluster access to GCS:
Procedure
Log on to the cluster and stop the AEL daemon by running the shutdown script, daemon.sh stop, from the command line interface.
Download the GCS Hadoop Connector JAR file and save it in a location where you can access it. You can use the following UNIX command to download the GCS Hadoop Connector JAR:
wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop2-latest.jar
Use the following command to add the GCS Hadoop Connector JAR file to the SPARK_DIST_CLASSPATH, where /full/path/to is the location where you stored the JAR file:
export SPARK_DIST_CLASSPATH=$(hadoop classpath):/full/path/to/gcs-connector-hadoop2-latest.jar
Configure your clusters with the GCS connector with Hadoop/Spark using the instructions located in the Google Cloud Platform interoperability GitHub repository: https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcs/INSTALL.md
Configure AEL to use the GCS Hadoop Connector. Possible ways of configuring AEL include one of the following:
- Adding the GCS properties to the /etc/hadoop/conf/core-site.xml file.
- Adding JSON keyfile parameters for GCS to the AEL daemon application.properties file, following the instructions in Step 6. Note: The JSON keyfile for GCS must be present on all the nodes in the cluster.
(Optional) If you choose to add a JSON keyfile to the application.properties file, follow these steps:
Navigate to the data-integration/adaptive-execution/config/ directory and open the application.properties file with any text editor.
Add the following lines of code:
spark.hadoop.google.cloud.auth.service.account.enable=true
spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/keyfile.json
Save the file and close it.
Restart the AEL daemon by running the startup script, daemon.sh, from the command line interface.
Google Dataproc
Perform the following steps to use the AEL engine with Hive on a GDP cluster:
Procedure
Navigate to the /etc/hive/conf directory on the master node of the Pentaho instance on the GDP cluster.
Open the hive-site.xml file with any text editor.
Locate the hive.execution.engine property and change the default value from tez to spark.
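The edited property in hive-site.xml might then look like the following sketch:

```xml
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
```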
Save and close the file.
Results
Microsoft Azure HDInsight
You must perform the following tasks before using either Windows Azure Storage Blobs (WASB) or Azure Data Lake Storage (ADLS) with AEL.
Use WASB with AEL
Procedure
Log into Ambari.
Select the ADVANCED tab.
Edit Custom Core-site (CCs) for your instance of HDI.
Add or update the following properties under the core-site.xml section:
Property | Value |
fs.azure.account.keyprovider.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net | org.apache.hadoop.fs.azure.SimpleKeyProvider |
fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net | <DECRYPTED_ACCESS_KEY> |
Results
Use ADLS with AEL
Procedure
Log in to Ambari.
Delete the following properties from Ambari:
Property | Value |
fs.azure.account.auth.type | Custom |
fs.azure.account.oauth.provider.type | com.microsoft.azure.storage.oauth2.CredentialServiceBasedAccessTokenProvider |
fs.azure.delegation.token.provider.type | com.microsoft.azure.storage.oauth2.DelegationTokenManager |
Add the following properties for accessing the storage account.
Property | Value |
fs.azure.account.auth.type.amitsecureadlsgen2.dfs.core.windows.net | SharedKey |
fs.azure.account.key.amitsecureadlsgen2.dfs.core.windows.net | <SharedKey> |
Update the fs.azure.enable.delegation.token property to false.
Results
Advanced topics
The following topics help to extend your knowledge of the Adaptive Execution Layer beyond basic setup and use:
- Spark Tuning
You can customize PDI transformation and step parameters to improve the performance of running your PDI transformations on Spark. These parameters affect memory, cores, and instances used by the Spark engine. These Spark parameters include both application parameters and Spark tuning parameters.
- See Configuring application tuning parameters for Spark to learn how to define additional Spark properties within the application.properties file or as run modification parameters within a transformation.
- See About Spark tuning in PDI to learn more about how Spark tuning parameters work in PDI.
- For the list of Spark tuning parameters available in PDI transformation steps, see Spark Tuning.
- Configuring AEL with Spark in a secure cluster
If your AEL daemon server and your cluster machines are in a secure environment such as a data center, you may only need to configure a secure connection between the PDI client and the AEL daemon server.
Troubleshooting
See our list of common problems and resolutions.