Determining Spark resource requirements
To tune your Spark application, start with two facts: the size of your cluster, and the percentage of cluster resources that a KTR is allowed to consume while running, as set by the cluster administrator. Within that limit, you can maximize the number of executors while giving each one enough resources for processing. As a best practice, reserve the following cluster resources when estimating the Spark application settings:
- 1 core per node.
- 1 GB RAM per node.
- 1 executor per cluster for the application manager.
- 10 percent memory overhead per executor.
Example: Calculate your Spark application settings
In the following example, your cluster size is:
- 11 nodes (1 master node and 10 worker nodes)
- 66 cores (6 cores per node)
- 110 GB RAM (10 GB per node)
Your job requirements, given the percentage of cluster resources that a KTR is allowed to consume, are:
- Run with 50 percent YARN utilization
- 19 executors (1 driver and 18 worker executors)
Procedure
Calculate the initial application tuning settings for the parameters.
| Action | Calculation | Example |
| --- | --- | --- |
| Determine the core resources for the Spark application. | Multiply the number of cluster cores by the YARN utilization percentage. | 66 x 0.5 = 33. Provides 3 driver cores and 30 worker node cores. |
| Determine the memory resources available for the Spark application. | Multiply the cluster RAM size by the YARN utilization percentage. | 110 x 0.5 = 55. Provides 5 GB RAM for the driver and 50 GB RAM for the worker nodes. |
| Discount 1 core per worker node to determine the executor core instances. | Subtract the reserved core allocations from the number of available worker node cores. | 30 - 10 = 20. Provides 20 executor core instances. |
| Discount 1 GB RAM per worker node to determine available worker node memory. | Subtract the reserved memory allocations from the memory available to the worker nodes. | 50 - 10 = 40. Provides 40 GB RAM. |
| Allow a 10 percent memory overhead per executor. | Multiply the available GB RAM by the percentage available for use. | (1.0 - 0.1) x 40 = 36. Provides 36 GB RAM. |
| Determine the Spark executor cores value. | Divide the number of executor core instances by the reserved core allocations. | 20 / 10 = 2 cores per node. Provides 1 core per executor. |
| Determine the Spark executor memory value. | Divide the usable memory by the reserved core allocations, then divide that amount by the number of executors per node. | (36 / 9) / 2 = 2 GB. Provides 2 GB RAM per executor. |
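The arithmetic in the procedure above can be sketched in a few lines of Python. This is a minimal illustration, not part of the product: the function name `estimate_spark_settings` and all of its parameter names are hypothetical. It also makes two assumptions that the procedure does not state outright: that the application's share of cores and RAM is split evenly across nodes (which yields the 3/30 core and 5/50 GB split), and that the per-executor values come from dividing the worker pool across the 18 executors and rounding down (which gives the same 1 core and 2 GB as the example).

```python
def estimate_spark_settings(
    worker_nodes=10,
    master_nodes=1,
    cores_per_node=6,
    ram_gb_per_node=10,
    yarn_utilization_pct=50,
    executor_instances=18,
    reserved_cores_per_node=1,
    reserved_gb_per_node=1,
    overhead_pct=10,
):
    """Reproduce the example calculation using integer arithmetic."""
    nodes = worker_nodes + master_nodes

    # Cluster share available to the application (66 -> 33 cores, 110 -> 55 GB).
    app_cores = nodes * cores_per_node * yarn_utilization_pct // 100
    app_ram_gb = nodes * ram_gb_per_node * yarn_utilization_pct // 100

    # Assumption: the share is spread evenly over nodes, so the worker
    # nodes receive 30 of the 33 cores and 50 of the 55 GB.
    worker_cores = app_cores * worker_nodes // nodes
    worker_ram_gb = app_ram_gb * worker_nodes // nodes

    # Reserve 1 core and 1 GB per worker node (30 -> 20 cores, 50 -> 40 GB).
    executor_core_pool = worker_cores - reserved_cores_per_node * worker_nodes
    worker_ram_after_reserve = worker_ram_gb - reserved_gb_per_node * worker_nodes

    # Allow 10 percent memory overhead (40 -> 36 GB).
    usable_ram_gb = worker_ram_after_reserve * (100 - overhead_pct) // 100

    # Assumption: divide the pools across the executors, rounding down.
    executor_cores = max(1, executor_core_pool // executor_instances)
    executor_memory_gb = usable_ram_gb // executor_instances

    return {
        "executor_core_pool": executor_core_pool,
        "usable_ram_gb": usable_ram_gb,
        "spark.executor.instances": executor_instances,
        "spark.executor.cores": executor_cores,
        "spark.executor.memory": f"{executor_memory_gb}G",
    }


if __name__ == "__main__":
    for name, value in estimate_spark_settings().items():
        print(name, value)
```

Percentages are passed as whole numbers so that every step stays in integer arithmetic and matches the rounded figures in the example.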
Note: The spark.yarn.driver.memoryOverhead and spark.driver.cores values are derived from the resources of the node that AEL is installed on, under the assumption that only the driver executor is running there. The spark.default.parallelism value is derived from the amount of parallelism per core that is required (an arbitrary setting). In the example above, a value of 36 is derived from a parallelism per core setting of 2, multiplied by the spark.executor.instances value of 18.

Apply the Spark application settings that you calculated, as described in Set the Spark parameters globally. For example:

| Parameter | Value |
| --- | --- |
| spark.executor.instances | 18 |
| spark.yarn.executor.memoryOverhead | 1024 |
| spark.executor.memory | 2G |
| spark.yarn.driver.memoryOverhead | 1024 |
| spark.driver.memory | 3G |
| spark.executor.cores | 1 |
| spark.driver.cores | 2 |
| spark.default.parallelism | 36 |
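As a rough illustration, the example values can be held in a Python dictionary, checked against the parallelism rule from the note, and rendered as generic key=value lines. The names `settings`, `PARALLELISM_PER_CORE`, and `render_settings` are hypothetical, and the key=value layout is only an assumption made for display; the file and syntax that AEL actually expects are covered in Set the Spark parameters globally and may differ.

```python
# Example values from the table above, kept as strings for rendering.
settings = {
    "spark.executor.instances": "18",
    "spark.yarn.executor.memoryOverhead": "1024",
    "spark.executor.memory": "2G",
    "spark.yarn.driver.memoryOverhead": "1024",
    "spark.driver.memory": "3G",
    "spark.executor.cores": "1",
    "spark.driver.cores": "2",
    "spark.default.parallelism": "36",
}

# Parallelism per core is an arbitrary choice; the example uses 2.
PARALLELISM_PER_CORE = 2


def parallelism_is_consistent(values):
    """Check that parallelism equals parallelism per core times executor instances."""
    expected = PARALLELISM_PER_CORE * int(values["spark.executor.instances"])
    return int(values["spark.default.parallelism"]) == expected


def render_settings(values):
    """Render each setting as a generic key=value line (illustrative layout only)."""
    return [f"{key}={value}" for key, value in values.items()]


if __name__ == "__main__":
    if not parallelism_is_consistent(settings):
        raise ValueError("spark.default.parallelism does not match the rule in the note")
    print("\n".join(render_settings(settings)))
```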
Results