Configure Spark on IIoT Core with Kubernetes
Spark is a unified open-source analytics engine for large-scale data processing in clustered environments. Spark can run on clusters managed by Kubernetes through the native Kubernetes scheduler built into the engine.
Spark is an optional component with IIoT Core.
To run Spark jobs with IIoT Core, you must build a Docker image that includes the Spark component.
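The Spark distribution ships with a docker-image-tool.sh script that builds and pushes such an image. A minimal sketch follows; the registry name and tag are placeholders for your environment, not values mandated by IIoT Core:

```shell
# Run from the root of the Spark distribution directory.
# <your-registry> is a placeholder; substitute your container registry.
./bin/docker-image-tool.sh -r <your-registry> -t v3.3.1 build

# Push the resulting image so the Kubernetes nodes can pull it.
./bin/docker-image-tool.sh -r <your-registry> -t v3.3.1 push
```

The image name you push here is the value to supply later as spark.kubernetes.container.image.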
Before you begin
- Read the official Spark documentation on Running Spark on Kubernetes to understand concepts and configuration options, and learn how to create the Docker image.
- Install IIoT Core Services before installing Spark.
- Install the latest Spark distribution, which includes spark-submit. The integration with IIoT Core has been tested with Spark v3.3.1.
Procedure
Log in to a master node as root.
Grant the Spark pods access to the Kubernetes API by creating a service account and a role with the necessary permissions.
Create a file called spark_sa.yaml.
Use the following content for the spark_sa.yaml file:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-cluster-role
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list", "create", "delete"]
- apiGroups: [""] # "" indicates the core API group
  resources: ["services"]
  verbs: ["get", "watch", "list", "create", "delete"]
- apiGroups: [""] # "" indicates the core API group
  resources: ["configmaps"]
  verbs: ["get", "watch", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-cluster-role-binding
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
roleRef:
  kind: ClusterRole
  name: spark-cluster-role
  apiGroup: rbac.authorization.k8s.io
Create the Spark service account, cluster role, and role binding in Kubernetes by applying the spark_sa.yaml file with the following command.
kubectl apply -f spark_sa.yaml
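Before submitting a job, you can confirm that the binding took effect by asking the API server whether the new service account may create pods. The namespace default below matches the one declared in spark_sa.yaml:

```shell
# Verify the service account exists in the default namespace.
kubectl get serviceaccount spark -n default

# Check the permission granted by the ClusterRoleBinding; the command
# prints "yes" when the spark service account is allowed to create pods.
kubectl auth can-i create pods --as=system:serviceaccount:default:spark
```

If the second command prints "no", re-check the subjects and roleRef sections of spark_sa.yaml before continuing.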
Navigate to the Spark installation directory.
Run the spark-submit command.

$ ./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///path/to/examples.jar
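After the job finishes, the driver pod's log contains the computed result of the SparkPi example. The pod name below is illustrative; substitute the driver pod name reported for your own run:

```shell
# Replace the pod name with the driver pod from your own submission.
kubectl logs spark-pi-42097f84c518d9cb-driver | grep "Pi is roughly"
```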
Results
In a new terminal window, check that the Spark pods are running correctly. The output should resemble the following:
kubectl get pods -w | grep spark-pi
spark-pi-42097f84c518d9cb-driver   0/1   ContainerCreating   0   2s
spark-pi-42097f84c518d9cb-driver   1/1   Running             0   2s
spark-pi-d9754384c5190191-exec-1   0/1   Pending             0   0s
spark-pi-d9754384c5190191-exec-2   0/1   Pending             0   0s
spark-pi-d9754384c5190191-exec-1   0/1   ContainerCreating   0   0s
spark-pi-d9754384c5190191-exec-2   0/1   ContainerCreating   0   0s
spark-pi-d9754384c5190191-exec-2   1/1   Running             0   2s
spark-pi-d9754384c5190191-exec-1   1/1   Running             0   2s
spark-pi-d9754384c5190191-exec-1   1/1   Terminating         0   10s
spark-pi-d9754384c5190191-exec-2   1/1   Terminating         0   10s
spark-pi-d9754384c5190191-exec-2   1/1   Terminating         0   11s
spark-pi-d9754384c5190191-exec-1   1/1   Terminating         0   11s
spark-pi-42097f84c518d9cb-driver   0/1   Completed           0   22s
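The executor pods are deleted when the job ends, but the driver pod remains in the Completed state so its logs stay available. When you no longer need it, you can remove it; the selector below assumes the spark-role label that Spark on Kubernetes sets on the pods it creates:

```shell
# Delete finished driver pods created by Spark on Kubernetes.
kubectl delete pods -l spark-role=driver
```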