
Configure Spark on IIoT Core with Kubernetes

Spark is a unified open-source analytics engine for large-scale data processing in clustered environments. The Spark engine can run on clusters managed by Kubernetes by using the native Kubernetes scheduler that has been added to Spark.

Spark is an optional component with IIoT Core.

To run Spark jobs with IIoT Core, you need to build a Docker image that includes the Spark component, as sketched below.
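
One common way to build such an image is with the docker-image-tool.sh script that ships with the Spark distribution (see Before you begin). The following is a minimal sketch, assuming your cluster can pull images from a registry referred to here as <your-registry> (a placeholder, not a value defined by IIoT Core):

    # Run from the root of the Spark installation directory.
    # Build a base Spark image and push it to your registry (the tag is an example).
    ./bin/docker-image-tool.sh -r <your-registry> -t v3.3.1 build
    ./bin/docker-image-tool.sh -r <your-registry> -t v3.3.1 push

The resulting image reference, for example <your-registry>/spark:v3.3.1, is the value you later pass to spark-submit as spark.kubernetes.container.image.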

Before you begin

  • Read the official Spark documentation on Running Spark on Kubernetes to understand concepts and configuration options, and learn how to create the Docker image.
  • Install IIoT Core Services before installing Spark.
  • Install the latest Spark distribution. This includes spark-submit. The integration with IIoT Core has been tested with Spark v3.3.1.
Spark application submission example

Procedure

  1. Log in to a master node as root.

  2. Grant the Spark driver pod access to the Kubernetes API by creating a service account and a cluster role with the necessary permissions, as defined in the next two steps.

  3. Create a file called spark_sa.yaml.

    Use the following content for the spark_sa.yaml file:
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: spark
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: spark-cluster-role
    rules:
    - apiGroups: [""] # "" indicates the core API group
      resources: ["pods"]
      verbs: ["get", "watch", "list", "create", "delete"]
    - apiGroups: [""] # "" indicates the core API group
      resources: ["services"]
      verbs: ["get", "watch", "list", "create", "delete"]
    - apiGroups: [""] # "" indicates the core API group
      resources: ["configmaps"]
      verbs: ["get", "watch", "list", "create", "delete"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: spark-cluster-role-binding
    subjects:
    - kind: ServiceAccount
      name: spark
      namespace: default
    roleRef:
      kind: ClusterRole
      name: spark-cluster-role
      apiGroup: rbac.authorization.k8s.io
  4. Apply the spark_sa.yaml file to create the Spark service account, cluster role, and role binding in Kubernetes using the following command.

    kubectl apply -f spark_sa.yaml
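
    To confirm that the objects defined in spark_sa.yaml were created, you can query them with kubectl, for example:

    kubectl get serviceaccount spark
    kubectl get clusterrole spark-cluster-role
    kubectl get clusterrolebinding spark-cluster-role-binding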
  5. Navigate to the Spark installation directory.

  6. Run the spark-submit command, replacing the placeholder values with those for your environment.

    $ ./bin/spark-submit \
        --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
        --deploy-mode cluster \
        --name spark-pi \
        --class org.apache.spark.examples.SparkPi \
        --conf spark.executor.instances=2 \
        --conf spark.kubernetes.container.image=<spark-image> \
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
        local:///path/to/examples.jar  
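
    The <k8s-apiserver-host> and <k8s-apiserver-port> placeholders are the host and port of your Kubernetes API server. One way to look them up (the exact wording of the output can vary between Kubernetes versions):

    kubectl cluster-info
    # Use the https://<host>:<port> address reported for the control plane,
    # prefixed with k8s:// as shown in the --master option above.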

Results

The Spark application is running in Kubernetes with IIoT Core Services.

To check that the Spark pods are running correctly, open a new terminal window and watch the pods. The following command and output show an example of what you should expect:

kubectl get pods -w | grep spark-pi
spark-pi-42097f84c518d9cb-driver   0/1     ContainerCreating   0          2s
spark-pi-42097f84c518d9cb-driver   1/1     Running             0          2s
spark-pi-d9754384c5190191-exec-1   0/1     Pending             0          0s
spark-pi-d9754384c5190191-exec-2   0/1     Pending             0          0s
spark-pi-d9754384c5190191-exec-1   0/1     ContainerCreating   0          0s
spark-pi-d9754384c5190191-exec-2   0/1     ContainerCreating   0          0s
spark-pi-d9754384c5190191-exec-2   1/1     Running             0          2s
spark-pi-d9754384c5190191-exec-1   1/1     Running             0          2s
spark-pi-d9754384c5190191-exec-1   1/1     Terminating         0          10s
spark-pi-d9754384c5190191-exec-2   1/1     Terminating         0          10s
spark-pi-d9754384c5190191-exec-2   1/1     Terminating         0          11s
spark-pi-d9754384c5190191-exec-1   1/1     Terminating         0          11s
spark-pi-42097f84c518d9cb-driver   0/1     Completed           0          22s
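
Once the driver pod reports Completed, you can read the application output from its log and delete the pod when you no longer need it. A sketch using the driver pod name from the example output above (your pod name will differ):

kubectl logs spark-pi-42097f84c518d9cb-driver
kubectl delete pod spark-pi-42097f84c518d9cb-driver

For the SparkPi example, the driver log should include a line beginning with "Pi is roughly".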