Quick Start

Introduction to DC/OS Apache Spark service

This page explains how to install the DC/OS Apache Spark service and run sample Scala, Python, and R jobs on it.

Prerequisites

  1. Install the Spark package. This may take a few minutes. This step installs the Spark DC/OS service, the Spark CLI, the dispatcher, and, optionally, the history server. See the History Server section for instructions on installing the history server.

    dcos package install spark
    

    Expected output:

    Installing Marathon app for package [spark] version [2.12.0-3.0.1]
    Installing CLI subcommand for package [spark] version [2.12.0-3.0.1]
    New command available: dcos spark
    DC/OS Spark is being installed!
    
    	Documentation: /mesosphere/dcos/services/spark/
    	Issues: https://docs.mesosphere.com/support/
    

    NOTE: Type dcos spark to view the Spark CLI options.

  2. Run the sample SparkPi jar for DC/OS.

    You can view the example source here.

    1. Use the following command to run a Spark job that calculates the value of Pi.

      dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.12-3.0.1.jar 30"
      

      Expected output:

      Using image 'mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2' for the driver and the executors (from dispatcher: container.docker.image).
      To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
      Run job succeeded. Submission id: driver-20200729203326-0001
      
    2. View the standard output from your job:

      dcos spark log --completed driver-20200729203326-0001
      

      Expected output should contain:

      Pi is roughly 3.1424917141639046
      
  3. Run a Python SparkPi job. You can view the example source here.

    1. Use the following command to run a Python Spark job that calculates the value of Pi.

      dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"
      

      Expected output:

      Parsing application as Python job
      Using image 'mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2' for the driver and the executors (from dispatcher: container.docker.image).
      To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
      Run job succeeded. Submission id: driver-20200729203704-0002
      
    2. View the standard output from your job:

      dcos spark log --completed driver-20200729203704-0002
      

      Expected output should contain:

      Pi is roughly 3.139508
      
  4. Run an R job. You can view the example source here.

    1. Use the following command to run an R job.

      dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/dataframe.R"
      

      Expected output:

      Parsing application as R job
      Using image 'mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2' for the driver and the executors (from dispatcher: container.docker.image).
      To disable this image on executors, set spark.mesos.executor.docker.forcePullImage=false
      Run job succeeded. Submission id: driver-20200729204249-0003
      
    2. Use the following command to view the standard output from your job.

      dcos spark log --completed --lines_count=10 driver-20200729204249-0003
      

      Expected output:

      In Filter(nzchar, unlist(strsplit(input, ",|\\s"))) :
        bytecode version mismatch; using eval
      root
       |-- name: string (nullable = true)
       |-- age: double (nullable = true)
      root
       |-- age: long (nullable = true)
       |-- name: string (nullable = true)
          name
      1 Justin
      
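The three jobs above follow the same pattern: submit with dcos spark run, note the submission ID, then pass that ID to dcos spark log. The shell sketch below automates the copy step. It assumes the run output contains a "Submission id: driver-..." line like the one in the expected output above. The names extract_submission_id and submit_job are hypothetical helpers for illustration, not part of the DC/OS CLI.

```shell
#!/bin/sh
# Read `dcos spark run` output on stdin and print the first submission ID.
# Assumption: the output contains a line such as
#   Run job succeeded. Submission id: driver-20200729203326-0001
extract_submission_id() {
    sed -n 's/.*Submission id: \(driver-[0-9]*-[0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: submit a job and print only its submission ID.
# $1 is the string passed to --submit-args.
# stderr is merged into stdout because this sketch does not assume
# which stream the CLI writes the success message to.
submit_job() {
    output=$(dcos spark run --submit-args="$1" 2>&1) || return 1
    printf '%s\n' "$output" | extract_submission_id
}
```

One possible use is id=$(submit_job "https://downloads.mesosphere.com/spark/examples/pi.py 30"), followed by dcos spark log --completed "$id" once the job has finished. The log command with --completed is only expected to return output for a finished job, so the sketch deliberately does not chain the two calls.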

Next steps

  • To view the status of your job, run the dcos spark webui command, then visit the Spark cluster dispatcher UI at http://<dcos-url>/service/spark/.
  • To view the logs, see the documentation for Mesosphere DC/OS monitoring.
  • To view details about your Spark job, run the dcos task log --completed <submissionId> or dcos spark log --completed <submissionId> command.
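
The dispatcher UI address in the first item is the cluster URL followed by /service/spark/. As a small sketch, the hypothetical helper below (dispatcher_ui_url is an illustrative name, not a CLI command) builds that address from a base URL and tolerates a trailing slash.

```shell
#!/bin/sh
# Build the Spark dispatcher UI address from a DC/OS cluster URL.
# $1 is the cluster base URL, for example http://dcos.example.com
# (a placeholder host, not a real cluster).
dispatcher_ui_url() {
    base="${1%/}"                      # drop one trailing slash, if present
    printf '%s/service/spark/\n' "$base"
}
```

If your CLI is already attached to a cluster, the base URL can likely be read with dcos config show core.dcos_url and passed in as dispatcher_ui_url "$(dcos config show core.dcos_url)". This assumes the default service name spark. A service installed under a different name would use a different path.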