Quick Start

Introduction to DC/OS Apache Spark service

This page explains how to install the DC/OS Apache Spark service.

Prerequisites

Security mode    Service account
Disabled         Not available
Permissive       Optional
Strict           Required

  1. Install the Spark package. This step installs the Spark DC/OS service, the Spark CLI, the dispatcher, and, optionally, the history server. It may take a few minutes. See the History Server section for information about how to install the history server.

    dcos package install spark
    

    Expected output:

    Installing Marathon app for package [spark] version [2.6.0-2.3.2]
    Installing CLI subcommand for package [spark] version [2.6.0-2.3.2]
    New command available: dcos spark
    DC/OS Spark is being installed!
    
    Documentation: /mesosphere/dcos/services/spark/
     Issues: https://docs.mesosphere.com/support/
    
<p class="message--note"><strong>NOTE: </strong> Type `dcos spark` to view the Spark CLI options. To install only the Spark CLI, run `dcos package install spark --cli`.</p>
  1. Run the sample SparkPi jar for DC/OS. You can view the example source here.

    1. Use the following command to run a Spark job that calculates the value of Pi.

      dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi https://downloads.mesosphere.com/spark/assets/spark-examples_2.11-2.0.1.jar 30"
      

      Expected output:

      2017/08/24 15:42:07 Using docker image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for drivers
      2017/08/24 15:42:07 Pulling image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for executors, by default. To bypass set spark.mesos.executor.docker.forcePullImage=false
      2017/08/24 15:42:07 Setting DCOS_SPACE to /spark
      Run job succeeded. Submission id: driver-20170824224209-0001
      
    2. View the standard output from your job:

      dcos spark log driver-20170824224209-0001
      

      Expected output:

      Pi is roughly 3.141853333333333
      
  2. Run a Python SparkPi job. You can view the example source here.

    1. Use the following command to run a Python Spark job that calculates the value of Pi.

      dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/pi.py 30"
      

      Expected output:

      2017/08/24 15:44:20 Parsing application as Python job
      2017/08/24 15:44:23 Using docker image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for drivers
      2017/08/24 15:44:23 Pulling image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for executors, by default. To bypass set spark.mesos.executor.docker.forcePullImage=false
      2017/08/24 15:44:23 Setting DCOS_SPACE to /spark
      Run job succeeded. Submission id: driver-20170824224423-0002
      
    2. View the standard output from your job:

      dcos task log --completed driver-20170824224423-0002
      

      Expected output:

      Pi is roughly 3.142715
      
  3. Run an R job. You can view the example source here.

    1. Use the following command to run an R job.

      dcos spark run --submit-args="https://downloads.mesosphere.com/spark/examples/dataframe.R"
      

      Expected output:

      2017/08/24 15:45:21 Parsing application as R job
      2017/08/24 15:45:23 Using docker image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for drivers
      2017/08/24 15:45:23 Pulling image mesosphere/spark:2.6.0-2.3.2-hadoop-2.7 for executors, by default. To bypass set spark.mesos.executor.docker.forcePullImage=false
      2017/08/24 15:45:23 Setting DCOS_SPACE to /spark
      Run job succeeded. Submission id: driver-20170824224524-0003
      
    2. Use the following command to view the standard output from your job.

      dcos spark log --lines_count=10 driver-20170824224524-0003
      

      Expected output:

      In Filter(nzchar, unlist(strsplit(input, ",|\\s"))) :
        bytecode version mismatch; using eval
      root
       |-- name: string (nullable = true)
       |-- age: double (nullable = true)
      root
       |-- age: long (nullable = true)
       |-- name: string (nullable = true)
          name
      1 Justin        
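
Each job above follows the same pattern: submit with `dcos spark run`, read the submission ID from the output, then pass that ID to `dcos spark log`. The sketch below shows one way to script that pattern. The function names `extract_submission_id` and `run_and_show_log` are hypothetical helpers, not part of the DC/OS CLI, and the parsing assumes the `Submission id: driver-...` line format shown in the expected output above. Whether that line is written to standard output or standard error is also an assumption, so both streams are captured.

```shell
#!/bin/sh
# Sketch only: these helper functions are hypothetical, not DC/OS CLI commands.

# Read `dcos spark run` output on stdin and print the last submission ID found.
# Assumes a line of the form "... Submission id: driver-<digits>-<digits>".
extract_submission_id() {
  sed -n 's/.*Submission id: \(driver-[0-9][0-9]*-[0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Submit a job and print its driver log.
# $1 is the string to pass to --submit-args.
run_and_show_log() {
  # Capture stdout and stderr together (assumption: either may carry the ID line).
  run_output=$(dcos spark run --submit-args="$1" 2>&1) || {
    printf '%s\n' "$run_output" >&2
    return 1
  }
  submission_id=$(printf '%s\n' "$run_output" | extract_submission_id)
  if [ -z "$submission_id" ]; then
    printf 'No submission ID found in output:\n%s\n' "$run_output" >&2
    return 1
  fi
  dcos spark log "$submission_id"
}
```

For example, `run_and_show_log "https://downloads.mesosphere.com/spark/examples/pi.py 30"` would submit the Python job from this page. The driver may need some time before its log contains the result, so you may have to rerun `dcos spark log` with the same submission ID.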
      

Next steps

  • To view the status of your job, run the dcos spark webui command, then visit the Spark cluster dispatcher UI at http://<dcos-url>/service/spark/.
  • To view the logs, see the documentation for Mesosphere DC/OS monitoring.
  • To view details about your Spark job, run the dcos task log --completed <submissionId> command.
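
If you want to build the dispatcher UI address yourself, the following is a minimal sketch. The function name `dispatcher_ui_url` is hypothetical, and the sketch assumes Spark is installed under the default service name `spark`, as in the steps on this page.

```shell
#!/bin/sh
# Sketch only: dispatcher_ui_url is a hypothetical helper, not a DC/OS CLI command.

# Print the Spark dispatcher UI URL for a DC/OS cluster base URL.
# $1 is the cluster base URL, with or without a trailing slash.
# Assumes the default service name "spark".
dispatcher_ui_url() {
  base_url=${1%/}   # drop one trailing slash, if present
  printf '%s/service/spark/\n' "$base_url"
}
```

On a configured CLI, the base URL can be read with `dcos config show core.dcos_url`, so `dispatcher_ui_url "$(dcos config show core.dcos_url)"` prints the address to open in a browser.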