History Server

Enabling the Spark History Server

DC/OS Apache Spark includes the Spark History Server. Because the history server requires HDFS, it is not enabled by default; you must enable it explicitly.

Installing HDFS

NOTE: HDFS requires five private nodes.

  1. Install HDFS:

    dcos package install hdfs
    
  2. Create a history directory in HDFS (the default is /history). SSH into your cluster and start an HDFS client container:

    docker run -it mesosphere/hdfs-client:2.6.4 bash

    Then, from inside the container, create the directory:

    ./bin/hdfs dfs -mkdir /history
    
  3. Create spark-history-options.json:

    {
      "service": {
        "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
      }
    }
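
If you script the setup, the options file from step 3 can be written and checked locally before installing. A minimal sketch (the file name and keys are exactly those shown above; `python3 -m json.tool` is used only as a local JSON syntax check):

```shell
# Write spark-history-options.json with the same contents as step 3 above
cat > spark-history-options.json <<'EOF'
{
  "service": {
    "hdfs-config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
  }
}
EOF

# Local sanity check: fail if the file is not valid JSON
python3 -m json.tool spark-history-options.json > /dev/null && echo "options file OK"
```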
    

Installing the Spark History Server

  1. Install the Spark history server:

    dcos package install spark-history --options=spark-history-options.json
    
  2. Create spark-dispatcher-options.json:

    {
      "service": {
        "spark-history-server-url": "http://<dcos_url>/service/spark-history"
      },
      "hdfs": {
        "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
      }
    }
    
  3. Install the Spark dispatcher:

    dcos package install spark --options=spark-dispatcher-options.json
    
  4. Run jobs with the event log enabled so that they are recorded by the history server:

    dcos spark run --submit-args="--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hdfs/history ... --class MySampleClass http://external.website/mysparkapp.jar"
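
The dispatcher options file from step 2 can likewise be generated and validated locally. A minimal sketch (contents exactly as shown in step 2; `DCOS_URL` stands in for the `<dcos_url>` placeholder and must be replaced with your cluster's URL before installing):

```shell
# Write spark-dispatcher-options.json with the same contents as step 2 above;
# DCOS_URL is a placeholder -- substitute your cluster's URL before installing
DCOS_URL="<dcos_url>"
cat > spark-dispatcher-options.json <<EOF
{
  "service": {
    "spark-history-server-url": "http://${DCOS_URL}/service/spark-history"
  },
  "hdfs": {
    "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
  }
}
EOF

# Local sanity check: fail if the file is not valid JSON
python3 -m json.tool spark-dispatcher-options.json > /dev/null && echo "options file OK"
```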
    

Confirming the history server installation

View your job in the dispatcher at http://<dcos_url>/service/spark/. The entry for each job includes a link to that job's page in the history server.