Welcome to the documentation for DC/OS Apache Spark. For more information about new and changed features, see the release notes.

Apache Spark is a fast and general-purpose cluster computing system for big data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including the following:

Spark SQL for SQL and DataFrames
MLlib for machine learning
GraphX for graph processing
Spark Streaming for stream processing

For more information, see the Apache Spark documentation.

DC/OS Apache Spark consists of Apache Spark with a few custom commits along with DC/OS-specific packaging.

DC/OS Apache Spark includes:

Mesos Cluster Dispatcher
Spark History Server
DC/OS Apache Spark CLI
Interactive Spark shell

Benefits

Utilization: DC/OS Apache Spark leverages Mesos to run Spark on the same cluster as other DC/OS services
Improved efficiency
Simple management
Multi-team support
Interactive analytics through notebooks
UI integration
Security, including file-based and environment-based secrets

Features

Multiversion support
Run multiple Spark dispatchers
Run against multiple HDFS clusters
Backports of scheduling improvements
Simple installation of all Spark components, including the dispatcher and the history server
Integration of the dispatcher and history server
Zeppelin integration
Kerberos and SSL support
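
As a rough illustration of the "Run multiple Spark dispatchers" feature, the sketch below writes one package options file per team, each giving its dispatcher a distinct service name. The `service.name` key, the team names, the `spark-<team>` naming scheme, and the `dispatcher_options` and `write_options_files` helpers are assumptions made for illustration, not taken from this page; check the Install and Customize page for the options your package version actually accepts.

```python
import json
import tempfile
from pathlib import Path


def dispatcher_options(service_name):
    """Return a package options document for one dispatcher.

    Assumption: the Spark package reads the dispatcher's name from
    the "service.name" option; verify this for your package version.
    """
    return {"service": {"name": service_name}}


def write_options_files(team_names, out_dir):
    """Write one <service-name>.json options file per team; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for team in team_names:
        service_name = f"spark-{team}"  # hypothetical naming scheme
        path = out_dir / f"{service_name}.json"
        body = json.dumps(dispatcher_options(service_name), indent=2)
        path.write_text(body + "\n")
        paths.append(path)
    return paths


# Hypothetical team names, written to a throwaway directory.
paths = write_options_files(["analytics", "etl"], tempfile.mkdtemp())
for path in paths:
    print(path.name)
```

Each generated file could then be passed to `dcos package install spark --options=<file>` to install a separately named dispatcher, assuming the option key above matches your package version.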

Release notes for DC/OS Apache Spark 2.11.0-2.4.6

Introduction to DC/OS Apache Spark service

Installing and customizing your DC/OS Apache Spark service

Using DC/OS Apache Spark

Integrating HDFS with DC/OS Apache Spark service

Enabling the Spark History Server

Configuring DC/OS service accounts for Spark

Upgrading DC/OS Apache Spark

Uninstalling DC/OS Apache Spark

Customizing DC/OS Apache Spark while it is up and running

Running a Spark job

Running commands interactively in the Apache Spark shell

Customizing the Docker image in which Spark runs

Understanding DC/OS Apache Spark fault tolerance

Scheduling jobs with DC/OS Apache Spark

Setting up Kerberos to run with DC/OS Apache Spark

Troubleshooting DC/OS Apache Spark

Understanding DC/OS Apache Spark version policy

Limitations of DC/OS Apache Spark

Mesosphere has scale-tested Spark on DC/OS

