Custom Docker Images

Customizing the Docker image in which Spark runs

You can customize the Docker image in which Spark runs by extending the standard Spark Docker image. In this way, you can install your own libraries, such as a custom Python library.

NOTE: Customizing the Spark image that Mesosphere provides is supported. However, customizations can adversely affect the integration between Spark and DC/OS. If Mesosphere support suspects that a customization is causing a problem with Spark on DC/OS, they may ask you to reproduce the issue with an unmodified Spark image.

To customize your Docker image:

  1. In your Dockerfile, extend from one of the standard Spark images and add your customizations:

    FROM mesosphere/spark:2.12.0-3.0.1-scala-2.12-hadoop-3.2
    # Refresh the package index first so that apt-get can locate the package.
    RUN apt-get update && apt-get install -y python-pip
    RUN pip install requests
    
  2. Build an image from the customized Dockerfile and push it to a registry that your DC/OS agents can pull from:

    docker build -t myusername/myimage:v1 .
    docker push myusername/myimage:v1
    
  3. Reference the custom Docker image with the --docker-image option when you run a Spark job:

    dcos spark run --docker-image=myusername/myimage:v1 --submit-args="http://external.website/mysparkapp.py 30"
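Because the same image reference has to appear in the build, push, and run steps, it can help to keep it in one place. The following is a minimal sketch for a POSIX shell, not part of the DC/OS tooling: the function names deploy_spark_job and run_cmd and the DRY_RUN variable are hypothetical, and the image name and application URL are the placeholders used in the steps above. When DRY_RUN=1 is set, the commands are printed instead of executed.

```shell
# Hypothetical helpers (not part of DC/OS): build, push, and submit a Spark
# job using a single image reference so the three steps cannot drift apart.

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: deploy_spark_job IMAGE APP_URL [APP_ARGS...]
# Run it from the directory that contains your customized Dockerfile.
deploy_spark_job() {
  if [ "$#" -lt 2 ]; then
    echo "usage: deploy_spark_job IMAGE APP_URL [APP_ARGS...]" >&2
    return 2
  fi
  image="$1"
  app_url="$2"
  shift 2

  # --submit-args takes the application URL followed by its arguments.
  if [ "$#" -gt 0 ]; then
    submit_args="$app_url $*"
  else
    submit_args="$app_url"
  fi

  run_cmd docker build -t "$image" . || return 1
  run_cmd docker push "$image" || return 1
  run_cmd dcos spark run --docker-image="$image" --submit-args="$submit_args"
}
```

For example, DRY_RUN=1 deploy_spark_job myusername/myimage:v1 http://external.website/mysparkapp.py 30 prints the three commands from the steps above; the same call without DRY_RUN runs them.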