Configuring Cassandra

Installation options, configuration regions, node settings, and more information

The default DC/OS Apache Cassandra installation provides reasonable defaults for trying out the service, but may not be sufficient for production use. You may require a different configuration depending on the context of the deployment.

Installing with Custom Configuration

The following are some examples of how to customize the installation of your Apache Cassandra instance.

In each case, you would create a new Apache Cassandra instance using the custom configuration as follows:

dcos package install cassandra --options=sample-cassandra.json

We recommend that you store your custom configuration in source control.

Installing multiple instances

By default, the Apache Cassandra service is installed with a service name of cassandra. You may specify a different name using a custom service configuration as follows:

{
  "service": {
    "name": "cassandra-other"
  }
}

When the above JSON configuration is passed to the package install cassandra command via the --options argument, the new service will use the name specified in that JSON configuration:

dcos package install cassandra --options=cassandra-other.json

Multiple instances of Apache Cassandra may be installed into your DC/OS cluster by customizing the name of each instance. For example, you might have one instance of Apache Cassandra named cassandra-staging and another named cassandra-prod, each with its own custom configuration.

After you specify a custom name for your instance, it can be reached using dcos cassandra CLI commands or directly over HTTP as described below.

WARNING: The service name cannot be changed after initial install. Changing the service name would require installing a new instance of the service against the new name, then copying over any data as necessary to the new instance.
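As a rough illustration of the multi-instance setup described above, the following Python sketch generates one options document per instance name. The function names and the `<service-name>.json` file naming are hypothetical; only the `service.name` key comes from this page.

```python
import json


def instance_options(service_name):
    """Return the options document for one named Cassandra instance."""
    if not service_name:
        raise ValueError("service name must not be empty")
    return {"service": {"name": service_name}}


def render_options(service_name):
    """Serialize the options document as the JSON text of an options file."""
    return json.dumps(instance_options(service_name), indent=2)


if __name__ == "__main__":
    for name in ("cassandra-staging", "cassandra-prod"):
        # Hypothetical file naming convention: <service-name>.json
        print(f"# contents of {name}.json")
        print(render_options(name))
        print(f"dcos package install cassandra --options={name}.json")
```

Keeping the generated files in source control, as recommended earlier, makes it easy to see how each instance was configured.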

Installing into folders

In DC/OS 1.10 and later, services may be installed into folders by specifying a slash-delimited service name. For example:

{
  "service": {
    "name": "/foldered/path/to/cassandra"
  }
}

The above example will install the service under a path of foldered => path => to => cassandra. It can then be reached using dcos cassandra CLI commands or directly over HTTP as described below.

WARNING: The service folder location cannot be changed after initial install. Changing the folder location would require installing a new instance of the service against the new name, then copying over any data as necessary to the new instance.

Addressing named instances

After you’ve installed the service under a custom name or under a folder, it may be accessed from all dcos cassandra CLI commands using the --name argument. The --name value defaults to the name of the package, cassandra.

For example, if you had an instance named cassandra-dev, the following command would invoke a pod list command against it:

dcos cassandra --name=cassandra-dev pod list

The same query would be made over HTTP as follows:

curl -H "Authorization:token=$auth_token" <dcos_url>/service/cassandra-dev/v1/pod

Likewise, if you had an instance in a folder like /foldered/path/to/cassandra, the following command would invoke a pod list command against it:

dcos cassandra --name=/foldered/path/to/cassandra pod list

Similarly, it could be queried directly over HTTP as follows:

curl -H "Authorization:token=$auth_token" <dcos_url>/service/foldered/path/to/cassandra/v1/pod

You may add a -v (verbose) argument to any dcos cassandra command to see the underlying HTTP queries that are being made. This can be a useful tool to see where the CLI is getting its information. In practice, dcos cassandra commands are a thin wrapper around an HTTP interface provided by the DC/OS Apache Cassandra Service itself.
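The mapping from a service name to its HTTP path, as used in the curl examples in this section, can be sketched in Python as follows. The helper name `service_api_url` and the example host `dcos.example.com` are hypothetical; the `/service/<name>/v1/pod` layout is taken from the examples above.

```python
def service_api_url(dcos_url, service_name, endpoint="v1/pod"):
    """Build the URL of a service endpoint from the DC/OS URL and service name.

    A foldered name such as /foldered/path/to/cassandra keeps its inner
    slashes; only the leading and trailing slashes are dropped.
    """
    base = dcos_url.rstrip("/")
    name = service_name.strip("/")
    if not name:
        raise ValueError("service name must not be empty")
    return f"{base}/service/{name}/{endpoint.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical cluster URL, used only for illustration.
    print(service_api_url("https://dcos.example.com", "cassandra-dev"))
    print(service_api_url("https://dcos.example.com", "/foldered/path/to/cassandra"))
```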

Integration with DC/OS access controls

In Enterprise DC/OS, DC/OS access controls can be used to restrict access to your service. To give a non-superuser complete access to a service, grant them the following list of permissions:

dcos:adminrouter:service:marathon full
dcos:service:marathon:marathon:<service-name> full
dcos:service:adminrouter:<service-name> full
dcos:adminrouter:ops:mesos full
dcos:adminrouter:ops:slave full

Where <service-name> is your full service name, including the folder if it is installed in one.

Create a Custom Configuration File

Create a custom configuration file that will be used to install Apache Cassandra and save it as config.json. Specify the service account (<service_account_id>) and a secret path (cassandra/<secret-name>).

{
  "service": {
    "service_account": "<service_account_id>",
    "service_account_secret": "cassandra/<secret-name>"
  }
}

Service Settings

Placement Constraints

Placement constraints allow you to customize where a service is deployed in the DC/OS cluster. Placement constraints use the Marathon operators syntax. For example, [["hostname", "UNIQUE"]] ensures that at most one pod instance is deployed per agent.

A common task is to specify a list of whitelisted systems to deploy to. To achieve this, use the following syntax for the placement constraint:

[["hostname", "LIKE", "10.0.0.159|10.0.1.202|10.0.3.3"]]

IMPORTANT: Be sure to include excess capacity in such a scenario so that if one of the whitelisted systems goes down, there is still enough capacity to repair your service.
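A whitelist constraint like the one above can be built from a list of hosts. The Python sketch below is illustrative only: the helper names are hypothetical, and passing the constraint as a JSON string under `nodes.placement_constraint` is an assumption about the options schema that you should check against the service schema for your version.

```python
import json


def hostname_like_constraint(hosts):
    """Return a Marathon-style LIKE constraint that matches any listed host."""
    hosts = list(hosts)
    if not hosts:
        raise ValueError("at least one host is required")
    # Alternatives are joined with '|', as in the example constraint above.
    return [["hostname", "LIKE", "|".join(hosts)]]


def placement_options(hosts):
    """Wrap the constraint in an options document.

    Assumption: the constraint is supplied as a JSON-encoded string under
    nodes.placement_constraint.
    """
    constraint = hostname_like_constraint(hosts)
    return {"nodes": {"placement_constraint": json.dumps(constraint)}}


if __name__ == "__main__":
    hosts = ["10.0.0.159", "10.0.1.202", "10.0.3.3"]
    print(json.dumps(placement_options(hosts), indent=2))
```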

Updating Placement Constraints

Clusters change, and so will your placement constraints. However, service pods that are already running are not affected by changes to placement constraints. Altering a placement constraint might invalidate the current placement of a running pod, and the pod is not relocated automatically because doing so is a destructive action. We recommend using the following procedure to update the placement constraints of a pod:

  • Update the placement constraint definition in the service.
  • For each affected pod, one at a time, perform a pod replace. This will (destructively) move the pod to be in accordance with the new placement constraints.

Zones (Enterprise)

Requires: DC/OS 1.11 Enterprise or later.

Placement constraints can be applied to DC/OS zones by referring to the @zone key. For example, one could spread pods across a minimum of three different zones by including this constraint:

[["@zone", "GROUP_BY", "3"]]

For the @zone constraint to be applied correctly, DC/OS must have Fault Domain Awareness enabled and configured.

WARNING: A service installed without a zone constraint cannot be updated to have one, and a service installed with a zone constraint may not have it removed.

Virtual networks

DC/OS Apache Cassandra supports deployment on virtual networks on DC/OS (including the dcos overlay network), allowing each container (task) to have its own IP address and not use port resources on the agent machines. This can be specified by passing the following configuration during installation:

{
  "service": {
    "virtual_network_enabled": true
  }
}

NOTE: Once the service is deployed on a virtual network, it cannot be updated to use the host network.

User

By default, all pods’ containers will be started as system user “nobody”. If your system is configured to use a different system user (for instance, you may have externally mounted persistent volumes with root’s permissions), you can set the user by defining a custom value for the service’s “user” property, for example:

{
  "service": {
    "properties": {
      "user": "root"
    }
  }
}

Regions

The service parameter region can be used to deploy the service in an alternate region. By default the service is deployed in the “local” region, which is the region the DC/OS masters are running in. To install a service in a specific region, include the following in its options:

{
  "service": {
    "region": "<region>"
  }
}

WARNING: A service may not be moved between regions.

Cassandra Node Settings

Adjust the following settings to customize the amount of resources allocated to each node. DC/OS Apache Cassandra’s system requirements should be taken into consideration when adjusting these values. Reducing these values below those requirements may result in adverse performance and/or failures while using the service.

Each of the following settings can be customized under the node configuration section.

Node Count

Customize the Node Count setting (default 3) under the node configuration section. Read the Apache Cassandra documentation for minimum node count requirements.

  • In DC/OS CLI options.json: count: integer (default: 3)
  • DC/OS web interface: NODES: integer

CPU

You can customize the amount of CPU allocated to each node. A value of 1.0 is equivalent to one full dedicated CPU core on a machine, although all cores are made available via time slicing. Change this value by editing the cpus value under the node configuration section. Setting the CPU value too low will result in throttled tasks.

NOTE: Each Cassandra node will use an additional 1.0 CPU for sidecar services such as backup and nodetool.

For example, when provisioning three CPUs for each Cassandra node, the actual usage will be four CPUs. Take this into account when configuring Cassandra to maximize resource utilization on an agent.

  • In DC/OS CLI options.json: cpus: number (default: 0.5)
  • DC/OS web interface: CASSANDRA_CPUS: number
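The CPU accounting described above is simple arithmetic, sketched here in Python for capacity planning. The function names are hypothetical; the 1.0 CPU sidecar overhead per node is the figure stated in the note above.

```python
# Additional CPU each Cassandra node uses for sidecar services (from the note above).
SIDECAR_CPUS = 1.0


def cpus_per_node(cassandra_cpus):
    """Actual CPU usage of one node: the configured cpus value plus the sidecar overhead."""
    return cassandra_cpus + SIDECAR_CPUS


def cluster_cpus(node_count, cassandra_cpus):
    """Actual CPU usage of the whole cluster."""
    return node_count * cpus_per_node(cassandra_cpus)


if __name__ == "__main__":
    # Three CPUs configured per node means four CPUs actually used per node.
    print(cpus_per_node(3.0))
    print(cluster_cpus(3, 3.0))
```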

Memory

You can customize the amount of RAM allocated to each node. You can change this value by editing the mem value (in MB) under the node configuration section. Setting this too low will result in out of memory errors. The heap.size setting must also be less than this value to prevent out of memory errors, which can result when the Java Virtual Machine attempts to allocate more memory than is available to the Cassandra process.

  • In DC/OS CLI options.json: mem: integer (default: 10240)
  • DC/OS web interface: CASSANDRA_MEMORY_MB: integer
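The requirement that the heap size stay below the node memory can be checked before installing. The following Python sketch assumes the options file stores both values in MB under `nodes.mem` and `nodes.heap.size`; that key layout and the helper name `heap_fits` are assumptions for illustration, so verify them against the service schema.

```python
def heap_fits(options):
    """Return True when the JVM heap size is strictly less than the node memory.

    Assumption: both values are integers in MB, stored under
    nodes.heap.size and nodes.mem in the options document.
    """
    nodes = options["nodes"]
    return nodes["heap"]["size"] < nodes["mem"]


if __name__ == "__main__":
    ok = {"nodes": {"mem": 10240, "heap": {"size": 8192}}}
    too_big = {"nodes": {"mem": 4096, "heap": {"size": 4096}}}
    print(heap_fits(ok))
    print(heap_fits(too_big))
```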

JMX Port

You can customize the port that Apache Cassandra listens on for JMX requests, such as those issued by nodetool.

  • In DC/OS CLI options.json: jmx_port: integer (default: 7199)
  • DC/OS web interface: TASKCFG_ALL_JMX_PORT: integer

Storage Port

You can customize the port that Apache Cassandra listens on for inter-node communication.

  • In DC/OS CLI options.json: storage_port: integer (default: 7000)
  • DC/OS web interface: TASKCFG_ALL_CASSANDRA_STORAGE_PORT: integer

SSL Storage Port

You can customize the port that Apache Cassandra listens on for inter-node communication over SSL.

  • In DC/OS CLI options.json: ssl_storage_port: integer (default: 7001)
  • DC/OS web interface: TASKCFG_ALL_CASSANDRA_SSL_STORAGE_PORT: integer

Native Transport Port

You can customize the port that Apache Cassandra listens on for CQL queries.

  • In DC/OS CLI options.json: native_transport_port: integer (default: 9042)
  • DC/OS web interface: TASKCFG_ALL_CASSANDRA_NATIVE_TRANSPORT_PORT: integer

RPC Port

You can customize the port that Apache Cassandra listens on for Thrift RPC requests.

  • In DC/OS CLI options.json: rpc_port: integer (default: 9160)
  • DC/OS web interface: TASKCFG_ALL_CASSANDRA_RPC_PORT: integer

Native Authentication and Authorization

To use Cassandra’s native PasswordAuthenticator and CassandraAuthorizer, set the following configurations:

{
  "service": {
    "security": {
      "authentication": {
        "enabled": true,
        "superuser": {
          "name": "<superuser-name>",
          "password_secret_path": "<service-name>/password"
        }
      },
      "authorization": {
        "enabled": true
      }
    }
  }
}

This will deploy your Cassandra cluster with enhanced security protection, which includes disabling the native cassandra superuser and replacing it with the superuser that you specify.

  • NOTE: The password_secret_path is the defined path to the superuser password in the secret store.

Custom Authentication and Authorization

Custom authenticators and authorizers are also supported via base64-encoded YAML.

Add your custom YAML when installing Apache Cassandra. You must base64 encode your block of YAML and enter this string into the authentication_custom_cassandra_yml field.

You can do this base64 encoding as part of your automated workflow, or you can do it manually with an online converter.
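If you prefer to do the encoding in an automated workflow, a minimal Python sketch using only the standard library could look like this. The helper names and the sample YAML are illustrative; the resulting string is what you would place in the authentication_custom_cassandra_yml field.

```python
import base64


def encode_cassandra_yaml(yaml_text):
    """Base64-encode a block of YAML as a single-line ASCII string."""
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


def decode_cassandra_yaml(encoded):
    """Decode the string again, for example to review an existing setting."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    # Illustrative YAML only; substitute the block your authenticator needs.
    sample_yaml = "authentication_options:\n    enabled: false\n"
    encoded = encode_cassandra_yaml(sample_yaml)
    print(encoded)
    print(decode_cassandra_yaml(encoded))
```

Encoding locally also avoids pasting security-related configuration into a third-party site.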

For example, the custom configurations are represented as follows:

{
  "service": {
    "security": {
      "authentication": {
        "enabled": true,
        "authenticator": "com.abc.cassandra.auth.AbcAuthenticator",
        "role_manager": "com.abc.cassandra.auth.AbcRoleManager",
        "authentication_custom_cassandra_yml": "YXV0aGVudGljYXRpb25fb3B0aW9uczoKICAgIGVuYWJsZWQ6IGZhbHNlCiAgICBkZWZhdWx0X3NjaGVtZTogaW50ZXJuYWwKICAgIGFsbG93X2RpZ2VzdF93aXRoX2tlcmJlcm9zOiB0cnVlCiAgICBwbGFpbl90ZXh0X3dpdGhvdXRfc3NsOiB3YXJuCiAgICB0cmFuc2l0aW9uYWxfbW9kZTogZGlzYWJsZWQKICAgIG90aGVyX3NjaGVtZXM6CiAgICBzY2hlbWVfcGVybWlzc2lvbnM6IGZhbHNl"
      },
      "authorization": {
        "enabled": true,
        "authorizer": "com.abc.cassandra.auth.AbcAuthorizer"
      }
    }
  }
}

Disks

Volume Type

The service supports the following two volume types:

  • ROOT volumes are an isolated directory on the root volume, sharing IO/spindles with the rest of the host system.
  • MOUNT volumes are a dedicated device or partition on a separate volume with dedicated IO/spindles.

Using MOUNT volumes requires additional configuration on each DC/OS agent system, so the service currently uses ROOT volumes by default. To ensure reliable and consistent performance in a production environment, you should configure MOUNT volumes on the machines that will run the service in your cluster, and then configure the service to use MOUNT volumes.

Use the following to configure the disk type:

  • In DC/OS CLI options.json: disk_type: string (default: ROOT)
  • DC/OS web interface: CASSANDRA_DISK_TYPE: string

Disk Scheduler

It is recommended that you pre-configure your storage hosts to use the deadline IO scheduler in production environments.

Zone/Rack-Aware Placement and Replication

Cassandra’s “rack”-based fault domain support is automatically enabled when specifying a placement constraint that uses the @zone key. For example, you could spread Cassandra nodes across a minimum of three different zones/racks by specifying the constraint [["@zone", "GROUP_BY", "3"]]. When a placement constraint specifying @zone is used, Cassandra nodes will be automatically configured with racks that match the names of the zones. If no placement constraint referencing @zone is configured, all nodes will be configured with a default rack of rack1.

In addition to placing the tasks on different zones/racks, the zone/rack information will be included in Cassandra’s cassandra-rackdc.properties file. This enables Cassandra to ensure data is replicated between zones/racks and not to two nodes in the same zone/rack.

Apache Cassandra Configuration

Apache Cassandra’s configuration can be customized via the cassandra section of the service schema. Read the service schema for a complete listing of the available configuration options.

Multi-datacenter deployment

To replicate data across data centers, Apache Cassandra requires that you configure each cluster with the addresses of the seed nodes from every remote cluster. The following steps show how to start a multi-datacenter Apache Cassandra deployment running inside a single DC/OS cluster.

Launch two Cassandra clusters

  1. Launch the first cluster with the default configuration.

    dcos package install cassandra
    
  2. Create an options.json file for the second cluster that specifies a different service name and data center name.

    {
      "service": {
        "name": "cassandra2",
        "data_center": "dc2"
      }
    }
    
  3. Launch the second cluster with the following custom options.

    dcos package install cassandra --options=<options.json>
    

Get the seed node IP addresses

NOTE: If your Cassandra clusters are not on the same network, you must set up a proxying layer to route traffic between them.

  1. Get the list of seed node addresses for the first cluster.

    dcos cassandra --name=cassandra endpoints native-client
    

    Alternatively, you can get the list of seed node addresses from the scheduler HTTP API.

    DCOS_AUTH_TOKEN=$(dcos config show core.dcos_acs_token)
    DCOS_URL=$(dcos config show core.dcos_url)
    curl -H "authorization:token=$DCOS_AUTH_TOKEN" "$DCOS_URL/service/cassandra/v1/endpoints/native-client"
    

    The output should look like this:

    {
      "address": [
        "10.0.3.88:9042",
        "10.0.0.162:9042",
        "10.0.0.189:9042"
      ],
      "dns": [
        "node-0-server.cassandra.autoip.dcos.thisdcos.directory:9042",
        "node-1-server.cassandra.autoip.dcos.thisdcos.directory:9042",
        "node-2-server.cassandra.autoip.dcos.thisdcos.directory:9042"
      ]
    }
    

    Note the IPs in the address field.

  2. Run the same command for your second Cassandra cluster and note the IPs in the address field:

    dcos cassandra --name=cassandra2 endpoints native-client
    

Update configuration for both clusters

  1. Create an options2.json file with the IP addresses of the first cluster (cassandra):

    {
      "service": {
        "remote_seeds": "10.0.3.88:9042,10.0.0.162:9042,10.0.0.189:9042"
      }
    }
    
  2. Update the configuration of the second cluster.

    dcos cassandra --name=cassandra2 update start --options=options2.json
    
  3. Perform the same operation on the first cluster, creating an options.json file that contains the IP addresses of the seed nodes of the second cluster (cassandra2) in the service.remote_seeds field.

  4. Then, update the first cluster’s configuration.

    dcos cassandra --name=cassandra update start --options=options.json

Both schedulers will restart after each receives the configuration update, and each cluster will communicate with the seed nodes from the other cluster to establish a multi-data center topology. Repeat this process for each new cluster you add.
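Building the remote_seeds value by hand is error-prone, so the step can be scripted. The Python sketch below turns the JSON printed by the endpoints command into the options document used in the update steps. The function name `remote_seeds_options` is hypothetical; the input and output formats follow the examples in this section.

```python
import json


def remote_seeds_options(endpoints):
    """Build the options document from parsed endpoints output.

    The entries of the "address" list are joined with commas, matching the
    remote_seeds format shown in the options2.json example.
    """
    addresses = endpoints.get("address", [])
    if not addresses:
        raise ValueError("endpoints output contains no addresses")
    return {"service": {"remote_seeds": ",".join(addresses)}}


if __name__ == "__main__":
    # Sample output copied from the endpoints example in this section.
    endpoints_json = """
    {
      "address": ["10.0.3.88:9042", "10.0.0.162:9042", "10.0.0.189:9042"],
      "dns": ["node-0-server.cassandra.autoip.dcos.thisdcos.directory:9042"]
    }
    """
    print(json.dumps(remote_seeds_options(json.loads(endpoints_json)), indent=2))
```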

Monitoring your progress

You can monitor the progress of the update for the first cluster using the following command:

dcos cassandra --name=cassandra update status

You can monitor the progress of the update for the second cluster using the following command:

dcos cassandra --name=cassandra2 update status

The output should look like this:

deploy (IN_PROGRESS)
└─ node-deploy (IN_PROGRESS)
   ├─ node-0:[server] (COMPLETE)
   ├─ node-1:[server] (COMPLETE)
   └─ node-2:[server] (PREPARED)
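If you want to wait for an update in a script, one option is to inspect the first line of the status output. The Python sketch below assumes the text format shown above, where the top-level plan line ends with its state in parentheses; the function name `deploy_complete` is hypothetical, and the output format may differ between versions.

```python
def deploy_complete(status_output):
    """Return True when the top-level plan line reports (COMPLETE).

    Assumption: the first non-blank line of the output is the plan line,
    for example "deploy (IN_PROGRESS)".
    """
    for line in status_output.splitlines():
        if line.strip():
            return line.rstrip().endswith("(COMPLETE)")
    return False


if __name__ == "__main__":
    in_progress = (
        "deploy (IN_PROGRESS)\n"
        "└─ node-deploy (IN_PROGRESS)\n"
        "   ├─ node-0:[server] (COMPLETE)\n"
        "   ├─ node-1:[server] (COMPLETE)\n"
        "   └─ node-2:[server] (PREPARED)\n"
    )
    print(deploy_complete(in_progress))
```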

Test your multi-datacenter configuration

Make sure to test your deployment using a Cassandra client.

Using Volume Profiles

Volume profiles are used to classify volumes. For example, users can group SSDs into a “fast” profile and group HDDs into a “slow” profile.

NOTE: Volume profiles are immutable and therefore cannot contain references to specific devices, nodes or other ephemeral identifiers.

DC/OS Storage Service (DSS) is a service that manages volumes, volume profiles, volume providers, and storage devices in a DC/OS cluster.

If you want to deploy Cassandra with DSS, please follow the DSS tutorial.

Once the DC/OS cluster is running and volume profiles are created, you can deploy Cassandra with the following configs:

cat > cassandra-options.json <<EOF
{
    "nodes": {
        "volume_profile": "cassandra",
        "disk_type": "MOUNT"
    }
}
EOF
dcos package install cassandra --options=cassandra-options.json

NOTE: Cassandra will be configured to look for MOUNT volumes with the cassandra profile.

Once the Cassandra service finishes deploying, its tasks will be running with the specified volume profiles.

dcos cassandra update status
deploy (serial strategy) (COMPLETE)
└─ node-deploy (serial strategy) (COMPLETE)
   ├─ node-0:[server] (COMPLETE)
   ├─ node-0:[init_system_keyspaces] (COMPLETE)
   ├─ node-1:[server] (COMPLETE)
   └─ node-2:[server] (COMPLETE)