Configuring DSE

The default DC/OS DataStax Enterprise installation provides reasonable defaults for trying out the service, but may not be sufficient for production use. You may require a different configuration depending on the context of the deployment.

Installing with Custom Configuration

The following are some examples of how to customize the installation of your DataStax Enterprise instance.

In each case, you would create a new DataStax Enterprise instance using the custom configuration as follows:

dcos package install datastax-dse --options=sample-datastax-dse.json

We recommend that you store your custom configuration in source control.

Installing multiple instances

By default, the DataStax Enterprise service is installed with a service name of datastax-dse. You may specify a different name using a custom service configuration as follows:

{
  "service": {
    "name": "datastax-dse-other"
  }
}

When the above JSON configuration is passed to the package install datastax-dse command via the --options argument, the new service will use the name specified in that JSON configuration:

dcos package install datastax-dse --options=datastax-dse-other.json

Multiple instances of DataStax Enterprise may be installed into your DC/OS cluster by customizing the name of each instance. For example, you might have one instance of DataStax Enterprise named datastax-dse-staging and another named datastax-dse-prod, each with its own custom configuration.
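As a rough illustration of the naming scheme above, the following sketch renders one options document per instance name. The helper names (instance_options, render_options) and the "<name>.json" file naming are assumptions made for this example and are not part of the package.

```python
import json


def instance_options(service_name):
    """Return an options document that only overrides the service name."""
    return {"service": {"name": service_name}}


def render_options(names):
    """Map a hypothetical '<name>.json' filename to its JSON text."""
    return {
        f"{name}.json": json.dumps(instance_options(name), indent=2)
        for name in names
    }


if __name__ == "__main__":
    rendered = render_options(["datastax-dse-staging", "datastax-dse-prod"])
    for filename, text in rendered.items():
        print(f"# {filename}")
        print(text)
```

Each rendered document could then be saved and passed to dcos package install datastax-dse via --options, as shown earlier.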

After specifying a custom name for your instance, it can be reached using dcos datastax-dse CLI commands or directly over HTTP as described below.

WARNING: The service name cannot be changed after initial install. Changing the service name would require installing a new instance of the service against the new name, then copying over any data as necessary to the new instance.

Installing into folders

In DC/OS 1.10 and later, services may be installed into folders by specifying a slash-delimited service name. For example:

{
  "service": {
    "name": "/foldered/path/to/datastax-dse"
  }
}

The above example will install the service under a path of foldered => path => to => datastax-dse. It can then be reached using dcos datastax-dse CLI commands or directly over HTTP as described below.

WARNING: The service folder location cannot be changed after initial install. Changing the folder location would require installing a new instance of the service against the new name, then copying over any data as necessary to the new instance.

Addressing named instances

After you’ve installed the service under a custom name or under a folder, it may be accessed from all dcos datastax-dse CLI commands using the --name argument. If omitted, --name defaults to the name of the package, datastax-dse.

For example, if you had an instance named datastax-dse-dev, the following command would invoke a pod list command against it:

dcos datastax-dse --name=datastax-dse-dev pod list

The same query over HTTP would be as follows:

curl -H "Authorization:token=$auth_token" <dcos_url>/service/datastax-dse-dev/v1/pod

Likewise, if you had an instance in a folder like /foldered/path/to/datastax-dse, the following command would invoke a pod list command against it:

dcos datastax-dse --name=/foldered/path/to/datastax-dse pod list

Similarly, it could be queried directly over HTTP as follows:

curl -H "Authorization:token=$auth_token" <dcos_url>/service/foldered/path/to/datastax-dse/v1/pod

You may add a -v (verbose) argument to any dcos datastax-dse command to see the underlying HTTP queries that are being made. This can be a useful tool to see where the CLI is getting its information. In practice, dcos datastax-dse commands are a thin wrapper around an HTTP interface provided by the DC/OS DataStax Enterprise Service itself.
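The URL pattern used in the curl examples can be sketched as a small helper. This is a minimal sketch that assumes the service name (with any leading slash removed) is appended directly after /service/, as in the examples above. The function names and the example URL are hypothetical.

```python
def pod_endpoint(dcos_url, service_name):
    """Build the /v1/pod URL for a named or foldered service instance."""
    base = dcos_url.rstrip("/")
    # A foldered name such as /foldered/path/to/datastax-dse loses its
    # leading slash so that the path does not contain "//".
    path = service_name.strip("/")
    return f"{base}/service/{path}/v1/pod"


def auth_header(auth_token):
    """Return the Authorization header used by the curl examples."""
    return {"Authorization": f"token={auth_token}"}


if __name__ == "__main__":
    # https://dcos.example.com is a placeholder for <dcos_url>.
    print(pod_endpoint("https://dcos.example.com", "datastax-dse-dev"))
    print(pod_endpoint("https://dcos.example.com", "/foldered/path/to/datastax-dse"))
```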

Integration with DC/OS access controls

In Enterprise DC/OS, DC/OS access controls can be used to restrict access to your service. To give a non-superuser complete access to a service, grant them the following list of permissions:

dcos:adminrouter:service:marathon full
dcos:service:marathon:marathon:<service-name> full
dcos:service:adminrouter:<service-name> full
dcos:adminrouter:ops:mesos full
dcos:adminrouter:ops:slave full

Where <service-name> is your full service name, including the folder if it is installed in one.

DataStax OpsCenter

The DC/OS DataStax OpsCenter service can be installed from the datastax-ops package. It is managed identically to datastax-dse. This guide primarily covers datastax-dse for conciseness. See the later sections of the guide for any configuration specifics of DC/OS DataStax OpsCenter.

Service Settings

Placement Constraints

Placement constraints allow you to customize where a service is deployed in the DC/OS cluster. Placement constraints use the Marathon operators syntax. For example, [["hostname", "UNIQUE"]] ensures that at most one pod instance is deployed per agent.

A common task is to specify a list of whitelisted systems to deploy to. To achieve this, use the following syntax for the placement constraint:

[["hostname", "LIKE", "10.0.0.159|10.0.1.202|10.0.3.3"]]

IMPORTANT: Be sure to include excess capacity in such a scenario so that if one of the whitelisted systems goes down, there is still enough capacity to repair your service.
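The whitelist constraint above can be generated from a list of agent addresses. The sketch below is illustrative only: the helper names are hypothetical, and the capacity check assumes at most one node per whitelisted host, which may not match your deployment.

```python
import json


def hostname_whitelist(hosts):
    """Build a Marathon-style hostname LIKE constraint from agent addresses."""
    hosts = list(hosts)
    if not hosts:
        raise ValueError("at least one host is required")
    return [["hostname", "LIKE", "|".join(hosts)]]


def has_spare_capacity(hosts, node_count):
    """Return True if the whitelist has more hosts than nodes.

    Assumes one node per host, so one spare host is needed to
    re-place a node after a host failure.
    """
    return len(set(hosts)) > node_count


if __name__ == "__main__":
    agents = ["10.0.0.159", "10.0.1.202", "10.0.3.3"]
    print(json.dumps(hostname_whitelist(agents)))
    print(has_spare_capacity(agents, node_count=3))
```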

Updating Placement Constraints

Clusters change, and as such so will your placement constraints. However, already running service pods will not be affected by changes in placement constraints. This is because altering a placement constraint might invalidate the current placement of a running pod, and the pod will not be relocated automatically as doing so is a destructive action. We recommend using the following procedure to update the placement constraints of a pod:

  • Update the placement constraint definition in the service.
  • For each affected pod, one at a time, perform a pod replace. This will (destructively) move the pod to be in accordance with the new placement constraints.
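The procedure above could be scripted roughly as follows. This sketch assumes that the CLI exposes the pod replace operation as dcos datastax-dse pod replace <pod>, and that pods are named dse-0, dse-1, and so on. Both are assumptions to verify against your installation. The helper names are hypothetical, and waiting for recovery is left to a manual confirmation.

```python
import subprocess


def pod_replace_commands(service_name, pod_names):
    """Build one pod replace command per pod, in the order given."""
    return [
        ["dcos", "datastax-dse", f"--name={service_name}", "pod", "replace", pod]
        for pod in pod_names
    ]


def replace_one_at_a_time(service_name, pod_names, confirm=input):
    """Run each replace, pausing until the operator confirms recovery."""
    for command in pod_replace_commands(service_name, pod_names):
        subprocess.run(command, check=True)
        confirm(f"Ran {' '.join(command)}; press Enter once the pod has recovered: ")


if __name__ == "__main__":
    # Only print the commands here; nothing is executed.
    for command in pod_replace_commands("datastax-dse", ["dse-0", "dse-1", "dse-2"]):
        print(" ".join(command))
```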

Zones (Enterprise)

Requires: DC/OS 1.11 Enterprise or later.

Placement constraints can be applied to DC/OS zones by referring to the @zone key. For example, one could spread pods across a minimum of three different zones by including this constraint:

[["@zone", "GROUP_BY", "3"]]

For the @zone constraint to be applied correctly, DC/OS must have Fault Domain Awareness enabled and configured.

WARNING: A service installed without a zone constraint cannot be updated to have one, and a service installed with a zone constraint may not have it removed.

Virtual networks

DC/OS DataStax Enterprise supports deployment on virtual networks on DC/OS (including the dcos overlay network), allowing each container (task) to have its own IP address and not use port resources on the agent machines. This can be specified by passing the following configuration during installation:

{
  "service": {
    "virtual_network_enabled": true
  }
}

NOTE: Once the service is deployed on a virtual network, it cannot be updated to use the host network.

User

By default, all pods’ containers are started as the system user “nobody”. If your system is configured to use a different system user (for instance, you may have externally mounted persistent volumes with root’s permissions), you can set the user by defining a custom value for the service’s “user” property, for example:

{
  "service": {
    "properties": {
      "user": "root"
    }
  }
}

Best Practices

  • Use Mesosphere Enterprise DC/OS’s placement rules to map your cluster nodes or DC to different availability zones to achieve high resiliency.
  • Set up a routine backup service using OpsCenter to back up your business critical data on a regular basis. The data can be stored on the nodes themselves, or on AWS S3 buckets, depending on your IT policy or business needs.
  • Set up a routine repair service using OpsCenter to ensure that all data on a replica is consistent within your clusters.

Node Settings

The following settings may be adjusted to customize the amount of resources allocated to each Node. DataStax’s minimum system requirements must be taken into consideration when adjusting these values. Reducing these values may result in adverse performance and possibly even task failures.

Each of the following settings may be customized under the dsenode configuration section.

Node Count

Customize the Node Count setting (default 3). Consult the DataStax documentation for minimum node requirements.

CPU

The amount of CPU allocated to each Node may be customized. A value of 1.0 equates to one full CPU core on a machine. This value may be customized by editing the cpus value under the dsenode configuration section.

Memory

The amount of RAM allocated to each Node may be customized. This value may be customized by editing the mem value (in MB) under the dsenode configuration section.

If you customize the allocated memory, you must also update the heap value in the same section. As a rule of thumb, we recommend that heap be set to half of mem. For example, for a mem value of 32000, heap should be 16000. If you do not do this, dse-#-node tasks may be restarted due to memory errors.
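The rule of thumb above can be captured in a few lines. This sketch assumes that cpus, mem, and heap sit directly under a dsenode object in the options JSON, mirroring the "service" examples earlier in this guide. The helper name is hypothetical.

```python
import json


def dsenode_resources(mem_mb, cpus=None):
    """Return a dsenode options section with heap set to half of mem."""
    section = {"mem": mem_mb, "heap": mem_mb // 2}
    if cpus is not None:
        section["cpus"] = cpus
    return {"dsenode": section}


if __name__ == "__main__":
    print(json.dumps(dsenode_resources(32000, cpus=4.0), indent=2))
```

Deriving heap from mem in one place keeps the two values from drifting apart when the memory allocation is changed later.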

Ports

Each port exposed by the service’s components may be customized via the service configuration. If you wish to install multiple instances of the service and have them colocated on the same machines, you must ensure that no ports are shared between those instances. Customizing ports is only needed when multiple instances share a single machine; otherwise it is optional.

Each component’s ports may be customized in the following configuration sections:

  • Nodes (as a group): port settings under dsenode.
  • OpsCenter (if built-in instance is enabled): port settings under opscenter.

NOTE: In a multi-DC environment, all DataStax Enterprise DCs within a DSE Cluster must share the same port configuration. Co-location of DSE nodes within the same DSE Cluster is not supported.

Storage Volumes

The DC/OS DataStax Enterprise service supports two volume types:

  • ROOT volumes are effectively an isolated directory on the root volume, sharing IO/spindles with the rest of the host system.
  • MOUNT volumes are a dedicated device or partition on a separate volume, with dedicated IO/spindles.

MOUNT volumes require additional configuration on each DC/OS agent system, so the service currently uses ROOT volumes by default. To ensure reliable and consistent performance in a production environment, you must configure two MOUNT volumes on the machines that will run the service in your cluster, and then set the following volume types to MOUNT under dsenode:

  • Persistent data volume type = MOUNT
  • Persistent Solr volume type (if Search is enabled) = MOUNT

Using ROOT volumes for these is not supported in production.

Separate volume for commit log data

If you are using non-magnetic disks, a good approach is to keep your commit log files on the same volume as your DSE data. This is the default configuration. If you need to keep commit log data on a separate volume, the service install provides options for enabling that feature and provisioning a separate mount point for commit log data. Be warned that if you choose to use a separate volume, you will not be able to change it back later.

Placement Constraints

Placement constraints allow you to customize where an instance is deployed in the DC/OS cluster. Placement constraints may be configured separately for each of the node types in the following locations:

  • Nodes (as a group): Node placement constraint under dsenode.
  • OpsCenter (if built-in instance is enabled): OpsCenter placement constraint under opscenter.

Placement constraints support all Marathon operators (reference) with this syntax: field:OPERATOR[:parameter]. For example, if the reference lists [["hostname", "UNIQUE"]], you should use hostname:UNIQUE.

A common task is to specify a list of whitelisted systems to deploy to. To achieve this, use the following syntax for the placement constraint:

hostname:LIKE:10.0.0.159|10.0.1.202|10.0.3.3

You must include spare capacity in this list so that if one of the whitelisted systems goes down, there is still enough room to repair your service without that system.
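The mapping from the Marathon list form to the colon-delimited form described above is mechanical. The sketch below converts a single constraint. The function name is hypothetical, and combining several constraints is deliberately left out because this section does not describe it.

```python
def to_colon_syntax(constraint):
    """Convert one Marathon-style constraint list to field:OPERATOR[:parameter]."""
    if not 2 <= len(constraint) <= 3:
        raise ValueError("expected [field, operator] or [field, operator, parameter]")
    return ":".join(str(part) for part in constraint)


if __name__ == "__main__":
    print(to_colon_syntax(["hostname", "UNIQUE"]))
    print(to_colon_syntax(["hostname", "LIKE", "10.0.0.159|10.0.1.202|10.0.3.3"]))
```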

Rack-Aware Placement

DataStax Enterprise’s “rack”-based fault domain support may be enabled by specifying a placement constraint that uses the @zone key. For example, you could spread nodes across a minimum of three different zones/racks by specifying the constraint @zone:GROUP_BY:3. When a placement constraint specifying @zone is used, nodes will be automatically configured with racks that match the names of the zones. If no placement constraint referencing @zone is configured, all nodes will be configured with a default rack of rack1.

dse.yaml and cassandra.yaml settings

Nearly all settings for dse.yaml and cassandra.yaml are exposed as configuration options, allowing them to be deployed and updated automatically by the service.

  • dse.yaml options are listed under the dse section
  • cassandra.yaml options are listed under the cassandra section

For more information on each setting, view DataStax’s documentation for dse.yaml and cassandra.yaml.

Use Built-In or External OpsCenter

DC/OS provides the datastax-ops package, which you can install to get a default OpsCenter dashboard. If you prefer to use an external OpsCenter instance, you can configure the DSE service to point to an externally managed OpsCenter.

To configure DSE to use an external OpsCenter, follow these steps in the OpsCenter section of the DSE installation screen:

  1. Check the ENABLE DATASTAX OPSCENTER checkbox.
  2. Set the OPSCENTER HOST NAME field to the hostname of your external OpsCenter instance.

If you choose to run an instance of the datastax-ops package, this field can be populated as opscenter-0-node.<service-name>.autoip.dcos.thisdcos.directory. For example, use opscenter-0-node.datastax-ops-1.autoip.dcos.thisdcos.directory if the datastax-ops service is named datastax-ops-1.
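The hostname pattern above can be filled in programmatically. This is a minimal sketch with a hypothetical helper name. It rejects foldered service names because this section does not say how folder paths map into the hostname.

```python
def opscenter_host(ops_service_name):
    """Return the OpsCenter hostname for a non-foldered datastax-ops service."""
    if "/" in ops_service_name:
        raise ValueError("foldered service names are not covered by this sketch")
    return f"opscenter-0-node.{ops_service_name}.autoip.dcos.thisdcos.directory"


if __name__ == "__main__":
    print(opscenter_host("datastax-ops-1"))
```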

Installation with Multi-Datacenter

Each Datacenter must be configured with the seed nodes of the other Datacenters. For example, let’s deploy three Datacenters (as separate DC/OS services in DC/OS terms), named datastax-dse-1, datastax-dse-2, and datastax-dse-3, and then link them all together. Here is an example timeline from start to finish:

NOTE: These instructions are an example and should be vetted by DataStax for the correct ordering of operations (when to add seed nodes, etc.).

DC/OS 1.10 and later

Follow these instructions for DC/OS 1.10 and later. If you are using DC/OS 1.9 or earlier, follow the instructions under “DC/OS 1.9 and earlier” instead.

  1. Add a service from the DC/OS Universe. Deploy datastax-dse-1 with the following customizations:

    • In service, set Service Name = datastax-dse-1
    • In cluster, set Datacenter = dc_datastax_1
  2. Wait for datastax-dse-1 to finish deploying before continuing with the other DCs.

  3. Add a second service from the DC/OS Universe. Deploy datastax-dse-2 with the following customizations:

    • In service, set:
      • Service Name = datastax-dse-2
    • In cluster, set:
      • Datacenter = dc_datastax_2
      • External Seed Nodes = dse-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-1.autoip.dcos.thisdcos.directory (point datastax-dse-2 to datastax-dse-1's seeds)
    • In opscenter, set:
      • OPSCENTER HOSTNAME = opscenter-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory (point datastax-dse-2 to datastax-dse-1's OpsCenter)
  4. Add a third service from the DC/OS Universe. Deploy datastax-dse-3 with the following customizations:

    • In service, set:
      • Service Name = datastax-dse-3
    • In cluster, set:
      • Datacenter = dc_datastax_3
      • External Seed Nodes = dse-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-1.autoip.dcos.thisdcos.directory (point datastax-dse-3 to datastax-dse-1's seeds)
    • In opscenter, set:
      • OPSCENTER HOSTNAME = opscenter-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory (point datastax-dse-3 to datastax-dse-1's OpsCenter)
  5. Wait for datastax-dse-2 and datastax-dse-3 to finish deploying. Then, update the seed nodes across all the instances:

    1. Create a local file called dse-1-options.json. Paste the following into the file.

      {
        "cluster": {
          "external_seeds": "dse-0-node.datastax-dse-2.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-2.autoip.dcos.thisdcos.directory,dse-0-node.datastax-dse-3.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-3.autoip.dcos.thisdcos.directory"
        }
      }

      This points datastax-dse-1 to datastax-dse-2 and datastax-dse-3.

    2. From the DC/OS CLI, update the service to the new configuration.

      dcos datastax-dse --name=datastax-dse-1 update start --options=dse-1-options.json
      
    3. Wait for the seed update to roll out across datastax-dse-1 nodes.

    4. Perform the same operation for datastax-dse-2 and datastax-dse-3.

      • For datastax-dse-2, set cluster.external_seeds to dse-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-0-node.datastax-dse-3.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-3.autoip.dcos.thisdcos.directory.
      • For datastax-dse-3, set cluster.external_seeds to dse-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-0-node.datastax-dse-2.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-2.autoip.dcos.thisdcos.directory.
  6. Now, each of the three DCs has seed nodes configured for the other DCs. Because we used .autoip.dcos.thisdcos.directory hostnames, which automatically update to follow the tasks, we won’t need to reconfigure seeds if they’re moved between systems in the DC/OS cluster.
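Because each datacenter must list the seeds of every other datacenter, the seed strings are easy to get wrong by hand. The sketch below derives them from the service names, assuming (as in the steps above) that nodes 0 and 1 of each service act as seeds. The function names and the seed_count parameter are hypothetical.

```python
import json

SUFFIX = "autoip.dcos.thisdcos.directory"


def seed_hosts(service_name, seed_count=2):
    """Return the seed hostnames for one service (nodes 0..seed_count-1)."""
    return [f"dse-{i}-node.{service_name}.{SUFFIX}" for i in range(seed_count)]


def external_seeds(service_names, seed_count=2):
    """Map each service to the comma-separated seeds of all other services."""
    result = {}
    for name in service_names:
        others = [
            host
            for other in service_names
            if other != name
            for host in seed_hosts(other, seed_count)
        ]
        result[name] = ",".join(others)
    return result


if __name__ == "__main__":
    services = ["datastax-dse-1", "datastax-dse-2", "datastax-dse-3"]
    for name, seeds in external_seeds(services).items():
        print(f"# options for {name}")
        print(json.dumps({"cluster": {"external_seeds": seeds}}, indent=2))
```

A service never appears in its own seed list, which is the property the manual steps above are meant to preserve.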

DC/OS 1.9 and earlier

  1. Add a service from the DC/OS Universe. Deploy datastax-dse-1 with the following customizations:
    • In service, set Service Name = datastax-dse-1
    • In cluster, set Datacenter = dc_datastax_1
  2. Wait for datastax-dse-1 to finish deploying before continuing with the other DCs.
  3. Add a second service from the DC/OS Universe. Deploy datastax-dse-2 with the following customizations:
    • In service, set:
      • Service Name = datastax-dse-2
    • In cluster, set:
      • Datacenter = dc_datastax_2
      • External Seed Nodes = dse-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-1.autoip.dcos.thisdcos.directory (point datastax-dse-2 to datastax-dse-1's seeds)
    • In opscenter, set:
      • OPSCENTER HOSTNAME = opscenter-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory (point datastax-dse-2 to datastax-dse-1's OpsCenter)
  4. Add a third service from the DC/OS Universe. Deploy datastax-dse-3 with the following customizations:
    • In service, set:
      • Service Name = datastax-dse-3
    • In cluster, set:
      • Datacenter = dc_datastax_3
      • External Seed Nodes = dse-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-1.autoip.dcos.thisdcos.directory (point datastax-dse-3 to datastax-dse-1's seeds)
    • In opscenter, set:
      • OPSCENTER HOSTNAME = opscenter-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory (point datastax-dse-3 to datastax-dse-1's OpsCenter)
  5. Wait for datastax-dse-2 and datastax-dse-3 to finish deploying. Then, update the seed nodes across all the instances:
    1. Go to the service view of datastax-dse-1 in the DC/OS UI. Click the menu in the upper right and then choose Edit. Go to the Environment tab and set _EXTERNAL_SEEDS = dse-0-node.datastax-dse-2.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-2.autoip.dcos.thisdcos.directory,dse-0-node.datastax-dse-3.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-3.autoip.dcos.thisdcos.directory (point datastax-dse-1 to datastax-dse-2 and datastax-dse-3).
    2. Wait for the seed update to roll out across datastax-dse-1 nodes.
    3. Go to the service view of datastax-dse-2 in the DC/OS UI. Click the menu in the upper right and then choose Edit. Go to the Environment tab and update _EXTERNAL_SEEDS = dse-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-0-node.datastax-dse-3.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-3.autoip.dcos.thisdcos.directory (point datastax-dse-2 to datastax-dse-1 and datastax-dse-3).
    4. Wait for the seed update to roll out across datastax-dse-2 nodes.
    5. Go to the service view of datastax-dse-3 in the DC/OS UI. Click the menu in the upper right and then choose Edit. Go to the Environment tab and update _EXTERNAL_SEEDS = dse-0-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-1.autoip.dcos.thisdcos.directory,dse-0-node.datastax-dse-2.autoip.dcos.thisdcos.directory,dse-1-node.datastax-dse-2.autoip.dcos.thisdcos.directory (point datastax-dse-3 to datastax-dse-1 and datastax-dse-2).
    6. Wait for the seed update to roll out across datastax-dse-3 nodes.
  6. Now, each of the three DCs has seed nodes configured for the other DCs. Because we used .autoip.dcos.thisdcos.directory hostnames, which automatically update to follow the tasks, we won’t need to reconfigure seeds if they’re moved between systems in the DC/OS cluster.