Any Cassandra deployment that will run for the long term needs some sort of monitoring solution. Monitoring is extremely useful for diagnosing problems with a cluster, and it also helps ensure optimal performance in production systems. There are many ways of monitoring Cassandra clusters, but they can broadly be split into two groups. The first group relies on Cassandra’s JMX integration, which is available because Cassandra is a Java application running on the JVM. The other group is based on the analysis of Cassandra logs. We have a previous discussion of log-based monitoring that can be found here.
DSE OpsCenter is a utility provided by DataStax for both monitoring and administering DSE Cassandra clusters. Older versions of the tool could also monitor non-DSE Cassandra clusters. OpsCenter can help with managing the creation, configuration, and security settings of a Cassandra cluster, and its node creation and configuration tools mostly fall under the OpsCenter Lifecycle Manager. In this post, we will focus on the other features that OpsCenter offers: the monitoring dashboard, alert functionality, and historical metrics.
OpsCenter’s monitoring dashboard provides access to nearly all standard JVM metrics, along with Cassandra-specific metrics grouped by Cassandra domain. You can get stats at the table, keyspace, or cluster level, as well as metrics on things like storage and inter-node communication. We can add the most important metrics to the dashboard to build a useful view of the status of our cluster. Metrics to focus on include node status and health, and read/write statistics like request rate, request latency, and timeouts or failures. Java-specific metrics like garbage collection stats and table-level metrics like partition size can also be important for Cassandra monitoring.
OpsCenter’s alert functionality triggers on event log entries of a certain level. The log levels that can show up in the event log are: debug, info, warn, error, critical, and alert. Which events are logged at the alert level can be configured for specific events and also at the cluster level. OpsCenter alerts can trigger actions that bring their information to the attention of the end user. We can use this functionality to trigger emails and alerts in enterprise monitoring solutions. It can also tie in with existing APIs or apps through webhooks, by sending POST requests with the alert details included.
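To make the webhook idea concrete, here is a minimal sketch of a receiver for such POST requests, written with only the Python standard library. It is not OpsCenter's own API. It assumes the alert arrives as a JSON object with a "level" field, which is a hypothetical payload shape, and it assumes the levels listed above run from least to most severe. The names parse_alert, should_notify, AlertHandler, and serve are made up for this illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Event-log levels in the order given above; assumed here to run from
# least to most severe.
LEVELS = ["debug", "info", "warn", "error", "critical", "alert"]


def parse_alert(body: bytes) -> dict:
    """Decode a JSON alert payload. The field names are hypothetical."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


def should_notify(payload: dict, minimum: str = "error") -> bool:
    """Return True when the payload's level is at or above `minimum`."""
    level = str(payload.get("level", "")).lower()
    if level not in LEVELS:
        return True  # unknown level: surface it rather than drop it
    return LEVELS.index(level) >= LEVELS.index(minimum)


class AlertHandler(BaseHTTPRequestHandler):
    """Accepts POSTed alerts and prints the ones worth attention."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_alert(self.rfile.read(length))
        except (ValueError, UnicodeDecodeError):
            # Malformed body: reject it so the sender can log the failure.
            self.send_response(400)
            self.end_headers()
            return
        if should_notify(payload):
            print("ALERT:", payload)
        self.send_response(204)  # accepted, nothing to return
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the receiver until interrupted. Host and port are placeholders."""
    HTTPServer((host, port), AlertHandler).serve_forever()
```

Calling serve() starts the listener. The POST URL configured on the OpsCenter side would then point at that host and port. In a real setup, the print call would be replaced by whatever forwards the alert to a chat tool or ticketing system.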
OpsCenter stores its collected metrics data in a Cassandra keyspace. By default, this keyspace is in the same cluster that OpsCenter monitors, which means that as metrics are collected, they take up space on that cluster. Because of this, metrics data expires and is deleted after some time. The retention period is configurable, but extending it may not be the best way to keep metrics data long term. By default, OpsCenter excludes its own keyspace and the Cassandra system keyspace from metrics collection; other keyspaces can be added to the exclusion list, and the default entries can be removed from it.
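As a rough illustration of why expiry keeps storage bounded, the sketch below estimates how many samples of a single metric exist at any one time, given a collection interval and a retention period. The function name and the numbers are hypothetical and are not OpsCenter defaults.

```python
def retained_points(ttl_seconds: int, interval_seconds: int) -> int:
    """Approximate number of live samples for one metric.

    A sample written every `interval_seconds` and expiring after
    `ttl_seconds` leaves roughly ttl / interval samples on disk.
    """
    if ttl_seconds < 0 or interval_seconds <= 0:
        raise ValueError("ttl must be >= 0 and interval must be > 0")
    return ttl_seconds // interval_seconds


# Hypothetical settings: one sample per minute, kept for seven days.
ONE_WEEK = 7 * 24 * 60 * 60
print(retained_points(ONE_WEEK, 60))  # 10080
```

Under these assumed settings, doubling the retention period doubles the number of samples held per metric. That is why very long retention on the monitored cluster itself can become expensive.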
For long-term storage, OpsCenter can also export metrics data to other monitoring solutions. The metrics data can also be downloaded from OpsCenter as a tarball.
OpsCenter is one of a number of monitoring solutions. When working with DSE clusters, it is a powerful monitoring and administration tool. Its monitoring side provides real-time access to a variety of Java application metrics as well as many Cassandra-specific ones.
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra, but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!