Apache Cassandra Lunch #14: Basic Log Diagnostics with ELK/FEK/BEK

In case you missed it, Cassandra Lunch #14 covered methods for finding and diagnosing issues in Cassandra clusters. We started with why specialized diagnostic tools are necessary and looked at some outside resources that are useful for monitoring Cassandra clusters. We then moved on to some of Anant Corporation’s own resources for analyzing the performance of a Cassandra cluster, before walking through an example setup that demonstrates how Elasticsearch and Kibana can make finding and diagnosing issues much easier.

Why Use Specialized Tools to Diagnose Issues?

When we work with Cassandra clusters, we are almost always working with distributed systems. In the case of using Cassandra for real-time business platforms, we can be dealing with very large distributed systems. Because the information that we may need can come from a variety of locations, we need extra tools to aggregate, process, and analyze that data if we want to build any kind of comprehensive understanding of our issues.

We need to be able to perform the monitoring and diagnosis of our Cassandra clusters easily because we want to ensure high performance. We use distributed Cassandra as part of our business platforms because we want high availability, quick writes, and quick reads. Problems in our cluster could stand in the way of getting what we want from our use of Cassandra. 

Different Types of Monitoring

We discussed a taxonomy of the different types of monitoring that can be used to keep track of different portions of a business platform. Monitoring these metrics on your system helps establish a baseline against which you can spot problems or plan for upgrades.

  • Endpoint Metrics (User Browser / API)
  • Logging (Event, Error, Info, Warnings, …)
  • Tracing (Interface, Software, Database)
  • System Metrics (Disk, CPU, Memory, …)
  • Performance Metrics (Throughput, Latency, …)
  • Application (Custom to Business Use Cases)

All of these systems can produce logs, and monitoring them all by hand quickly becomes impractical. Tools are therefore used to aggregate these logs in one place and analyze them for patterns. A list of such tools can be found here. Keep the tools for centralizing logs and metrics in mind for later, such as Splunk, ELK (Elasticsearch, Logstash/Beats, Kibana), and ELG (the same stack with Grafana in place of Kibana).
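To make the idea of aggregating logs and looking for patterns concrete, here is a minimal Python sketch that merges log lines from several nodes and counts them by severity. It is an illustration of the concept, not how ELK or Splunk work internally. The line format it parses is an assumption based on Cassandra's default system.log layout (level, thread, timestamp, source file, message), and the sample lines, node names, and function names are all made up for this example.

```python
import re
from collections import Counter

# Assumed layout: LEVEL [thread] yyyy-MM-dd HH:mm:ss,SSS File.java:line - message
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+)\s+-\s+(?P<message>.*)$"
)


def parse_line(node, line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None  # e.g. a stack-trace continuation line
    record = match.groupdict()
    record["node"] = node
    return record


def count_levels(logs_by_node):
    """Count parsed log lines per (node, level) pair across all nodes."""
    counts = Counter()
    for node, lines in logs_by_node.items():
        for line in lines:
            record = parse_line(node, line)
            if record is not None:
                counts[(node, record["level"])] += 1
    return counts


# Made-up sample lines in the assumed format; not real Cassandra output.
SAMPLE_LOGS = {
    "node1": [
        "INFO  [main] 2020-09-10 12:00:00,123 Example.java:10 - example startup message",
        "WARN  [Worker:1] 2020-09-10 12:00:05,456 Example.java:20 - example warning message",
        "    at example.Frame(Example.java:30)",
    ],
    "node2": [
        "ERROR [Worker:2] 2020-09-10 12:00:07,789 Example.java:40 - example error message",
        "INFO  [main] 2020-09-10 12:00:08,000 Example.java:50 - example info message",
    ],
}

if __name__ == "__main__":
    for (node, level), total in sorted(count_levels(SAMPLE_LOGS).items()):
        print(node, level, total)
```

A log stack does the same kind of work at scale: a shipper such as Filebeat or Logstash collects the lines from each node, and Elasticsearch and Kibana handle the indexing, counting, and visualization.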

Our Tools

Anant Corporation maintains a collection of tools called Cassandra.Toolkit (found here). The project is composed of a number of tools for building, managing, and monitoring Cassandra clusters. One of them is Node Analyzer, which aggregates diagnostic data for a single node and compresses it into a tarball. It runs commands like nodetool status, nodetool info, and nodetool describecluster, among others, and dumps all of that information into a single location. Table Analyzer does a similar job for information about individual tables. Some of our most cutting-edge work is only available on the dev branch, including a project to perform log analysis with Filebeat, Elasticsearch, and Kibana.
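The general pattern described for Node Analyzer (run several diagnostic commands, then pack their output into one tarball) can be sketched in a few lines of Python. This is a hypothetical illustration and not the Cassandra.Toolkit code: the function names, output file names, and archive layout are assumptions, and only the three nodetool subcommands named above are included.

```python
import io
import subprocess
import tarfile

# Subcommands mentioned in the post; the real tool may collect more.
NODETOOL_COMMANDS = {
    "status": ["nodetool", "status"],
    "info": ["nodetool", "info"],
    "describecluster": ["nodetool", "describecluster"],
}


def run_commands(commands):
    """Run each command and return {filename: captured output bytes}.

    Raises FileNotFoundError if an executable (e.g. nodetool) is not on PATH.
    """
    outputs = {}
    for name, argv in commands.items():
        result = subprocess.run(argv, capture_output=True, check=False)
        # Keep stderr too, since a failing command is itself diagnostic data.
        outputs[name + ".txt"] = result.stdout + result.stderr
    return outputs


def bundle(outputs):
    """Pack {filename: bytes} into a gzip-compressed tarball, returned as bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for filename, data in sorted(outputs.items()):
            member = tarfile.TarInfo(name=filename)
            member.size = len(data)
            archive.addfile(member, io.BytesIO(data))
    return buffer.getvalue()
```

On a machine where nodetool is installed, something like `open("node-diagnostics.tar.gz", "wb").write(bundle(run_commands(NODETOOL_COMMANDS)))` would produce a single archive for that node; the archive name here is also just an example.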

We then ran through an example of this type of log analysis; the details can be found in the slides and video below.

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra, but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!

