
Apache Cassandra Lunch #15: Cassandra Backup / Restoration

In Cassandra Lunch #15, we discuss Cassandra backup and restoration. We cover disaster avoidance, disaster recovery, and the tools that can be used to back up and restore your Cassandra data. We also walk through an example scenario of how someone set up multi-node clusters and how they handle data backup and restoration.

Strategy

Our backup and restoration strategy covers three areas: disaster avoidance, disaster recovery, and tools for Cassandra backup and restoration.

Disaster Avoidance

For disaster avoidance, we discuss strategies for running multi-datacenter, multi-region, and/or multi-cloud clusters. We also discuss examples using AWS and Google, which are covered in more depth in the video linked below.
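As a rough illustration of the multi-datacenter idea, the sketch below builds a CQL statement that replicates a keyspace to more than one datacenter with NetworkTopologyStrategy. The helper name, the keyspace name, and the datacenter names are hypothetical and were not part of the talk; real datacenter names must match what your snitch reports.

```python
from typing import Mapping


def multi_dc_keyspace_cql(keyspace: str, replicas_per_dc: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that keeps replicas in several datacenters.

    `replicas_per_dc` maps a datacenter name to its replication factor.
    The names used in the example below are placeholders (assumptions).
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    # One "'dc_name': rf" entry per datacenter, in the order given.
    dc_part = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {dc_part}}};"
    )


if __name__ == "__main__":
    # Hypothetical layout: one datacenter in AWS and one in Google Cloud.
    print(multi_dc_keyspace_cql("lunch", {"aws_us_east": 3, "gcp_us_central": 3}))
```

With a layout like this, losing one datacenter still leaves a full set of replicas in the other, which is the point of disaster avoidance as opposed to recovery.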

Disaster Recovery

We discussed three methods of disaster recovery; a more in-depth explanation can be found in the video linked below.

  1. Cassandra Backup / Restore
    • Single node
      • Snapshot + Restore
    • Multi-node
      • Snapshot + Restore (same size cluster vs different sized cluster)
  2. Cloud IaaS Snapshot Backup / Restore
    • AWS EBS
  3. Cassandra Backup + Upload to Distributed Filesystem (S3)
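The single-node snapshot and restore method can be sketched roughly as follows. This is a minimal sketch, not the procedure shown in the talk: the function names are hypothetical, and it assumes the default on-disk layout in which a snapshot lives under `<data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>`. It only builds the `nodetool snapshot` command rather than running it, so it can be tried without a live node.

```python
import shutil
from pathlib import Path
from typing import List

# Files Cassandra writes into a snapshot that are metadata, not SSTable components.
SNAPSHOT_METADATA = {"manifest.json", "schema.cql"}


def snapshot_command(tag: str, keyspace: str) -> List[str]:
    """Return the nodetool invocation that snapshots one keyspace under a tag."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def find_snapshot_dirs(data_dir: Path, keyspace: str, tag: str) -> List[Path]:
    """List the per-table snapshot directories for a keyspace and tag."""
    return sorted(Path(data_dir).glob(f"{keyspace}/*/snapshots/{tag}"))


def restore_snapshot(snapshot_dir: Path, table_dir: Path) -> List[str]:
    """Copy SSTable files from a snapshot back into a table directory.

    Returns the names of the copied files. After copying, the node still has
    to load them, for example with `nodetool refresh <keyspace> <table>` or a restart.
    """
    copied = []
    for item in sorted(Path(snapshot_dir).iterdir()):
        if item.is_file() and item.name not in SNAPSHOT_METADATA:
            shutil.copy2(item, Path(table_dir) / item.name)
            copied.append(item.name)
    return copied
```

Copying files back this way only fits a restore onto the same node or onto a cluster of the same size with the same token assignments. For a different-sized cluster, the token ranges no longer line up, so a streaming tool such as `sstableloader` is usually the better fit.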

Tools for Backup / Restoration

We also covered a few different tools that can be used for backup and restoration. A more in-depth discussion about those tools can be seen in the video linked below.

  • DataStax OpsCenter
    • Automated data synchronization
    • Full and continuous backups
    • Seamless enterprise integration
    • Simplified upgrades
    • End-to-end performance visibility
    • Comprehensive cluster health management
  • Tablesnap
    • Uses inotify to monitor Cassandra SSTables and upload them to AWS S3
  • Cassandra Medusa
    • Medusa is an Apache Cassandra backup system.
    • Medusa is a command-line tool that offers the following features:
      • single node backup
      • single node restore
      • cluster-wide in-place restore
      • cluster-wide remote restore
      • backup purge
      • support for local storage, GCS, AWS S3, and others through Apache Libcloud
      • support for clusters using single tokens or vnodes
      • full or incremental backups
    • Medusa currently does not support:
      • Cassandra deployments with multiple data folder directories
  • Cassandra-Backup
    • Backup utility and library for Apache Cassandra
    • The tool is able to perform these operations:
      • backup of SSTables
      • restore of SSTables
      • backup of commit logs
      • restore of commit logs
  • Rubrik Mosaic (Datos.io)
    • Simplifies protection and data management of MongoDB, DataStax Enterprise, and Cassandra while assuring application availability.
    • Achieves significant storage savings with incremental-forever backup and semantic deduplication.
    • Mosaic's always-consistent backups speed up recovery and let you start using the application during recovery.
    • Mosaic is cloud-native, runs on-premises, or both.
    • Mosaic reduces multiple NoSQL replicas into a single always-consistent copy and stores the backup on any cloud.
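Several of the tools above mention incremental backups. The general idea can be sketched in a few lines: SSTable files are immutable once written, so a backup run only needs to upload files that are not already in remote storage. The sketch below is an illustration under that assumption, not how any of the listed tools is implemented; the function name and the file names in the example are hypothetical.

```python
from pathlib import Path
from typing import List, Set


def sstables_to_upload(snapshot_dir: Path, already_uploaded: Set[str]) -> List[Path]:
    """Return the files in a snapshot directory that still need to be uploaded.

    `already_uploaded` holds the file names present in remote storage from
    earlier runs. Comparing by name alone is a simplification (an assumption
    of this sketch); a real tool would likely also compare sizes or checksums.
    """
    return sorted(
        path
        for path in Path(snapshot_dir).iterdir()
        if path.is_file() and path.name not in already_uploaded
    )
```

For example, if the previous run uploaded `nb-1-big-Data.db` and compaction has since produced `nb-2-big-Data.db`, only the second file is returned, and the first run with an empty set behaves like a full backup.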

Example Scenario

We also discussed an example scenario of how someone has set up multi-node clusters and how they go about data backup and restoration. A more in-depth discussion of this example can be seen in the video linked below.

Cassandra Lunch #15

Additional Resources

Cassandra.Link

Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was not only to fill the gap left by Planet Cassandra but also to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.

We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!

