In case you missed it, this blog post is a recap of Cassandra Lunch #42, which covered SSTable files and how they relate to sstableloader. We also walk through an example that uses sstableloader to load data taken from one cluster into a new, empty cluster. The live recording of Cassandra Lunch, which includes a more in-depth discussion, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!
An SSTable is a unit of on-disk storage used in Cassandra and in a number of other NoSQL databases. SSTables take the form of directories and files that contain the data, along with other information that facilitates reading that data later on. SSTables are immutable once written, and new ones are added over time. More details on SSTables can be found in our previous posts here and here.
sstableloader, also known as the Cassandra Bulk Loader, is a tool for loading data from SSTables into a Cassandra cluster. Note that this is different from copying SSTable files onto the nodes of a Cassandra cluster. Rather than copying the files, sstableloader streams the data contained in them to the cluster. This process respects the replication strategy and replication factor of the cluster and keyspace being loaded.
To work properly, sstableloader must be given a directory containing at least the Index.db and Data.db components of the full SSTable directory. It can also load from snapshots. The keyspace and table that the data will be streamed into must already exist, though the table may already contain other data.
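The directory requirement above can be checked before invoking the tool. The following is a minimal Python sketch, not part of the original walkthrough. It assumes that SSTable component files end in "-Data.db" and "-Index.db" (for example, a name like nb-1-big-Data.db), and that the directory path ends in keyspace/table, which is how sstableloader infers the target keyspace and table. The helper names and the host addresses are hypothetical; the only sstableloader option used is -d, which takes a comma-separated list of initial hosts.

```python
from pathlib import Path

# Components the post says sstableloader needs at a minimum.
REQUIRED_COMPONENTS = ("Data.db", "Index.db")


def missing_components(table_dir):
    """Return the required component suffixes that have no matching file.

    Assumes component files are named like '<prefix>-Data.db'.
    """
    names = [p.name for p in Path(table_dir).iterdir() if p.is_file()]
    return [
        component
        for component in REQUIRED_COMPONENTS
        if not any(name.endswith("-" + component) for name in names)
    ]


def build_loader_command(table_dir, hosts):
    """Build (but do not run) an sstableloader command line.

    table_dir is expected to end in <keyspace>/<table>; hosts is a list of
    contact points in the target cluster (hypothetical values in the demo).
    """
    table_dir = Path(table_dir)
    missing = missing_components(table_dir)
    if missing:
        raise ValueError(f"{table_dir} is missing components: {missing}")
    # -d takes a comma-separated list of initial hosts to connect to.
    return ["sstableloader", "-d", ",".join(hosts), str(table_dir)]
```

Calling build_loader_command("/path/to/keyspace/table", ["10.0.0.1"]) returns an argument list that could be passed to subprocess.run once the target keyspace and table exist. Building the command separately from running it keeps the directory check easy to test without a live cluster.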
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra, but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!