In Apache Cassandra Lunch #67: Moving Data from Cassandra to DataStax Astra, we discussed how to move data from open source Cassandra to DataStax Astra using DSBulk. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!
For the demo project, we will be running through some sample commands based on the following GitHub repository: https://github.com/DataStax-Examples/dsbulk-to-astra/. Some notes before getting started:
After making sure that your local Cassandra database is running, we need to set up the keyspace and table schema for this demo. Run the following commands in both Astra’s CQLSH console and your local Cassandra’s CQLSH console to define the keyspace and table we will use:
CREATE KEYSPACE testkeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE IF NOT EXISTS testkeyspace.video_ratings_by_user (
videoid uuid,
userid uuid,
rating int,
PRIMARY KEY (videoid, userid)
);
Example output from local Cassandra after running these commands and then a SELECT * query:
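The check described above can also be run from the shell rather than an interactive CQLSH session. The following is a minimal sketch, assuming cqlsh is on your PATH and the local node listens on localhost; the CQLSH variable and the show_ratings function name are illustrative and not part of the demo repository.

```shell
# Assumed: CQLSH points at your cqlsh executable (defaults to "cqlsh" on PATH).
CQLSH="${CQLSH:-cqlsh}"

# Print up to 10 rows of the demo table from the local node.
# If authentication is enabled on your cluster, add -u and -p options here.
show_ratings() {
  "$CQLSH" -e "SELECT * FROM testkeyspace.video_ratings_by_user LIMIT 10;" localhost
}
```

Calling show_ratings right after creating the schema should return an empty result set, since nothing has been loaded yet.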
Once this is done for both Astra and your local Cassandra database, we can proceed with using DSBulk. Before using DSBulk, download it from the following URL (the page also includes installation instructions): https://docs.datastax.com/en/dsbulk/doc/dsbulk/install/dsbulkInstall.html. Now we can begin running DSBulk commands.
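Before running any of the commands below, it can help to confirm that the DSBulk executable is where you expect it to be. This is a small sketch, assuming DSBulk 1.8.0 was unpacked into the current directory as in the rest of this post; the DSBULK variable and the check_dsbulk function are illustrative names.

```shell
# Assumed install location; adjust to wherever you unpacked DSBulk.
DSBULK="${DSBULK:-./dsbulk-1.8.0/bin/dsbulk}"

# Fail with a clear message if the executable is missing,
# otherwise print the installed DSBulk version.
check_dsbulk() {
  if [ ! -x "$DSBULK" ]; then
    echo "DSBulk executable not found at $DSBULK" >&2
    return 1
  fi
  "$DSBULK" --version
}
```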
Loading from a file at a URL into local Cassandra:
./dsbulk-1.8.0/bin/dsbulk load -url https://raw.githubusercontent.com/DataStax-Examples/dsbulk-to-astra/master/data.csv -h localhost -k testkeyspace -t video_ratings_by_user -u cassandra -p cassandra
Note that the first part of the command is the path to the dsbulk executable in your local DSBulk installation. Some sample output from the above command:
Loading from a file at a URL into Astra:
./dsbulk-1.8.0/bin/dsbulk load -url https://raw.githubusercontent.com/DataStax-Examples/dsbulk-to-astra/master/data.csv -b ./secure-connect-testdb3.zip -k testkeyspace -t video_ratings_by_user -u IwxQhWdajNMpHisNlWeFlPYq -p AJ,pr7SG_H3P,,AZxWrYCqSkzUzjxXvbUrWH-c6GAII.h,YCK1S6ghAaItKCC-I0l27ybK6PuTusPbb_vJRz3igAdyvL1KepRF-tACkiMRSRx3jZW,xhBd3LgeIA,Dy2
Note that the parameters that come after -u and -p are not quite username and password, but rather the Client ID and Client Secret Key that are obtained by generating a token for your Astra database. Additionally, the path after the -b flag should point to the secure connect bundle for your Astra database.
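Because the Client Secret is a long string with punctuation, one option is to keep it in environment variables and quote it, rather than pasting it into the command. The sketch below assumes you have exported ASTRA_BUNDLE, ASTRA_CLIENT_ID, and ASTRA_CLIENT_SECRET yourself; those variable names, the DSBULK variable, and the astra_load function are illustrative and not something DSBulk defines.

```shell
# Assumed install location; adjust to wherever you unpacked DSBulk.
DSBULK="${DSBULK:-./dsbulk-1.8.0/bin/dsbulk}"

# Load a CSV file (local path or URL) into the demo table on Astra.
# Usage: astra_load <csv-path-or-url>
astra_load() {
  # Stop early with a message if any required variable is unset or empty.
  : "${ASTRA_BUNDLE:?set ASTRA_BUNDLE to your secure connect bundle path}"
  : "${ASTRA_CLIENT_ID:?set ASTRA_CLIENT_ID to your token Client ID}"
  : "${ASTRA_CLIENT_SECRET:?set ASTRA_CLIENT_SECRET to your token Client Secret}"

  # Quoting each value keeps commas and other punctuation in the secret intact.
  "$DSBULK" load -url "$1" \
    -b "$ASTRA_BUNDLE" \
    -k testkeyspace -t video_ratings_by_user \
    -u "$ASTRA_CLIENT_ID" -p "$ASTRA_CLIENT_SECRET"
}
```

With the three variables exported, astra_load followed by the data.csv URL used above runs the same load as the command shown earlier.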
In both of the above cases, we are loading from a .csv file at a URL into either local Cassandra or Astra. To move data from local Cassandra into Astra, we will also need to use the DSBulk unload command. We first run the following command in Astra’s CQLSH to make sure that the Astra table does not have any data in it:
TRUNCATE testkeyspace.video_ratings_by_user;
Some sample output from running that command:
Now we do a two-step process to completely move data from local Cassandra into Astra:
Step 1: Unload data from local Cassandra into .csv files in a local folder:
./dsbulk-1.8.0/bin/dsbulk unload -h localhost -k testkeyspace -t video_ratings_by_user -url ./my_data
Note that the last parameter is the path to a local folder, which must be empty.
Step 2: Load the unloaded .csv data from that folder into Astra:
./dsbulk-1.8.0/bin/dsbulk load -url ./my_data -b ./secure-connect-testdb3.zip -k testkeyspace -t video_ratings_by_user -u IwxQhWdajNMpHisNlWeFlPYq -p AJ,pr7SG_H3P,,AZxWrYCqSkzUzjxXvbUrWH-c6GAII.h,YCK1S6ghAaItKCC-I0l27ybK6PuTusPbb_vJRz3igAdyvL1KepRF-tACkiMRSRx3jZW,xhBd3LgeIA,Dy2
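The two steps can be wrapped in a small script that also guards the empty-folder requirement and compares row counts afterwards. This is a sketch under a few assumptions: the ASTRA_BUNDLE, ASTRA_CLIENT_ID, and ASTRA_CLIENT_SECRET variables are names chosen here for illustration, the local cluster accepts unauthenticated connections as in the unload command above, and the final check uses DSBulk's count command, which was not part of the original demo.

```shell
# Assumed install location; adjust to wherever you unpacked DSBulk.
DSBULK="${DSBULK:-./dsbulk-1.8.0/bin/dsbulk}"
KEYSPACE=testkeyspace
TABLE=video_ratings_by_user

# Unload the table from local Cassandra, load it into Astra, then count rows.
# Usage: migrate_table <export-directory>
migrate_table() {
  export_dir="$1"

  # The unload target must be an empty folder; create it if needed
  # and refuse to continue if it already contains anything.
  mkdir -p "$export_dir" || return 1
  if [ -n "$(ls -A "$export_dir")" ]; then
    echo "Refusing to unload: $export_dir is not empty" >&2
    return 1
  fi

  # Step 1: unload from local Cassandra into the export directory.
  "$DSBULK" unload -h localhost -k "$KEYSPACE" -t "$TABLE" \
    -url "$export_dir" || return 1

  # Step 2: load from that same directory into Astra.
  "$DSBULK" load -url "$export_dir" -b "$ASTRA_BUNDLE" \
    -k "$KEYSPACE" -t "$TABLE" \
    -u "$ASTRA_CLIENT_ID" -p "$ASTRA_CLIENT_SECRET" || return 1

  # Sanity check: row count on the local node, then on Astra.
  "$DSBULK" count -h localhost -k "$KEYSPACE" -t "$TABLE" || return 1
  "$DSBULK" count -b "$ASTRA_BUNDLE" -k "$KEYSPACE" -t "$TABLE" \
    -u "$ASTRA_CLIENT_ID" -p "$ASTRA_CLIENT_SECRET"
}
```

Running migrate_table ./my_data mirrors the manual steps. If the two counts printed at the end differ, inspect the DSBulk operation logs before relying on the Astra copy.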
With that, the migration of a table from local Cassandra into Astra is complete. For a complete run-through of the commands mentioned above, along with additional commentary, please see the recorded live session below on YouTube!
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!