Our research, knowledge, thoughts, and recommendations about building and leading businesses on the Internet.
In our opinion, Cassandra is one of the best NoSQL database technologies we’ve used for high-availability, large-scale, high-speed business platforms. More specifically, we work with DataStax Enterprise, the enterprise version of Cassandra, for clients that are above a certain size and need enterprise-grade support around the clock, every day of the year, with expertise around the world. There are many topics I could have written about for my first “Cassandra” post on our blog, but I decided to write about what I call the three stooges of Cassandra data modeling: Larry (Tombstones), Curly (Data Skew), and Moe (Wide Partitions).
Rahul Singh, CEO of Anant, had the opportunity to co-organize and MC the May Meetup of Data Wranglers DC, where the speakers, John Clune and Timothy Hathaway, covered two topics related to using open government and public data for data processing and visualization. We had a great turnout at the event and had the chance to do some networking afterward.
I had the pleasure this past Wednesday of introducing Eric Pugh (@dep4b) to the Data Wranglers DC Meetup group. He spoke about using Solr and Zeppelin in data processing and, specifically, the ways big data can easily be processed and displayed as visualizations in Zeppelin. He also touched on Docker, a tool Anant uses, and its role in setting up environments for data processing and analysis. Unfortunately, no actual blimps or zeppelins were seen during the talk, but the application of data analysis to events they usually fly over was covered last month in a discussion about Spark, Kafka, and the English Premier League.
Rather than completely rehash Eric’s presentation, I encourage you to check out his material for yourself (available below). In short, he showed how multiple open-source tools can be used to process, import, manipulate, visualize, and share your information. More specifically, Spark is a fast data processing engine that you can use to prepare your data for presentation and analysis, whereas Zeppelin is a mature, enterprise-ready application, as shown by its recent graduation from Apache’s Incubator Program, and a great tool for manipulating and visualizing processed data.
Please don’t hesitate to reach out with any questions, or if you are interested in participating in or speaking at a future Data Wranglers DC event. Each event is recorded, livestreamed on the Data Community DC Facebook page, and attended by 50 or more individuals interested in data wrangling, data processing, and the possible outcomes of these efforts. After the monthly event, many members continue their discussions at a local restaurant or bar.
I hope to see you at an event in the near future!