Business Platform Team

Anant Corporation Blog: Our research, knowledge, thoughts, and recommendations about building and managing online business platforms.

Tag Archives: spark


An Overview and Comparison of DataStax Dependencies for Cassandra, Spark and Graph

If you like wrestling with dependencies and version incompatibility issues as much as I do, then this post is for you! It grew out of a project that required executing Gremlin traversals through multiple query APIs, from both a DSE Graph Analytics cluster and an external Spark cluster at the same time. We found that keeping all the available DSE libraries straight wasn’t always easy. The goal of this post is to summarize what we learned so that others know which library to use and why, particularly when using Spark and Graph with Cassandra. We will look at seven libraries: DSE Java Driver, OSS Unified Java Driver, dse-java-driver-graph, OSS Spark Cassandra Connector, DSE GraphFrames, BYOS, and dse-spark-dependencies.

Data Engineer’s Lunch #33: Using Spark, Cassandra and Elasticsearch for Data Processing

In Data Engineer’s Lunch #33: Spark, Cassandra and Elasticsearch for Data Engineering, we will discuss how you can use Spark jobs to load data from a CSV file, then save that data to, and load it back from, Cassandra and Elasticsearch. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

Spark Script Dependency Management

In this blog post, we will discuss several ways to manage dependencies when running Spark scripts. This post is not part of any of our ongoing series. We often discuss Spark during our Data Engineer’s Lunch events. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now! We last discussed Spark at a recent Cassandra Lunch, where the topic was ETL in Cassandra with Airflow and Spark. That discussion can be found here.

Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra Part 2

In Apache Cassandra Lunch #54: Machine Learning with Spark + Cassandra, we will discuss how you can use Apache Spark and Apache Cassandra to perform additional basic Machine Learning tasks. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

Apache Cassandra Lunch #53: Cassandra ETL with Airflow and Spark

In Apache Cassandra Lunch #53: Cassandra ETL with Airflow and Spark, we discussed how we can do Cassandra ETL processes using Airflow and Spark. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!
