Business Platform Team

Anant Corporation Blog: Our research, knowledge, thoughts, and recommendations about building and managing online business platforms.

Tag Archives: airflow


Data Engineer’s Lunch #66: Airflow and Presto

In Data Engineer’s Lunch #66, we discuss how to connect Airflow and Presto. The live recording of Data Engineer’s Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. Subscribe to our YouTube Channel to keep up to date and watch Data Engineer’s Lunches live at 12 PM EST on Mondays!

Airflow and Spark: Running Spark Jobs in Airflow (Docker-based Solution)

In this blog post, we set up Apache Spark and Apache Airflow in Docker containers, then run and schedule Spark jobs using the containerized Airflow deployment. Docker images matter here because they solve problems we encountered during development, such as differences between environments and dependency issues, which leads to faster development and deployment to production.

Airflow and Spark: Running Spark Jobs in Apache Airflow

In this blog post, we run and schedule Spark jobs on Apache Airflow. We build a Spark job that extracts data from an API, transforms the resulting JSON data, and loads it into an S3 bucket. We run Spark and Airflow locally and configure them to talk to each other through the Airflow UI. To ensure smooth communication between Spark and the S3 bucket, we set up an S3 access point and a dedicated AWS IAM role so that data is sent to the S3 bucket directly from our Spark application.
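
The transform step of a job like this can be sketched in plain Python. This is a minimal sketch, not the post's Spark job: the record fields, the `flatten_record`, `to_ndjson`, and `s3_key_for` helpers, and the `raw/...` key layout are all hypothetical, and the API call and the S3 upload are left out so the snippet runs without Spark or AWS credentials.

```python
import json
from datetime import date


def flatten_record(record, parent_key="", sep="_"):
    """Flatten nested dicts: {"a": {"b": 1}} becomes {"a_b": 1}."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(
        json.dumps(flatten_record(r), sort_keys=True) for r in records
    )


def s3_key_for(dataset, run_date):
    """Build a date-partitioned object key (hypothetical layout)."""
    return f"raw/{dataset}/dt={run_date.isoformat()}/part-0000.json"


# Stand-in for the JSON an API might return (hypothetical fields).
api_response = [
    {"id": 1, "user": {"name": "ada", "country": "UK"}},
    {"id": 2, "user": {"name": "linus", "country": "FI"}},
]

body = to_ndjson(api_response)               # payload that would be uploaded
key = s3_key_for("users", date(2022, 1, 3))  # object key it would be stored under
```

Partitioning the key by run date is one common choice for scheduled jobs, since each run then writes to its own prefix.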

Airflow and Cassandra: Writing to Cassandra from Airflow

In this article, we build a simple Extract, Transform, and Load (ETL) data pipeline using Apache Airflow and Apache Cassandra. Airflow is the orchestration tool, and we load our data into Cassandra. Apache Airflow is an open-source project that was developed at Airbnb and open-sourced in 2015; Apache Cassandra is a database that was created at Facebook and later open-sourced. The focus is on writing to Cassandra using the Airflow Cassandra provider.
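
The transform and load steps of such a pipeline can be sketched as follows. This is a minimal sketch under stated assumptions, not the article's code: the `demo.users` table, its columns, and the `RecordingSession` stand-in are hypothetical. The stand-in only mimics the `execute(query, parameters)` call shape of a Cassandra Python driver session, which uses `%s` placeholders for non-prepared statements, so the snippet runs without a Cassandra cluster.

```python
# Parameterized CQL for a hypothetical table demo.users(id, name, country).
INSERT_CQL = "INSERT INTO demo.users (id, name, country) VALUES (%s, %s, %s)"


class RecordingSession:
    """Stand-in for a Cassandra session; records calls instead of executing them."""

    def __init__(self):
        self.calls = []

    def execute(self, query, parameters=None):
        self.calls.append((query, parameters))


def transform(raw_rows):
    """Normalize raw rows: strip whitespace and upper-case the country code."""
    return [
        (row["id"], row["name"].strip(), row["country"].upper())
        for row in raw_rows
    ]


def load(session, rows):
    """Write each row with a parameterized INSERT; return the row count."""
    for row in rows:
        session.execute(INSERT_CQL, row)
    return len(rows)


# Stand-in for extracted data (hypothetical fields).
raw = [
    {"id": 1, "name": " ada ", "country": "uk"},
    {"id": 2, "name": "linus", "country": "fi"},
]
session = RecordingSession()
written = load(session, transform(raw))
```

In an Airflow DAG, each of these functions would typically run as its own task, and the stand-in would be replaced by a real session, for example one obtained through the Cassandra provider's hook.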

Data Engineer’s Lunch #47: Airflow on Kubernetes

In Data Engineer’s Lunch #47, we will use Kubernetes to deploy Airflow. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!
