Business Platform Team

Anant Corporation Blog: Our research, knowledge, thoughts, and recommendations about building and managing online business platforms.

Tag Archives: python


Data Engineer’s Lunch #41: PygramETL

In Data Engineer’s Lunch #41: PygramETL, we discussed PygramETL, a Python ETL tool. This concludes, for now, our series on Python ETL tools. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

Continue reading

Data Engineer’s Lunch #28: Petl for Data Engineering

In Data Engineer’s Lunch #28: Petl for Data Engineering, we discussed Petl as part of our ongoing series on Python ETL tools. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

Continue reading

Spark Script Dependency Management

In this blog post, we discuss several ways of managing dependencies when running Spark scripts. This post is not part of any of our ongoing series. We often discuss Spark during our Data Engineer’s Lunch events. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now! We last discussed Spark at a recent Cassandra Lunch, where the topic was ETL in Cassandra with Airflow and Spark. Our most recent discussion of Spark can be found here.

Continue reading

Open Source Data Catalog Overview: Amundsen

In this blog post, the second in a series about open source data catalogs, we cover Amundsen, an open source data discovery and metadata engine. We go over the main idea behind Amundsen, the technologies it is built on, and its installation and development methods, then walk through installing Amundsen with Docker, including a few obstacles we ran into along the way. We also discuss the main microservices that make up Amundsen, their configuration options, and how to add authentication. Finally, we close with our thoughts on Amundsen after this short dive into it.

Continue reading

Data Engineer’s Lunch #25: Airflow and Spark

In Data Engineer’s Lunch #25: Airflow and Spark, we discuss how we can use Airflow to manage Spark jobs. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

Continue reading