Business Platform Team

Anant Corporation Blog: Our research, knowledge, thoughts, and recommendations about building and managing online business platforms.

Category Archives: Modern Business



Data Engineer’s Lunch #6: Common Data Formats Used in Data Engineering

In Data Engineer’s Lunch #6: Common Data Formats Used in Data Engineering, we discuss common data storage formats used in data engineering. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend Data Engineer’s Lunch in person, it is hosted every Monday at 12 PM EST. Register here now!


Apache Spark Companion Technologies: Distributed Machine Learning Frameworks

One of Apache Spark’s core features is Spark MLlib, a library for doing machine learning in Spark. Most data science education relies on specific machine learning libraries, like scikit-learn. Retraining data scientists to use Spark MLlib can be an extra cost on top of the data engineering work that is needed just to use Spark in the first place. Databricks offers distributed versions of some of these machine learning frameworks as part of the Databricks platform.


Apache Cassandra Lunch #32: Cassandra Data Operations – Common Ways to Move Data in Cassandra

In case you missed it, this blog post is a recap of Cassandra Lunch #32, covering the basics of Cassandra Data Operations. We discuss the various ways of moving data into and out of Cassandra clusters. The live recording of Cassandra Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!


Apache Spark Companion Technologies: Data Lakes

Data lakes are a tool for long-term data storage. They can be implemented on-premises for use cases requiring high security or in the cloud for more accessible solutions. The Databricks runtime includes code specifically for easing the connection between Spark and data lake technologies, as well as its own companion tech, Delta Lake. Delta Lake makes interacting with data in data lakes easier and more consistent, but it is possible to work with data lakes without it, as we will see today.


Data Engineer’s Lunch #5: What is a Data Lake?

In Data Engineer’s Lunch #5: What is a Data Lake?, we discuss what data lakes are, why we need them, how we get data in and out, and different implementations of data lakes. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend Data Engineer’s Lunch in person, it is hosted every Monday at 12 PM EST. Register here now!
