Apache Spark Companion Technologies: Data Lakes

Data lakes are a tool for long-term data storage. They can be implemented on-premises for use cases requiring high security or in the cloud for more accessible solutions. The Databricks runtime includes code specifically for easing the connection between Spark and data lake technologies, as well as its own companion technology, Delta Lake. Delta Lake makes interacting with data in data lakes easier and more consistent, but it is possible to work with data lakes without it, as we will see today.

Data Engineer’s Lunch #4: Airflow for Data Engineering

In case you missed it, the fourth installment of our weekly Data Engineer’s Lunch was presented by guest speaker Will Angel. It covered using Airflow, a scheduling tool for managing data pipelines, for data engineering. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

Migrating from a Relational Database to Cassandra: Why, Where, When, and How

The Apache Cassandra database has gained popularity because it offers scalability and high availability without compromising performance. Many applications running today were built using relational database technology; however, this technology doesn’t offer the scalability or availability that Cassandra does. This is why many people are considering the switch to Cassandra. In this post, we will cover everything you need to know about switching from a relational database to Cassandra.

Big Data Technologies

Big data is a field that deals with ways to analyze, systematically extract information from, or otherwise handle data sets that are too large or complex for traditional software. In this post, we take a look at some of the biggest and best technologies available.

Scaling Cloud Web & Data Technologies

In today’s society, it is considered common practice to distribute your business technology operations in an effort to maximize your potential return, whether in tangible or intangible assets. This process is not merely a matter of separating a company into parts; it also paves the path for exponential growth.

