Business Platform Team

Anant Corporation Blog: Our research, knowledge, thoughts, and recommendations about building and managing online business platforms.

Tag Archives: data analytics


Data Engineer’s Lunch #45: Apache Livy

In Data Engineer’s Lunch #45: Apache Livy, we discussed Apache Livy, a REST API for interacting with Spark clusters. It also helps with submitting jobs and managing Spark contexts and cached data. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!
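
To make the idea of submitting a Spark job over REST concrete, here is a minimal sketch in Python. It assumes a Livy server whose batch endpoint is POST /batches and whose request body takes "file", "className", and "args" fields. The host, jar path, and class name used with it are hypothetical placeholders, and the helper name build_batch_request is ours, not part of Livy.

```python
import json
import urllib.request


def build_batch_request(livy_url, jar_path, class_name, args=None):
    """Build (but do not send) a POST /batches request for a Livy server.

    livy_url, jar_path, and class_name are supplied by the caller; nothing
    here is specific to any particular cluster.
    """
    payload = {
        "file": jar_path,          # application jar, reachable by the cluster
        "className": class_name,   # main class to run
        "args": list(args or []),  # command-line arguments for the job
    }
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request):
    """Send the request and return Livy's JSON response as a dict."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling submit(build_batch_request("http://localhost:8998", "hdfs:///jobs/example.jar", "com.example.WordCount")) against a running server would return a JSON description of the new batch. Building the request separately from sending it keeps the payload easy to inspect without a cluster.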

Apache Cassandra Lunch #65: Spark Cassandra Connector Pushdown

In Apache Cassandra Lunch #65: Spark Cassandra Connector Pushdown, we discussed Spark predicate pushdown in the context of the Spark Cassandra connector. The live recording of Cassandra Lunch, which includes a more in-depth discussion and a demo, is embedded below in case you were not able to attend live. If you would like to attend Apache Cassandra Lunch live, it is hosted every Wednesday at 12 PM EST. Register here now!

Open Source Data Catalog Overview: CKAN

In this blog post, the first in a series about open source data catalogs, we look at CKAN. We cover the main idea behind CKAN, the technologies that make it up, and the ways to install it, then walk through the package installation method and some hurdles we ran into along the way. Next, we discuss a few optional features of CKAN, such as its FileStore and DataStore, and ways of adding data to CKAN. Finally, we close with some thoughts on CKAN from the perspective of a short dive into it.

Data Engineer’s Lunch #15: Introduction to Jenkins

In Data Engineer’s Lunch #15: Introduction to Jenkins, we discussed Jenkins, the automation platform. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

Using TableAnalyzer – Anant’s Tool for Analysis of Cassandra Tables

TableAnalyzer is a tool for analyzing Cassandra (CFStats/TableStats) output that visualizes variance in metrics between nodes. We use TableAnalyzer to generate a conditionally formatted spreadsheet that can be used to perform a data model review.
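
As a rough illustration of what "variance in metrics between nodes" can mean, the Python sketch below computes the spread of one metric per table across nodes and flags tables that look uneven. This is not TableAnalyzer's actual method. The function names, the input shape (table name mapped to per-node values), and the 0.5 threshold are all assumptions made for the example.

```python
from statistics import mean, pstdev


def metric_spread(per_node):
    """Return (mean, population std dev, coefficient of variation)
    for one metric measured on several nodes."""
    values = list(per_node.values())
    avg = mean(values)
    sd = pstdev(values)
    cv = sd / avg if avg else 0.0  # avoid dividing by zero for empty tables
    return avg, sd, cv


def flag_skewed(tables, threshold=0.5):
    """Return the sorted names of tables whose metric varies across nodes
    by more than `threshold` (an arbitrary, illustrative cutoff)."""
    return sorted(
        name
        for name, per_node in tables.items()
        if metric_spread(per_node)[2] > threshold
    )
```

Given a table whose metric is identical on every node and another where one node reports ten times the value of its peers, flag_skewed returns only the second table. That is the kind of row that conditional formatting in a spreadsheet would highlight for review.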
