Business Platform Team

Anant Corporation Blog: Our research, knowledge, thoughts, and recommendations about building and managing online business platforms.

Tag Archives: data analytics

Open Source Data Catalog Overview: CKAN

In this blog post, the first in a series about Open Source Data Catalogs, we look at an Open Source Data Catalog known as CKAN. We cover the main idea behind CKAN, the technologies that make it up, and the ways it can be installed. We then walk through the package installation method, including some hurdles we ran into along the way. Next, we discuss a few optional features of CKAN, such as its FileStore and DataStore, and the ways of adding data to CKAN. Finally, we close with our thoughts on CKAN from the perspective of a short dive into it.

Data Engineer’s Lunch #15: Introduction to Jenkins

In Data Engineer’s Lunch #15: Introduction to Jenkins, we discussed Jenkins, the automation platform. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!

Using TableAnalyzer – Anant’s Tool for Analysis of Cassandra Tables

TableAnalyzer is a tool that analyzes Cassandra CFStats/TableStats output and visualizes the variance in metrics between nodes. We use TableAnalyzer to generate a conditionally formatted spreadsheet that can be used to perform a data model review.

Apache Spark Companion Technologies: Data Lakes

Data lakes are a tool for long-term data storage. They can be implemented on-premises for use cases requiring high security, or in the cloud for more accessible solutions. The Databricks runtime includes code specifically for easing the connection between Spark and data lake technologies, as well as its own companion tech, Delta Lake. Delta Lake makes interacting with data in data lakes easier and more consistent, but it is possible to work with data lakes without it, as we will see today.

Tools To Visualize Data in Cassandra / Datastax

Visualizing data helps companies interpret and analyze massive amounts of data so they can make data-driven decisions. Visualization reveals trends, outliers, and patterns that may not be as obvious in a plain table. Finding the best way to visualize data in Cassandra or DataStax can be time-consuming: there are many tools, both commercial and open-source, but not all of them can connect to Cassandra and/or DataStax. Here are some of the best tools to help users visualize their data in Cassandra or DataStax.
