Anant Corporation Blog: Our research, knowledge, thoughts, and recommendations about building and managing online business platforms.
In this blog post, the first in a series about Open Source Data Catalogs, we look at an Open Source Data Catalog known as CKAN. We cover the main idea behind CKAN, the technologies that make it up, and the ways to install it, then walk through the package installation method along with some hurdles we ran into while doing so. Next, we discuss a few optional features of CKAN, such as its FileStore and DataStore, and ways of adding data to CKAN. Finally, we conclude with some thoughts on CKAN from the perspective of a short dive into it.
In Data Engineer’s Lunch #15: Introduction to Jenkins, we discussed Jenkins, the automation platform. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!
Data lakes are a tool for long-term data storage. They can be implemented on-premises for use cases requiring high security, or in the cloud for more accessible solutions. The Databricks runtime includes code specifically for easing the connection between Spark and data lake technologies, as well as its own companion tech, Delta Lake. Delta Lake makes interacting with data in data lakes easier and more consistent, but it is possible to work with data lakes without it, as we will see today.
Visualizing data helps companies interpret and analyze massive amounts of data so they can make data-driven decisions. Visualization reveals trends, outliers, and patterns that may not be as obvious in a plain table format. Finding the best way to visualize data in Cassandra or DataStax can be time-consuming: there are many tools, commercial and open-source, but not all of them enable connections to Cassandra and/or DataStax. Here are some of the best tools to help users visualize their data in Cassandra or DataStax.