This series covers different aspects of architecting and managing a global data & analytics platform. This is not as simple as choosing some technology and installing it. The work involves coordinating people, processes, information, and systems so that business needs are met at all times. We will cover the components of the “SMACK” stack. Although many readers may not use Akka or Mesos, they should still find value in our coverage of Cassandra, Spark, and Kafka. We will also cover the Anant “STACK” set of procedures, which we use at our company to manage data & analytics platforms for our clients.
If something is already a “platform”, how can it also be a “STACK”? We’re referring to an acronym we use here at Anant to describe the people and process side of a platform, as opposed to the information and systems that make it up. It stands for Setup, Training, Administration, Customization / Configuration, and Knowledge Management.
As we see in our ongoing series on SMACK, the demands of the modern customer are forcing startups and large enterprises that want to stay relevant to use globally scalable technologies. Luckily for them, the software that powers the largest companies, such as Google, Facebook, and Amazon, is readily available for experimentation, research, and development. Once an idea is bashed out and proven, it needs to be operationalized. Today that means using platforms as a service that are available by subscription, building your own frameworks on infrastructure as a service, or combining commercial products, multiple cloud providers, and a managed service team. This guide is meant for data architects and engineers at large organizations, and for CTOs and CDOs of growth startups that are about to grow beyond their current scale.
Even without any specific technologies being mentioned, we can see that a data platform has many components and can be complex. Making a platform like this global adds yet another layer of thought and action to bringing up and managing it. Generally speaking, it takes several people to properly deploy a scalable global data platform. Today it’s a little easier, thanks to technologies and companies that have simplified DevOps. If your company is looking at this stack, it is most likely because your data needs require it. Although this guide covers only Apache Cassandra, we wanted to show how and why the technologies in this stack are often used in combination.
Suppose we decide to go forward with these proven technologies. For each component, there are several things to consider. Does your team have the talent to set up and configure, train, administer, customize, and manage the knowledge for each? We also need to decide whether to use managed services, open source versions, or commercial versions. Luckily, there are commercial versions of these tools that make life a little easier, so that even if you decide to manage the platform yourself, there will be someone to support you in your time of need.
Cassandra and Spark are supported commercially by DataStax as part of a suite called DataStax Enterprise, which also includes Solr (for indexing) and an implementation of Apache TinkerPop called DSE Graph. DataStax is commonly known as the company whose founding team took Apache Cassandra from the original source code released by Facebook and made it what it is today. Even though DataStax products are somewhat different (optimized for enterprise loads and reportedly several times faster), they share an open core with the open source Apache Cassandra, Apache Solr, and Apache Spark projects. The suite also has tools that are not available in the open source distributions of Apache Cassandra or Apache Spark.
Confluent publishes a commercial version of Kafka. Confluent was founded by the engineers who built Apache Kafka while at LinkedIn, and they remain a major part of the project's committers. Confluent provides a very easy-to-use Kafka distribution that also includes enterprise security and a schema registry.
In the next article, we’ll dig deeper into the Anant STACK process to see how it could make managing your components a little easier. If you want me or our company to come and talk to your company about global data & analytics platforms, feel free to email me or my team at Anant.