Thus far we’ve discussed how Cassandra, Spark, Kafka, Docker, and Kubernetes can be used to build a global data platform. These components are powerful in their own right, and managing them becomes somewhat simpler if we decide to use commercial distributions from DataStax and Confluent.
There are other tools and services we can use to further accelerate our timeline to deliver a world-class global data and analytics platform. Although bringing up distributed data (Cassandra), distributed computing (Spark), and distributed communication (Kafka) layers is a great start for a framework, it still needs a few more components to become a “platform” that allows quick creation and delivery of services an enterprise can use.
In our experience, the raw components of Cassandra, Spark, and Kafka are great, but they require additional development work to make them usable by the average developer. This is why technical leadership on such an endeavor should consider adopting existing applications that let developers take advantage of the technology without needing to be experts in it.
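One way to picture this is a thin platform facade that hides the raw Cassandra and Kafka client APIs behind a simple interface that an average developer can code against. The sketch below is illustrative only: the `EventBus`, `EventStore`, and `DataPlatform` names are assumptions, and in-memory stubs stand in for real Kafka and Cassandra clients.

```python
from collections import defaultdict


class EventBus:
    """Stands in for a Kafka producer/consumer pair (stubbed in memory)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(event)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)


class EventStore:
    """Stands in for a Cassandra table keyed by entity id (stubbed in memory)."""

    def __init__(self):
        self._rows = {}

    def save(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)


class DataPlatform:
    """The facade developers actually code against."""

    def __init__(self):
        self.bus = EventBus()
        self.store = EventStore()
        # Persist every published order event automatically, so the
        # developer never wires up storage themselves.
        self.bus.subscribe("orders", lambda e: self.store.save(e["id"], e))


if __name__ == "__main__":
    platform = DataPlatform()
    platform.bus.publish("orders", {"id": "o-1", "total": 42})
    print(platform.store.get("o-1"))
```

In a real deployment the stubs would be replaced by the Kafka and Cassandra drivers, but the point stands: the developer-facing surface area is a few method calls, not the full client APIs.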
Example Global Data & Analytics Platform
There are many reasons why companies would want a complete platform managed on their own infrastructure rather than using what a provider like AWS (DynamoDB, Kinesis, EMR), Google (Spanner, Cloud Pub/Sub, Dataflow), or PubNub happens to offer. These platform-as-a-service offerings are convenient, but in a company with specific needs, they won’t fly.
In the next and final part of this series, we will discuss monitoring and scaling a distributed business data and communications platform. If you want me or our company to come and talk to your company about your Global Data & Analytics Platform, feel free to email me or my team at Anant.