In Data Engineer’s Lunch #26, we will discuss how to use Akka Actors for concurrent data processing operations. The live recording of the Data Engineer’s Lunch, which includes a more in-depth discussion, is also embedded below in case you were not able to attend live. If you would like to attend a Data Engineer’s Lunch live, it is hosted every Monday at noon EST. Register here now!
The Actor Model was proposed by Carl Hewitt in 1973 as a way to “handle parallel processing in a high performance network”. At the time of the inception of this model, the described environment did not exist. However, as time has passed, current computers and processors have far exceeded anything Hewitt could have possibly imagined.
In Computer Science, the actor model is a mathematical model of concurrent computation that treats “actor” as the universal primitive of concurrent computation. In response to a message it receives, an actor can do a few things:
One very important thing about actors is that they may change their own private state, but can only affect other actors (indirectly) through sending messages. The actor model can also be used as a framework for modeling and understanding a wide range of concurrent systems: I.E. Email can be modeled as an actor system with accounts as actors and email addresses as actor addresses.
Akka is a free, open source toolkit and runtime for making concurrent and distributed applications on JVM. Akka’s approach to concurrency is based on the Actor model mentioned above. Akka is written in Scala but can be used with either Java or Scala (as both Java and Scala compile to JVM code). Besides the ability to easily create and interact between actors that Akka provides, there are also plenty of modules in Akka which are built to work alongside Akka’s Actor system:
One large motivation for creating Akka and a problem it resolves in the OOP model is a problem with encapsulation
With Akka Actors, an actor reacts to messages sequentially and one at a time. Since there is always at most one message being processed per actor, the invariants of an actor can be kept without synchronization. This happens automatically without locks in Akka.
Prior to Akka’s current version, Akka’s actors used to be untyped (now referred to as Classic Actors).
Akka’s current version of actors is referred to as Typed Actors. When making an actor, their behavior is typed which tells you what kinds of messages a particular actor can accept. Additionally, references to actors (known as ActorRef) are also typed. This feature helps a user send only particular types of messages to a given actor.
Although using Akka’s Typed actors is now recommended, software which currently uses classic Actors is still supported and the two types of actors can exist in the same software at the same time. This is referred to as “coexistence” in Akka documentation. More information on this can be found in the docs: https://doc.akka.io/docs/akka/current/typed/coexisting.html
Link to Github repo for the demo: https://github.com/Anant/example-akka-actors-for-data-processing
Technology used in the project:
The following diagram shows how data and messages move throughout the demo project, how the three unique actors are designed and how they interact with each other.
A brief description of each actor follows:
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!