Elassandra is an open-source project that integrates Apache Cassandra and Elasticsearch. Cassandra is a highly scalable distributed database, and Elasticsearch is an open-source search engine. By mapping the internal concepts of one tool onto the other, Elassandra combines Elasticsearch’s full-text search across all fields with Cassandra’s ability to manage large amounts of data.
Applications can work with stored data through either Cassandra’s CQL queries or Elasticsearch’s REST API. In Elassandra, an Elasticsearch index, which is built on Lucene, maps to a Cassandra keyspace, and the data itself is stored in Cassandra.
Other concepts also map from one tool onto the other in Elassandra. The Elasticsearch cluster, made up of all the shards that store data, corresponds to a Cassandra virtual datacenter. An individual shard corresponds to a single node in that virtual datacenter. Cassandra tables and Elasticsearch document types both define what kinds of data can be stored. In standard Elasticsearch, an index directly holds documents of a shared type, whereas a Cassandra keyspace holds tables, which in turn hold the rows containing the actual data. At the lower levels, a Cassandra row, which behaves like a row in any other database, maps onto an Elasticsearch document. A single value within a document, which Elasticsearch calls a field, is analogous to a cell in a Cassandra row.
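The correspondences described so far can be collected into a small lookup table. The following is only an illustrative sketch: the dictionary and the `cassandra_term` helper are names invented for this example and are not part of Elassandra.

```python
# Concept mapping as described in the text (Elasticsearch term -> Cassandra term).
# The names below are hypothetical and exist only for this illustration.
ES_TO_CASSANDRA = {
    "cluster": "virtual datacenter",
    "shard": "node",
    "index": "keyspace",
    "document type": "table",
    "document": "row",
    "field": "cell",
}

# Reverse view of the same mapping (Cassandra term -> Elasticsearch term).
CASSANDRA_TO_ES = {cassandra: es for es, cassandra in ES_TO_CASSANDRA.items()}


def cassandra_term(es_term: str) -> str:
    """Return the Cassandra concept that corresponds to an Elasticsearch concept."""
    return ES_TO_CASSANDRA[es_term.lower()]


for es_name, cassandra_name in ES_TO_CASSANDRA.items():
    print(f"Elasticsearch {es_name} <-> Cassandra {cassandra_name}")
```

Reading the table from top to bottom goes from the largest unit (the cluster or virtual datacenter) down to the smallest (a field or cell).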
Because of these mappings, data can be loaded into CQL tables via REST calls to the Elasticsearch APIs, so applications that can make REST calls but not CQL queries can still write to CQL tables. Elassandra can also be configured to return search results in response to CQL queries, so applications built around CQL can search without needing to send REST calls. These features let users of either Elasticsearch or Cassandra gain the benefits of the other tool without drastically changing their applications.
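As a rough sketch of the two access paths, the code below builds (but does not send) a REST request that indexes a document, along with a CQL statement that would read the same data back as a row. Several things here are assumptions rather than details from this article: the node address, the `shop` and `product` names, the typed `/{index}/{type}/{id}` URL form used by older Elasticsearch versions, and a primary key column named `"_id"`. The `%s` placeholder follows the convention of the DataStax Python driver for non-prepared statements; the driver itself is not imported, so the example runs without a cluster.

```python
import json
import urllib.request
from typing import Tuple

ES_URL = "http://localhost:9200"   # assumption: local node on the default Elasticsearch HTTP port
KEYSPACE = "shop"                  # hypothetical Elasticsearch index / Cassandra keyspace
TABLE = "product"                  # hypothetical document type / Cassandra table


def build_index_request(doc_id: str, document: dict) -> urllib.request.Request:
    """Build a PUT request that would index one document over REST (nothing is sent)."""
    url = f"{ES_URL}/{KEYSPACE}/{TABLE}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def build_cql_select(doc_id: str) -> Tuple[str, Tuple[str]]:
    """Build a CQL statement and its parameters for reading the same data as a row.

    Assumes the table's primary key column is named "_id".
    """
    statement = f'SELECT * FROM {KEYSPACE}.{TABLE} WHERE "_id" = %s'
    return statement, (doc_id,)


request = build_index_request("42", {"name": "kettle", "price": 19.99})
statement, params = build_cql_select("42")

print(request.get_method(), request.full_url)
print(statement, params)
```

To actually execute these, the request could be passed to `urllib.request.urlopen` and the statement to a Cassandra driver session; both steps need a running cluster and are left out here.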
Compared to Elasticsearch alone, Elassandra offers better scalability and distribution across data centers. Cassandra’s method of storing and accessing data makes it possible to build a network of thousands of storage nodes across multiple data centers, with replication to protect against data loss, without any extra tools. Elasticsearch on its own is not well suited to distribution across multiple data centers and is generally deployed on no more than hundreds of nodes. Integrating Cassandra therefore provides a very large, highly distributed datastore.
Compared to Cassandra alone, Elassandra adds search capabilities, and it also offers access to all existing Elasticsearch APIs. While Elasticsearch’s main draw is its search API, it also provides a data transform API as well as several other APIs useful for analytics and data management.
In conclusion, Elassandra provides advantages over using either Cassandra or Elasticsearch alone, without adding much overhead compared with either tool. Its real value is in enabling search over a datastore that is larger or more widely distributed than Elasticsearch could handle by itself.