Thursday, February 18, 2016

A Property Graph over Elasticsearch?

I would like to share with you a very interesting architectural argument that is going on at my company.

It all started when DataStax acquired Aurelius (thinkaurelius), the company behind Titan. Titan is a graph database that sits on top of Cassandra, HBase, or Berkeley DB.

Until then, we used Titan as our property graph, which allowed us to integrate lots of entities with a schema that kept changing. We were able to index the properties of edges and vertices so that we could fetch them by filtering on one of those properties, and in addition we could traverse from a vertex to its neighbors in O(1) time.
We didn't really need to ask graph-oriented questions such as finding central vertices.
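To make those two access patterns concrete, here is a minimal in-memory sketch in Python. It is not Titan's API; the class `PropertyGraph` and its method names are made up for illustration, and it assumes property values are hashable.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph (illustrative only, not Titan's API)."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> properties dict
        self.out_edges = defaultdict(list)  # vertex id -> [(label, target id, props)]
        self.index = defaultdict(set)       # (key, value) -> set of vertex ids

    def add_vertex(self, vid, **props):
        # No fixed schema: each vertex carries whatever properties it has.
        self.vertices[vid] = props
        for key, value in props.items():
            # Assumes values are hashable (strings, numbers, ...).
            self.index[(key, value)].add(vid)

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))

    def find(self, key, value):
        """Fetch vertex ids by filtering on one indexed property."""
        return sorted(self.index.get((key, value), set()))

    def neighbors(self, vid, label=None):
        """One dictionary lookup reaches the adjacency list of a vertex."""
        return [dst for (lbl, dst, _) in self.out_edges.get(vid, [])
                if label is None or lbl == label]


g = PropertyGraph()
g.add_vertex("p1", kind="person", name="Dana")
g.add_vertex("c1", kind="company", name="Acme", country="IL")
g.add_edge("p1", "works_at", "c1", since=2014)
```

Here `g.find("kind", "person")` uses the property index, while `g.neighbors("p1")` goes straight to the adjacency list of `p1` without consulting any index. Reaching the list is a constant-time lookup; walking it is of course proportional to the number of neighbors.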

We liked Titan for being scalable, and even though we ran it on HBase, we were happy with it.

So, what has changed?

We had plans to use Titan in many upcoming projects, and the DataStax announcement made us realize that Titan would be left behind and would no longer be developed.
As an enterprise that had hoped Titan would reach a stable version and offer support, we were very disappointed.

In the meantime, some people started playing with Elasticsearch as a property graph. They even implemented a TinkerPop API for Elasticsearch - and after some benchmarks we found that Elasticsearch performed significantly better than Titan on many use cases.
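For readers wondering what "Elasticsearch as a property graph" can look like, here is one possible sketch in Python. It is an assumption on my part, not the layout used by the TinkerPop implementation mentioned above: vertices and edges are stored as plain documents, and the field names (`doc_type`, `out_id`, `in_id`, `label`) are hypothetical. The queries use the standard Elasticsearch `bool`/`filter`/`term` query DSL and assume these fields are indexed as exact (not analyzed) values.

```python
def vertex_doc(vid, **props):
    """A vertex is just a document holding its properties."""
    return {"doc_type": "vertex", "id": vid, **props}


def edge_doc(eid, label, out_id, in_id, **props):
    """An edge document points at its source (out_id) and target (in_id)."""
    return {"doc_type": "edge", "id": eid, "label": label,
            "out_id": out_id, "in_id": in_id, **props}


def out_edges_query(vertex_id, label=None):
    """Search body that finds the outgoing edges of a vertex.

    Field names are hypothetical; term filters assume exact-value fields.
    """
    filters = [
        {"term": {"doc_type": "edge"}},
        {"term": {"out_id": vertex_id}},
    ]
    if label is not None:
        filters.append({"term": {"label": label}})
    return {"query": {"bool": {"filter": filters}}}


edge = edge_doc("e1", "works_at", "p1", "c1", since=2014)
query = out_edges_query("p1", label="works_at")
```

With a layout like this, every hop from a vertex to its neighbors becomes a search on the edge documents followed by a fetch of the target vertices by id, rather than a direct adjacency lookup.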

Oh, and I didn't mention that we also need text search capabilities. These are built into Elasticsearch, but with Titan they are only available through an Elasticsearch cluster that gets its data from Titan, with no way to configure the Elasticsearch index in a custom, more optimized way.

A very interesting argument started over email.

Some said that we should keep using Titan because it works, and use Elasticsearch only as Titan's text search store. DataStax announced that it would be simple to move from Titan to their property graph, so we would cross that bridge when we came to it and simply move to the DataStax DSE graph database. We are not considering other graph databases (OrientDB, for instance) because we don't know of one that is scalable, stable, and better than Titan.

Others said that we must try Elasticsearch as our property graph and keep developing the TinkerPop layer on top of it (mostly to be able to write in Gremlin). They argued it is worth trying because we had good benchmarks and because it is better to use one back end than two (Titan and Elasticsearch).

Then there were voices that ruled out Elasticsearch altogether and were very upset with the idea of treating it as a property graph. They said that by its nature it cannot support the kind of indexing, the data volumes, and the continuous online writes that our property graph requires: "Elasticsearch is not a back end!" And besides, why should we take on maintaining a new TinkerPop implementation?

To summarize, I would say that a property graph (RDF too, but that's for another talk) is a great data model for data that keeps changing. It helps us make the data available and findable while avoiding hundreds of relational tables.

I think that as long as we have a working Titan property graph cluster, we should keep using it in other projects, mainly because of our experience with it. I don't think that having two back ends (Titan and Elasticsearch) for one application is wrong, as long as each is used properly.

Elasticsearch has come a long way since its beginning. Many databases now offer more than just a "document store" or a "key-value store", even if that is their nature. You can find Quora questions such as [Why should I NOT use ElasticSearch as my primary datastore]. That is a good question: in theory, the answer to using it as a primary datastore might be a big NO, but down in the tech teams, it might work.

So, in case Titan + Elasticsearch doesn't work for us, trying Elasticsearch on its own is probably the next thing to do.
