New Trend To Use Solr As A Data Store!
Solr, developed under the Apache Software Foundation, is an open source enterprise search platform written in Java. With prominent features such as faceted search, hit highlighting, NoSQL features, real-time indexing and full-text search, Apache Solr is designed to provide distributed search along with index replication for scalability and fault tolerance.
Widely used across almost all verticals globally, Apache Solr has traditionally served as a conventional enterprise search engine working on top of relational databases and frameworks like Hadoop. According to leading technology experts, only about one fifth of Solr deployments use it as a data store in its own right, while most complement or simply extend other data stores.
It was in October 2012 that Solr came to be recognised as a potential data store, with the introduction of SolrCloud, which brought in flexible distributed search and indexing. Another reason Solr attracts enterprises is its governance model, which takes care of the accuracy of records.
Over the past two years, however, it has been observed that 20 percent of deployments use Solr as this next-generation data store. Apache Solr is expected to leap beyond the intranet and knowledge base to provide data services across the enterprise at a rapidly growing rate. In various arenas, Solr has been serving effectively as a vital data access layer, retrieving value from data by making it fully indexed and searchable.
As this is a novel concept for many, people in the technical community have a number of concerns about implementing Apache Solr as a database. Here are a few:
1. The most common way of accessing Solr, over HTTP, doesn’t respond well when queries are sent in batches. Since Solr offers very little support for data streaming, there is no scope for going through millions of records in one go. This means that large-scale data access patterns through Solr must be designed with careful planning.
2. Compared with a full-featured RDBMS, the querying capabilities of Solr are quite limited, although it does offer useful functions such as field stats queries, which are simple to use.
3. Developers who are accustomed to working with relational databases will find it difficult to implement the same DAO design patterns in the Solr model, as filters are applied to queries in a different manner. The right approach must therefore be devised when developing an application that uses Solr for some portion of its large queries or stateful modifications.
4. Using Solr as a data store means giving up the ‘enterprisy’ tools for session management and stateful units that various other highly developed web frameworks deliver.
5. Relational databases deal effectively with complex data and relationships, and they come with advanced metrics and automated analysis tools. In Solr, however, such tools have to be written by hand, which costs time and effort.
6. Relational databases provide methods that build and optimise views and queries to combine tuples on the basis of simple predicates. Solr, on the contrary, has no such advanced methods for joining data.
7. To offer greater availability, SolrCloud makes use of a distributed file system. A relational database is different, as it can flexibly make use of masters and slaves or even RAID. It is therefore very important to provide the elasticity that the Solr infrastructure requires in order to scale in the cloud.
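To illustrate the kind of planning that point 1 calls for, here is a minimal Python sketch of walking a large result set page by page instead of in one go. It assumes a Solr version that supports cursor-based paging through the `cursorMark` request parameter and the `nextCursorMark` response field. The `fetch` callable and the function name `iter_all_docs` are hypothetical, introduced only for this example; `fetch` stands in for whatever HTTP client sends the parameters to the collection's select handler and returns the parsed JSON response.

```python
from typing import Callable, Dict, Iterator

# Hypothetical type: a function that sends query parameters to Solr's
# select handler and returns the parsed JSON response as a dict.
Fetch = Callable[[Dict[str, str]], dict]


def iter_all_docs(fetch: Fetch, query: str = "*:*",
                  unique_key: str = "id", rows: int = 500) -> Iterator[dict]:
    """Yield every matching document, one page at a time, via cursorMark."""
    cursor = "*"  # starting value for a new cursor
    while True:
        params = {
            "q": query,
            "rows": str(rows),
            # Cursor paging needs a sort that includes the uniqueKey field
            # (assumed here to be called "id").
            "sort": f"{unique_key} asc",
            "cursorMark": cursor,
            "wt": "json",
        }
        response = fetch(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            return
        cursor = next_cursor
```

In a real application, `fetch` would wrap an HTTP GET against the select handler of a collection and decode the JSON body. Because the function is a generator, the caller holds only one page of at most `rows` documents in memory at a time, which is one way to approach millions of records without true streaming.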