Elasticsearch

1. What is Elasticsearch?

This is the first question every newcomer asks. Here is the answer in a few sentences: Elasticsearch is a NoSQL database and search engine built on top of Lucene. It provides a distributed, near real-time, JSON-based, multi-tenant-capable full-text search solution. Although that definition takes only two sentences, it contains several terms you may not have heard before. Let us split them up and explore them individually.

1.a Lucene

Simply put, Lucene is a search library written in Java. It provides functions and methods optimised for different search strategies, and it is one of the most widely used search libraries: many open source and commercial search products have Lucene as their backbone. This raises some obvious questions. If Elasticsearch uses Lucene for the search part, why can't we use bare Lucene for our purposes? Why go for Elasticsearch, and what is the difference between the two? The answer is that Lucene is a powerful but low-level library, which makes it hard to work with directly and to customise to end-customer needs. Elasticsearch builds an API layer on top of Lucene that makes using Lucene's methods and functions a much simpler affair.
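
To make "search library" a little more concrete, here is a minimal Python sketch of an inverted index, the kind of data structure that full-text search is generally built around. It is only an illustration of the idea and not Lucene's actual implementation; the function names and sample documents are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene is a search library",
    2: "Elasticsearch is built on Lucene",
    3: "Kibana draws dashboards",
}
index = build_inverted_index(docs)
print(sorted(search(index, "lucene")))         # [1, 2]
print(sorted(search(index, "lucene search")))  # [1]
```

A real library adds much more on top of this (text analysis, scoring, on-disk storage), which is part of why using it directly takes effort.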

1.b Distributed System

Apart from the difficulty of configuring bare Lucene in our applications, what makes Elasticsearch preferable is its distributed nature. Distributed essentially means that Elasticsearch can run on several systems (nodes) at the same time and work on a single problem using the combined resources of the nodes in the network. Lucene on its own does not provide this, which is a major roadblock for many implementations.
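
One way to picture the distributed part: an index is split into shards, and each document is sent to one shard based on a hash of its routing value (by default its id). The sketch below is a simplified Python illustration of that idea. The `zlib.crc32` hash is a stand-in chosen for this example and is not the hash function Elasticsearch uses; the function names and document ids are hypothetical.

```python
import zlib


def pick_shard(routing_key, num_primary_shards):
    """Choose a shard number for a document from a stable hash of its key."""
    # crc32 is only a stand-in hash for this sketch (an assumption).
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def group_by_shard(doc_ids, num_primary_shards):
    """Return {shard_number: [doc_ids]} for the given documents."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


doc_ids = [f"doc-{i}" for i in range(10)]
shards = group_by_shard(doc_ids, 3)
for shard_number, ids in shards.items():
    print(shard_number, ids)
```

Because the shard is computed from the key alone, any node can work out where a document lives without asking a central lookup table.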

1.c Near real-time

Documents inserted into Elasticsearch become available for search almost instantaneously (by default, within about a second of being indexed). This capability comes out of the box, and no external or additional configuration has to be made.
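
The "near" in near real-time comes from the fact that newly indexed documents become visible to search only after a periodic refresh. The toy Python class below sketches that behaviour under simplified assumptions; `ToyIndex` and its methods are invented for this illustration and are not an Elasticsearch API.

```python
class ToyIndex:
    """Toy model: documents become searchable only after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Move everything that was buffered into the searchable set.
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        return [d for d in self._searchable if word.lower() in d.lower()]


toy = ToyIndex()
toy.index("hello elasticsearch")
before = toy.search("hello")  # nothing yet: no refresh has happened
toy.refresh()
after = toy.search("hello")   # the document is now visible
print(before, after)
```

In Elasticsearch the refresh happens automatically at a short interval, which is why the delay is usually not noticeable.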

1.d JSON based

Elasticsearch uses JSON-based communication: its APIs accept and return JSON. This provides great flexibility in usage and interoperability, since most web applications and services communicate in JSON nowadays.
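
As a rough illustration, the Python sketch below builds the JSON for two common requests, indexing a document and running a `match` query, without sending anything over the network. The index name `articles`, the field names and the helper functions are assumptions made up for this example; the paths follow the REST conventions of recent Elasticsearch versions and may differ on older ones.

```python
import json


def build_index_request(index_name, doc_id, document):
    """Describe a request that stores one JSON document under an id."""
    return {
        "method": "PUT",
        "path": f"/{index_name}/_doc/{doc_id}",
        "body": json.dumps(document),
    }


def build_match_request(index_name, field, text):
    """Describe a full-text search request using a match query."""
    query = {"query": {"match": {field: text}}}
    return {
        "method": "POST",
        "path": f"/{index_name}/_search",
        "body": json.dumps(query),
    }


# Hypothetical index and document, for illustration only.
index_request = build_index_request("articles", 1, {"title": "What is Elasticsearch?"})
search_request = build_match_request("articles", "title", "elasticsearch")
print(index_request["method"], index_request["path"], index_request["body"])
print(search_request["method"], search_request["path"], search_request["body"])
```

Any HTTP client in any language can send such requests, which is where the interoperability comes from.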

1.e Multi-tenant capable

Multi-tenancy refers to an application architecture in which a single instance of the application on a server or cloud can be accessed by multiple tenants (user groups) with varying levels of access. In Elasticsearch, one cluster can host many indices, so different tenants can be served from the same cluster.

2. Elasticsearch use cases

2.a Search

The primary use case of Elasticsearch, and the purpose for which it was built, is to make search faster and better. It provides many search strategies right out of the box, such as case-sensitive and case-insensitive search, partial matching and auto-suggestion. Heavier customisation to suit user-specific strategies, such as selective weighting of fields or highlighting of matches, is also easy to build and implement. These factors make it one of the most common choices for search.
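
Selective weighting and highlighting can both be expressed in the search request itself. The Python sketch below builds such a request body with a `multi_match` query, where the `^3` suffix boosts one field, plus a `highlight` section. The field names `title` and `body` and the helper function are hypothetical; only the request body is built here and nothing is sent to a cluster.

```python
import json


def build_search_body(text, boosted_field, other_fields, highlight_field, boost=3):
    """Build a search body that weights one field higher and highlights matches."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "field^3" means matches in that field count three times as much.
                "fields": [f"{boosted_field}^{boost}", *other_fields],
            }
        },
        "highlight": {"fields": {highlight_field: {}}},
    }


body = build_search_body("distributed search", "title", ["body"], "body")
print(json.dumps(body, indent=2))
```

A match in the title then ranks a document higher than the same match in the body, and the response includes the matching fragments marked up for display.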

2.b Log Collection/Parsing and Analytics

Together with the other members of the stack, Logstash and the Beats platform, Elasticsearch makes collecting data from a wide range of sources an easy and smooth process. Logstash and Beats take care of forwarding data from the various sources, and because they integrate natively with Elasticsearch, it is very easy to set up and start collecting data. The problem solved here is the need for a different handler for each data source. If you have to collect logs from different sources and standardise them, both the forwarding and the parsing parts of the process can be handled by Logstash. This saves many intermediate steps, and with them the time and effort of producing a standard format. The parsed and stored data can then be visualised with Kibana, the stack's visualisation tool. Many types of analytics are built into Elasticsearch, such as different kinds of aggregations and statistical computations; these can be applied to the logs and turned into interactive visuals in Kibana to gain useful insights from the log data.
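
To show what "parsing logs into a standard format" means in practice, here is a small Python sketch that turns one web server access log line into a structured document ready for indexing. In the stack this job would normally be done by Logstash; the regular expression, the field names and the sample line below are assumptions made up for this illustration.

```python
import re

# Pattern for a common access-log layout (an assumption for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    # A "-" in the size column means no body was sent.
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


sample = '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
doc = parse_log_line(sample)
print(doc)
```

Once every source is reduced to documents with the same field names, the same searches and aggregations work across all of them.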

2.c Content connectors

As with logs, the next big use case of Elasticsearch is collecting data from a multitude of sources such as Twitter, SharePoint and Jive. There are strong community connector plugins that extract data from these sources, with the required customisation, and stream it into Elasticsearch. This makes for powerful, purpose-specific data collection and also makes the collected data searchable. For example, data for a specific hashtag can be streamed to Elasticsearch; if we can then provide lightning-fast search on this data, imagine how easy it becomes to surface the content users want. A similar implementation is used by the Guardian newspaper, where the latest comments on news articles are streamed to Elasticsearch. This data is then analysed and made searchable so that trends around the articles can be found as quickly as possible.

2.d Instant Visualisation

The ability to create insightful dashboards within minutes of indexing data is another main use case of the Elasticsearch stack. The visualisation tool is Kibana, which loads data from Elasticsearch, applies a good number of analytics to it, and renders the results in a wide range of graphs that can be arranged in any order to create reports and dashboards. Application and process monitoring benefits hugely from the Kibana-Elasticsearch combination, as anomalies or threats can be detected and countered in near real time.
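
Charts like these are typically backed by aggregation queries. As a hedged sketch, the Python below builds a `terms` aggregation body that counts documents per value of a field, and computes the same counts locally on a few sample documents to show what the result represents. The field name `status`, the aggregation name and the sample data are made up for this example.

```python
from collections import Counter


def build_terms_aggregation(field, agg_name="by_value"):
    """Build a request body that counts documents per distinct field value."""
    # "size": 0 asks for the aggregation result only, without the matching documents.
    return {"size": 0, "aggs": {agg_name: {"terms": {"field": field}}}}


def count_locally(docs, field):
    """Local stand-in for the same computation, for illustration."""
    return Counter(d[field] for d in docs if field in d)


agg_body = build_terms_aggregation("status", agg_name="by_status")
sample_docs = [{"status": 200}, {"status": 404}, {"status": 200}]
counts = count_locally(sample_docs, "status")
print(agg_body)
print(counts)
```

A bar chart of requests per status code is essentially this result drawn as bars.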

3. Why Elasticsearch?

Finally, the million-dollar question: why should Elasticsearch be preferred? Let us look at the most important factors that answer it:

3.a Scalability

One of the major advantages of Elasticsearch is its scalability. In most use cases, to get a decent search time you just need to index the data into Elasticsearch. Yes, that is right: you do not have to deal with the pain points of handling its distributed nature, because Elasticsearch handles scaling by itself. For example, when a new node is added to a cluster, we need not set up routing to it or make large, critical settings changes to make it discoverable and functional; the master node handles this with little or no intervention from us.
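
The effect of adding a node can be pictured with a toy allocation. The Python sketch below spreads shards over nodes round-robin and shows how the load per node drops when a third node joins. This is a deliberately simplified assumption: the real allocation logic takes more factors into account, and the function and node names here are invented for the illustration.

```python
def allocate(shard_ids, nodes):
    """Spread shards over nodes round-robin; returns {node: [shard_ids]}."""
    allocation = {node: [] for node in nodes}
    for position, shard in enumerate(shard_ids):
        allocation[nodes[position % len(nodes)]].append(shard)
    return allocation


shard_ids = list(range(6))
two_nodes = allocate(shard_ids, ["node-1", "node-2"])
three_nodes = allocate(shard_ids, ["node-1", "node-2", "node-3"])
print(two_nodes)    # three shards per node
print(three_nodes)  # two shards per node
```

The point is that the rebalancing is computed by the cluster rather than configured by hand.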

3.b Schema less

By design, Elasticsearch does not require a schema up front. This means we do not need to define a schema before putting documents into Elasticsearch, which is a huge relief when dealing with multiple data sources. In a traditional relational database, by contrast, the table schema must be declared before any data can be inserted. In Elasticsearch we can skip this step and simply start indexing the data. If no schema (mapping) has been provided, Elasticsearch automatically assigns one by inferring a type for each document field.
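
The automatic assignment can be imagined roughly as follows. This Python sketch infers a field type from each JSON value in a document; it is a simplified approximation written for this article, with hypothetical function names, and it ignores cases the real rules handle, such as dates, arrays and null values.

```python
def infer_field_type(value):
    """Guess a field type from a JSON value (simplified approximation)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def infer_mapping(document):
    """Return {field_name: inferred_type} for a flat JSON document."""
    return {field: infer_field_type(value) for field, value in document.items()}


# Hypothetical document, for illustration only.
sample_doc = {"title": "What is Elasticsearch?", "views": 10, "rating": 4.5, "published": True}
mapping = infer_mapping(sample_doc)
print(mapping)
```

The inferred types are a convenience for getting started; an explicit mapping can still be supplied when more control is needed.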

3.c Customisation

Another resounding answer to the question "why Elasticsearch?" is the customisation it offers. For example, as mentioned in an earlier section, the search options available to developers can be customised to cover almost all search use cases. Data can also be sent to and retrieved from Elasticsearch in a wide variety of ways, ranging from default add-ons and plugins to user-developed solutions, all of which integrate cleanly with it.

3.d Community

Last but not least, the community led by Shay Banon and other equally talented developers is one of the most robust open source communities. It has produced many plugins, add-ons and libraries, ranging from simple analyzer plugins to data river implementations. The promptly responsive forums and active online presence will also save a lot of development time.

Conclusion

In this article I have introduced Elasticsearch, the problems it attempts to solve and the compelling reasons for using it. In the next article in the series, I will briefly introduce the Elasticsearch stack and what each component does.
