Show me the data!! (Elasticsearch)

During my time at 99.co, I was given the big opportunity to work on one of the most important and critical microservices we had: search. It runs on Elasticsearch, one of the most interesting technologies I've touched in my two-ish years of professional software development.

What is Elasticsearch?

It is a search engine created by Elastic. Its major competitor is Apache Solr. Interestingly enough, both are built on Apache Lucene, which shines at text indexing. For anything to do with string searching, lemmatization, fuzzy search, string prefix searches and all that good jazz, Elasticsearch is pretty much a champ. Well, if I were crazy enough about autocompletion, I'd probably use something other than Elasticsearch, but I will write about autocompletion some other time.

But Elasticsearch offers much more than just string searches. It can also do geospatial searches and has pretty crazy aggregation capabilities. Also, its queries are just JSON bodies sent over a RESTful HTTP API. The learning curve is reasonable, and once you are past it, it can do wonders for searching just about anything.

Although your data will be persisted by Elasticsearch, I really do not recommend using it as your primary datastore because it has no transactions. Your data might just disappear, so consider using an actual ACID database if you require integrity. I use Elasticsearch mainly as a sort of caching layer.

What does a query look like?

$ curl -s 'localhost:9200/items/_search' -H 'Content-Type: application/json' -d '
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "Exalted orb"
                    }
                }
            ]
        }
    }
}
' | jq

That's quite a mouthful. While it looks pretty arcane if this is your first time looking at it, you will get used to it and realise that it's merely a way to express your queries. I'm not such a big fan of learning every NoSQL database's query language, since there's no real standard that transfers from one database to another. Thankfully the pattern is pretty easy to understand, and since so many people use it, you can always find answers on Stack Overflow or in the documentation, which in my opinion is pretty solid.
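Since the query is just JSON over HTTP, it can also be built from ordinary data structures instead of a hand-written string. Here is a minimal Python sketch of the same request as the curl example above. The helper names `build_name_query` and `search` are my own for illustration, and it assumes the same local node and `items` index.

```python
import json
import urllib.request


def build_name_query(name: str) -> dict:
    """Build the same bool/must/match body as the curl example."""
    return {"query": {"bool": {"must": [{"match": {"name": name}}]}}}


def search(body: dict, url: str = "http://localhost:9200/items/_search") -> dict:
    """POST a query body to Elasticsearch and decode the JSON reply.

    The URL is an assumption (a local node with an `items` index).
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_name_query("Exalted orb")
# Print the body that would be sent; call search(body) against a live node.
print(json.dumps(body, indent=4))
```

Building the body as a dict makes it easy to add or remove clauses in the `must` list programmatically, which is where the verbose structure starts to pay off.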

So what are the uses?

At 99.co

Okay, a little self-promotion here: I will show the features I wrote on the backend during my stay at 99.co. Unfortunately I'm not talented at building good UI, so that credit goes to my talented frontend engineer teammate. We provide a search for all the new launches in Singapore, that is, new residential buildings that are about to come onto the market or have just launched. Here are some examples from my work.

Aggregations in an area.
Search by radius.
Search by drawing a chicken-scratch shape.

It was an honour to be given the opportunity to write this. I did not show the other filters since they are pretty mundane and most databases can do them. The good thing is that Elasticsearch returns results extremely quickly. From eyeballing it, a search request is usually fulfilled in around 30 milliseconds, which is crazy impressive!
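To give a feel for what the three searches above might look like as queries, here is a rough Python sketch. It is not the actual 99.co code. The field names (`location` as a `geo_point`, `district` as a keyword, `price` as a number) and the coordinates are assumptions for illustration. The drawn-shape search uses the `geo_polygon` query, which newer Elasticsearch versions deprecate in favour of `geo_shape`.

```python
import json


def radius_filter(lat: float, lon: float, distance: str) -> dict:
    """geo_distance filter: documents within `distance` of a point."""
    return {"geo_distance": {"distance": distance, "location": {"lat": lat, "lon": lon}}}


def polygon_filter(points: list) -> dict:
    """geo_polygon filter: documents inside a drawn shape.

    `points` is a list of (lat, lon) pairs tracing the outline.
    """
    return {
        "geo_polygon": {
            "location": {"points": [{"lat": lat, "lon": lon} for lat, lon in points]}
        }
    }


def search_body(geo_filter: dict) -> dict:
    """Wrap a geo filter in a bool query (filters skip relevance scoring)."""
    return {"query": {"bool": {"filter": [geo_filter]}}}


def aggregation_body(geo_filter: dict) -> dict:
    """Summarise an area instead of listing hits (size 0 returns no documents)."""
    body = search_body(geo_filter)
    body["size"] = 0
    body["aggs"] = {
        "launches_by_district": {"terms": {"field": "district"}},
        "average_price": {"avg": {"field": "price"}},
    }
    return body


# Illustrative coordinates roughly in the middle of Singapore.
by_radius = search_body(radius_filter(1.3521, 103.8198, "2km"))
by_drawing = search_body(polygon_filter([(1.30, 103.80), (1.32, 103.85), (1.28, 103.84)]))
area_stats = aggregation_body(radius_filter(1.3521, 103.8198, "2km"))
print(json.dumps(area_stats, indent=4))
```

The same geo filter is reused for both listing and aggregating, so "what is in this area" and "summarise this area" differ only in the `size` and `aggs` keys.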

At my side project

PoeSearch

If you are interested in the query it's here: https://github.com/ashwinath/poe-search-discord/blob/master/src/es/index.ts#L118

There is fuzzy search included in the query. Path of Exile is developed by New Zealanders, so the game tends to use British spellings. Thankfully those words usually differ from the American spelling by only a letter or two, like colour vs color or armour vs armor, so fuzzy search is good enough here. I wanted to experiment with lemmatisation and stemming, but the machine couldn't take it and returned results really slowly. It would have greatly improved the search experience, since I wanted to guess what the user meant from what they typed into the input. Words such as none, 0 and zero all have similar meanings and should be grouped together. Nonetheless, I'm still pretty happy with the outcome.
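To make the "one letter apart" point concrete, here is a small Python sketch (not the code from the repository linked above). The `edit_distance` helper is a plain Levenshtein distance written for illustration. Elasticsearch's own fuzziness additionally counts a swap of two adjacent letters as a single edit by default, which makes no difference for these examples. `fuzzy_name_query` builds a `match` query with `"fuzziness": "AUTO"`, assuming a `name` field.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes or
    substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def fuzzy_name_query(text: str) -> dict:
    """match query that tolerates small typos and spelling variants.

    With "AUTO", the allowed edits scale with term length: none for very
    short terms, one for 3 to 5 characters, two for longer terms.
    """
    return {"query": {"match": {"name": {"query": text, "fuzziness": "AUTO"}}}}


print(edit_distance("colour", "color"))  # 1
print(edit_distance("armour", "armor"))  # 1
print(fuzzy_name_query("armor"))
```

This also shows why fuzzy search cannot group none, 0 and zero: those words are too far apart in edit distance, so they would need something meaning-aware rather than spelling-aware.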

This project was mainly an exploratory attempt with Elasticsearch, and it turned out pretty well. It is really fast at searching for items even though it runs on a pretty weak machine. I only gave ES a 600 MB JVM heap and its performance is pretty acceptable. Oh, this project has a live demo at https://poe.ashwinchat.com

Conclusion

I have probably only scratched the surface of what Elasticsearch is capable of. I'm quite sure there are pretty crazy features that I will learn throughout the course of my software engineering journey.