Building your own Query Language - My First Career Project

Building your own Query Language - My First Career Project

Back in my first job, the requirement was to build a "Domain Specific" Query Language / Pipeline. Basically it should build any type of query on probably any type of database (general purpose database/datastore - ElasticSearch, MongoDB, PostgreSQL).

Domain Specific means having some assumptions and heuristics of the system/business before hand.

The project I build in Innovaccer was very complicated, but let's try to build a very simple generic query language, which should have these basic features -

  • User's will write the query in 1 language, system should translate to each DB specific language.
  • Should support complex queries involving AND, OR and brackets
  • Should support nested queries (bonus points)
  • Should support dynamic functions, eg date.now() functionality. (bonus points)

Please show your support by Liking ?? and Commenting ?? on this article for better reach.

The Building Blocks

As I am targeting not so nerdy software engineers, we will try to keep this as simple as possible with minimum reinvention. Let's see the basic structure -

Query Syntax - Using JSON To Build Query

The query syntax is the probably the most important and most complex part of implementing your own Query Language. Building something such as SQL or other complex specification is out of scope for us right now. Let's reuse the plain simple sweet JSON syntax to build our language. Feeling confused why using JSON?

No alt text provided for this image

Let's see the Pros and Cons..? JSON is very extensible, and can very easily help us in building Nested Structures - nested JSON objects. It also provides arrays and different data types, as well as ways to add as much information as possible to tell system exactly what to do. We will be using JSON just as the "building block" or container to structure our language easily.

Query Parser - Using DFS Tree Traversal

It isn't as complicated as the heading. Chill..!!

So, we need a small simple algorithm, which will take our input Query (in our JSON query language) and change it into a Database's own query language. Simple sweet transformation layer.

No alt text provided for this image

Underlying Database Frameworks

Rather than doing native string transformations like A equals B to the DB language "select .. where ... a=b..." which is very error prone for this blog, we will stick with ORM frameworks such as SQLAlchemy and Elasticsearch DSL in Python. Basically, we can use any framework, which supports building native queries using conjunctions and chaining -

query.where().where().where()

// or 

query.and(
  where(),
  where()
)

// or (as elasticsearch DSL in python)

filter_1 = Q(key=value)
filter_2 = Q(key=value)
final_query = Q("bool", must=[filter_1, filter_2, ...])        

As you see, there are tons of frameworks which can help us build our very simple query language, allowing up to make use of their powerful ORM.

I will be using Python, feel free to use your preferred language.

Our Query Language

So basically, our smallest filter in your where clause could look like -


{
    "operator": "eq",
    "key": "column_1",
    "value": "some_dummy_value"
}        

Simply, converting column_1 = some_value in a small filter in our query language. Similarly, we can implement other types of filters such as in, neq (not equals), contains, gte (greater than equals), lte, etc...


{
    "operator": "in",
    "key": "age",
    "value": [15, 20, 30, ...]
}        

Now, when we have our smallest atomic level of filter ready, let's see how we can add conjunction and disjunction operators. Simply saying - AND and OR operators -


{
    "operator": "AND",
    "values": [
        {
           "operator": "eq",
           "key": "city",
           "value": "Noida",
        },
        {
           "operator": "OR",
           "values": [
               ... nested filters
           ]
        }
    ]
}        

As you see above, we can nest any level of operators as we nested a whole OR filter inside AND with other leaf level filters, eg. A AND (B OR C)

Converting Query to DB Specific Query

Now, as we have our query ready, if you see, we will always have a Tree like structure..!

No alt text provided for this image

Now, we can write a very simple DFS based algorithm, which converts our Tree Query Language to DB specific query language by chaining multiple filters together --


final_query = filter(
    "AND",
    values=[
        filter("OR", values=[
            filter(B),
            filter(C),
            filter("AND", values=[...])
        ]),
        filter("OR", values=[...]),        
    ]
)        

Simply have a recursive function, where on each node, you return the query built by that subTree -- Node's Operator on (child_1, child_2, ...)

Bonus Elements

Now, when we need dynamic values for any of the node or filter in our query language, we can have a new type of node --


{
    "operator": "gte", // greater than equals
    "key": "createdAt"
    "funtion": "calculate_date_now"
}        

As you see, when we traverse and parse each node in the Tree, we can simply know if a Node is a functional node or a static node (simple filter). If it is a functional node as shown in code sample above, we can run some function and calculate the value there and then..! (For eg, datetime.now() above)

We can also have nested queries, if we tweak our dynamic nodes a little more --


{
    "operator": "eq", // greater than equals
    "key": "age"
    "funtion": "calculate_max_value",
    "args": {
        "table": "users",
        "column": "age"
    }
}        

Now, your dynamic nodes have more context supplied by args parameter, which can also be used to run Nested queries, Dynamic input based queries and so much more..! Once you run the function, the output of the function can be applied as the value of the current node. For example, in above code --> age = value(max(age) from users table)

Conclusion

We saw how to build a simple query language with simple building blocks..! Yes, in reality we had numerous other complicated stuff, writing and parsing native queries, multiple ORMs, complex operators, highly dynamic operators, plug and play support to add operators directly at runtime without redeploying codebase, etc.

Please show your support by Liking ?? and Commenting ?? on this article for better reach.

I will cover most aspects and other type of complex stuff in a later article, so to keep updated with my articles, please Subscribe to my newsletter and don't forget to share it with your friends..!!

Tushar Gupta

ASE @Canonical | ex - Samagra | MLH Fellow Fall'22 | Summer fellow'22 @Processing | GitHub Campus Expert

2 年

Great read, would love to read more about such non-conventional project ideas!

Piyush Jha

Software Engineer@Barclays | Java | Spring Boot | Python | SQL

2 年

great

Tarun Jain

Senior Manager (Global) @ AfterShip

2 年

Good read ??

要查看或添加评论,请登录

Shrey Batra的更多文章

社区洞察

其他会员也浏览了