Building your own Query Language - My First Career Project
Shrey Batra
CEO @ Cosmocloud | Ex-LinkedIn | Angel Investor | MongoDB Champion | Book Author | Patent Holder (Distributed Algorithms)
Back in my first job, the requirement was to build a "Domain Specific" Query Language / Pipeline. Basically it should build any type of query on probably any type of database (general purpose database/datastore - ElasticSearch, MongoDB, PostgreSQL).
Domain Specific means having some assumptions and heuristics of the system/business before hand.
The project I build in Innovaccer was very complicated, but let's try to build a very simple generic query language, which should have these basic features -
Please show your support by Liking ?? and Commenting ?? on this article for better reach.
The Building Blocks
As I am targeting not so nerdy software engineers, we will try to keep this as simple as possible with minimum reinvention. Let's see the basic structure -
Query Syntax - Using JSON To Build Query
The query syntax is the probably the most important and most complex part of implementing your own Query Language. Building something such as SQL or other complex specification is out of scope for us right now. Let's reuse the plain simple sweet JSON syntax to build our language. Feeling confused why using JSON?
Let's see the Pros and Cons..? JSON is very extensible, and can very easily help us in building Nested Structures - nested JSON objects. It also provides arrays and different data types, as well as ways to add as much information as possible to tell system exactly what to do. We will be using JSON just as the "building block" or container to structure our language easily.
Query Parser - Using DFS Tree Traversal
It isn't as complicated as the heading. Chill..!!
So, we need a small simple algorithm, which will take our input Query (in our JSON query language) and change it into a Database's own query language. Simple sweet transformation layer.
Underlying Database Frameworks
Rather than doing native string transformations like A equals B to the DB language "select .. where ... a=b..." which is very error prone for this blog, we will stick with ORM frameworks such as SQLAlchemy and Elasticsearch DSL in Python. Basically, we can use any framework, which supports building native queries using conjunctions and chaining -
query.where().where().where()
// or
query.and(
where(),
where()
)
// or (as elasticsearch DSL in python)
filter_1 = Q(key=value)
filter_2 = Q(key=value)
final_query = Q("bool", must=[filter_1, filter_2, ...])
As you see, there are tons of frameworks which can help us build our very simple query language, allowing up to make use of their powerful ORM.
I will be using Python, feel free to use your preferred language.
Our Query Language
So basically, our smallest filter in your where clause could look like -
领英推荐
{
"operator": "eq",
"key": "column_1",
"value": "some_dummy_value"
}
Simply, converting column_1 = some_value in a small filter in our query language. Similarly, we can implement other types of filters such as in, neq (not equals), contains, gte (greater than equals), lte, etc...
{
"operator": "in",
"key": "age",
"value": [15, 20, 30, ...]
}
Now, when we have our smallest atomic level of filter ready, let's see how we can add conjunction and disjunction operators. Simply saying - AND and OR operators -
{
"operator": "AND",
"values": [
{
"operator": "eq",
"key": "city",
"value": "Noida",
},
{
"operator": "OR",
"values": [
... nested filters
]
}
]
}
As you see above, we can nest any level of operators as we nested a whole OR filter inside AND with other leaf level filters, eg. A AND (B OR C)
Converting Query to DB Specific Query
Now, as we have our query ready, if you see, we will always have a Tree like structure..!
Now, we can write a very simple DFS based algorithm, which converts our Tree Query Language to DB specific query language by chaining multiple filters together --
final_query = filter(
"AND",
values=[
filter("OR", values=[
filter(B),
filter(C),
filter("AND", values=[...])
]),
filter("OR", values=[...]),
]
)
Simply have a recursive function, where on each node, you return the query built by that subTree -- Node's Operator on (child_1, child_2, ...)
Bonus Elements
Now, when we need dynamic values for any of the node or filter in our query language, we can have a new type of node --
{
"operator": "gte", // greater than equals
"key": "createdAt"
"funtion": "calculate_date_now"
}
As you see, when we traverse and parse each node in the Tree, we can simply know if a Node is a functional node or a static node (simple filter). If it is a functional node as shown in code sample above, we can run some function and calculate the value there and then..! (For eg, datetime.now() above)
We can also have nested queries, if we tweak our dynamic nodes a little more --
{
"operator": "eq", // greater than equals
"key": "age"
"funtion": "calculate_max_value",
"args": {
"table": "users",
"column": "age"
}
}
Now, your dynamic nodes have more context supplied by args parameter, which can also be used to run Nested queries, Dynamic input based queries and so much more..! Once you run the function, the output of the function can be applied as the value of the current node. For example, in above code --> age = value(max(age) from users table)
Conclusion
We saw how to build a simple query language with simple building blocks..! Yes, in reality we had numerous other complicated stuff, writing and parsing native queries, multiple ORMs, complex operators, highly dynamic operators, plug and play support to add operators directly at runtime without redeploying codebase, etc.
Please show your support by Liking ?? and Commenting ?? on this article for better reach.
I will cover most aspects and other type of complex stuff in a later article, so to keep updated with my articles, please Subscribe to my newsletter and don't forget to share it with your friends..!!
ASE @Canonical | ex - Samagra | MLH Fellow Fall'22 | Summer fellow'22 @Processing | GitHub Campus Expert
2 年Great read, would love to read more about such non-conventional project ideas!
Software Engineer@Barclays | Java | Spring Boot | Python | SQL
2 年great
Senior Manager (Global) @ AfterShip
2 年Good read ??