Why I joined Dremio...
https://techcrunch.com/2022/01/25/dremio-raises-160m-series-e-for-its-data-lake-platform/

Database Management Systems and Distributed Computing were probably the only computer science courses I really enjoyed back in college. Putting those two together, it was inevitable that I would develop a passion for Big Data technologies.

I realized early on in my career that while most companies had an enormous amount of data, very few of us were experienced in identifying and extracting the most valuable bits of information. More importantly, we were using slow technologies that weren't really built for big data analytics - so much so that by the time the information was curated, the time to act had passed. So I tailored my career around learning and advancing Big Data technologies, wearing many related hats over the years: databases, ETL, data science, and fast columnar data warehouses.

I witnessed the exciting rise and fall of Hadoop, followed by the rapid adoption of the cloud and data lakes. Along with the cloud came cloud data warehouses. Snowflake led the charge in separating compute from storage in the cloud so that either could be scaled independently (which was always a big problem in the Hadoop world). Vertica took it to the next level by doing the same anywhere, even on-premises. And then I got a call from a friend and heard about the last remaining piece of the picture - the "data tier" - and how it should really be separated from both compute and storage. That's what got me digging into Dremio.

Data Lakes are meant to be 'infinite' (well, theoretically) but, as a consequence, slow. Hence, there was always a need for data warehouses to be the fast layer in between for any BI workload. But that brought the annoyance of building and maintaining ETL/ELT pipelines to load data into them, which was slow and costly. Before long you end up with multiple copies of the same data, which takes your users further and further away from the truth. Not to mention vendor lock-in.

Having worked for one of the fastest data warehouses out there, query performance and benchmarks were paramount to me. Could Dremio on the data lake be fast enough for BI workloads? And isn't it the same thing as the external tables that most modern data warehouses have…? Well, after a quick benchmarking test, I realized Dremio is like external tables on steroids. Capabilities like Data Reflections, C3 caching, Apache Arrow (its open in-memory format) and Arrow Flight differentiate Dremio from external tables or any other SQL engine out there, and make it extremely performant. All that while keeping the data where it is - in an open file format accessible to any engine, on the data lake, and without any data loading. That effectively separates data from compute…
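For anyone curious what the Arrow Flight piece looks like in practice, here is a minimal sketch using the open-source pyarrow Flight client to run a SQL query against a Flight endpoint such as the one Dremio exposes. The hostname, port, credentials and table name are placeholders I've made up for illustration, not details from any real deployment:

```python
# Minimal sketch: run a SQL query over Arrow Flight and read the results
# back as Arrow data. Hostname, port, credentials and table name are
# placeholders for illustration only.
from pyarrow import flight

client = flight.FlightClient("grpc+tcp://example-dremio-host:32010")

# Basic-auth handshake; returns a bearer-token header for subsequent calls
bearer = client.authenticate_basic_token("my_user", "my_password")
options = flight.FlightCallOptions(headers=[bearer])

# Describe the query, look up where to fetch it from, then stream the results
descriptor = flight.FlightDescriptor.for_command("SELECT * FROM my_table LIMIT 10")
info = client.get_flight_info(descriptor, options)
reader = client.do_get(info.endpoints[0].ticket, options)

table = reader.read_all()  # a pyarrow.Table built from the streamed record batches
print(table.to_pandas().head())
```

Because the results arrive as Arrow record batches rather than rows squeezed through ODBC/JDBC, the client avoids a round of serialization and deserialization, which is a big part of where the speed comes from.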

Things were looking good so far. I was beginning to see the value Dremio could provide to businesses and customers. But I still wasn't fully there yet. Aren't data lakes meant to be immutable? What about things like transactions and ACID compliance...? That's when I stumbled onto the advent of open table formats (like Apache Iceberg) which add things like ACID compliance, schema evolution and time travel to data sitting out in Data Lakes. Now that effectively creates a "Data Lakehouse"…
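To make those capabilities a bit more concrete, here is a small sketch of what they look like on an Iceberg table through Spark SQL. It assumes a Spark 3.3+ session that already has an Iceberg catalog configured (here called "lake"); the table and column names are hypothetical:

```python
# Sketch of Iceberg's lakehouse features via Spark SQL. Assumes a Spark 3.3+
# session with an Iceberg catalog named "lake" already configured; the table
# and column names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-lakehouse-sketch").getOrCreate()

# ACID write: each INSERT is an atomic commit that creates a new table snapshot
spark.sql("INSERT INTO lake.sales.orders VALUES (1, 'widget', 9.99)")

# Schema evolution: add a column as a metadata-only change, no data files rewritten
spark.sql("ALTER TABLE lake.sales.orders ADD COLUMNS (region STRING)")

# Time travel: query the table as it existed at an earlier point in time
spark.sql(
    "SELECT * FROM lake.sales.orders TIMESTAMP AS OF '2022-06-01 00:00:00'"
).show()

# The snapshot history that makes time travel possible is itself queryable
spark.sql("SELECT committed_at, snapshot_id FROM lake.sales.orders.snapshots").show()
```

The key point is that all of this happens on files sitting in the data lake, in an open format that any engine that speaks Iceberg (Dremio, Spark, or anything else) can read and write.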

And the more I read into it, the more it looked like Data Lakehouses would eventually end up disrupting Data Warehouses. And that was huge. Really huge...

I was convinced I had to be a part of this. There was no turning back now...

Well, it's been 6 months since I began my journey with Dremio, and I can easily say it's one of the best career moves I've made so far. It's fun, fast-paced, and the people are amazing. The product is really cool. The culture is top notch, and our management is fully transparent and open (even to tricky equity questions!). Our Engineering and Product teams are fully motivated to build one of the best data products out there. Our field teams are customer focused, and our customers always have a say in building out the product. And finally, we're growing rapidly - there are tons of open roles and loads of interest. Come be a part of the next big thing in the data world…

[PS: Feel free to reach out to me if you’re curious about the benchmark, or for anything else!]

Arvind Kumar

Database Engineer

2y

Everybody from Vertica joining Dremio


Lenoy J.: does this compete with Spark / Databricks?
