Why I joined Dremio...
https://techcrunch.com/2022/01/25/dremio-raises-160m-series-e-for-its-data-lake-platform/

Database Management Systems and Distributed Computing were probably the only computer science courses I really enjoyed back in college. Putting those two together, it was inevitable that I would develop a passion for Big Data technologies.

I realized early on in my career that while most companies had an enormous amount of data, very few of us were experienced in identifying and extracting the most valuable bits of information. More importantly, we were using slow technologies that weren't really built for big data analytics - so much so that by the time the information was curated, the time to act had passed. So I tailored my career around learning and advancing Big Data technologies, wearing many related hats over the years: databases, ETL, data science, and fast columnar data warehouses.

I witnessed the exciting rise and fall of Hadoop, followed by the rapid adoption of the cloud and data lakes. Along with the cloud came cloud data warehouses. Snowflake led the charge in separating compute from storage in the cloud so that either could be scaled independently (which was always a big problem in the Hadoop world). Vertica took it to the next level by doing the same anywhere, even on-premises. And then I got a call from a friend and heard about the last remaining piece of the picture - the "data tier" - and how it should really be separated from both compute and storage. That's what got me digging into Dremio.

Data Lakes are meant to be 'infinite' (well, theoretically) but, as a consequence, slow. Hence, there was always a need for data warehouses to be the fast layer in between for any BI workload. But that brought the annoyance of building and maintaining ETL/ELT pipelines to load data into them, which was slow and costly. Before long you end up with multiple copies of the same data, which takes your users further and further away from the truth. Not to mention vendor lock-in.

Having worked for one of the fastest data warehouses out there, query performance and benchmarks were paramount to me. Could Dremio on the data lake be fast enough for BI workloads? And isn't it the same thing as the external tables that most modern data warehouses have…? Well, after a quick benchmarking test, I realized Dremio is like external tables on steroids. Capabilities like Data Reflections, C3 caching, Apache Arrow (its open in-memory format) and Arrow Flight differentiate Dremio from external tables or any other SQL engine out there, and make it extremely performant. All that while keeping the data where it is - in an open file format accessible to any engine, on the data lake, and without any data loading. That effectively separates data from compute…
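For anyone curious what the Arrow Flight piece looks like in practice, here is a minimal sketch using the open-source pyarrow Flight client to run a SQL query against a Flight endpoint such as the one Dremio exposes. The hostname, port, credentials and table name are placeholders I've made up for illustration, not details from any real deployment:

```python
# Minimal sketch: run a SQL query over Arrow Flight and read the results
# back as Arrow data. Hostname, port, credentials and table name are
# placeholders for illustration only.
from pyarrow import flight

client = flight.FlightClient("grpc+tcp://example-dremio-host:32010")

# Basic-auth handshake; returns a bearer-token header for subsequent calls
bearer = client.authenticate_basic_token("my_user", "my_password")
options = flight.FlightCallOptions(headers=[bearer])

# Describe the query, look up where to fetch it from, then stream the results
descriptor = flight.FlightDescriptor.for_command("SELECT * FROM my_table LIMIT 10")
info = client.get_flight_info(descriptor, options)
reader = client.do_get(info.endpoints[0].ticket, options)

table = reader.read_all()  # a pyarrow.Table built from the streamed record batches
print(table.to_pandas().head())
```

Because the results arrive as Arrow record batches rather than rows squeezed through ODBC/JDBC, the client avoids a round of serialization and deserialization, which is a big part of where the speed comes from.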

Things were looking good so far. I was beginning to see the value Dremio could provide to businesses and customers. But I still wasn't fully there yet. Aren't data lakes meant to be immutable? What about things like transactions and ACID compliance...? That's when I stumbled onto the advent of open table formats (like Apache Iceberg) which add things like ACID compliance, schema evolution and time travel to data sitting out in Data Lakes. Now that effectively creates a "Data Lakehouse"…
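To make those capabilities a bit more concrete, here is a small sketch of what they look like on an Iceberg table through Spark SQL. It assumes a Spark 3.3+ session that already has an Iceberg catalog configured (here called "lake"); the table and column names are hypothetical:

```python
# Sketch of Iceberg's lakehouse features via Spark SQL. Assumes a Spark 3.3+
# session with an Iceberg catalog named "lake" already configured; the table
# and column names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-lakehouse-sketch").getOrCreate()

# ACID write: each INSERT is an atomic commit that creates a new table snapshot
spark.sql("INSERT INTO lake.sales.orders VALUES (1, 'widget', 9.99)")

# Schema evolution: add a column as a metadata-only change, no data files rewritten
spark.sql("ALTER TABLE lake.sales.orders ADD COLUMNS (region STRING)")

# Time travel: query the table as it existed at an earlier point in time
spark.sql(
    "SELECT * FROM lake.sales.orders TIMESTAMP AS OF '2022-06-01 00:00:00'"
).show()

# The snapshot history that makes time travel possible is itself queryable
spark.sql("SELECT committed_at, snapshot_id FROM lake.sales.orders.snapshots").show()
```

The key point is that all of this happens on files sitting in the data lake, in an open format that any engine that speaks Iceberg (Dremio, Spark, or anything else) can read and write.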

And the more I read into it, the more it looked like Data Lakehouses would eventually end up disrupting Data Warehouses. And that was huge. Really huge...

I was convinced I had to be a part of this. There was no turning back now...

Well, it's been 6 months since I began my journey with Dremio, and I can easily say it's one of the best career moves I've made so far. It's fun, fast-paced, and the people are amazing. The product is really cool. The culture is top notch, and our management is fully transparent and open (even to tricky equity questions!). Our Engineering and Product teams are fully motivated to build one of the best data products out there. Our field teams are customer focused, and our customers always have a say in building out the product. And finally, we're growing rapidly - there are tons of open roles and loads of interest. Come be a part of the next big thing in the data world…

[PS: Feel free to reach out to me if you’re curious about the benchmark, or for anything else!]

Arvind Kumar

Database Engineer

2y

Everybody from Vertica joining Dremio


Lenoy J.: does this compete with Spark / Databricks?
