On the Sofa with Upsolver's Founders
Upsolver recently hosted our first Chill Data Summit in New York City to explore how Apache Iceberg and the data lakehouse architecture are improving the way we interact with and manage the data in our data lakes.
Ravit Jain, host of The Ravit Show, joined us for the day and chatted with Upsolver’s CEO, Ori Rafael, and CTO, Yoni Eini, about Apache Iceberg, why now, and how Upsolver is embracing it.
Ravit
Welcome to The Ravit Show! I'm here at the Chill Data Summit, where Upsolver has made some interesting announcements, and I'm pretty sure our audience would love to know about them.
First of all, Ori, what's the vibe at the Chill Data Summit, and how do you feel about being here in New York?
Ori
It's very exciting. This is the first big Upsolver event, and Iceberg is a big change to what Upsolver is doing in these exciting times. Look at the list of speakers: everyone who's doing Iceberg is here!
Ravit
It's an all-star lineup, for sure. For our audience, would you like to introduce yourselves and tell us more about Upsolver and what you do there?
Ori
I'm Ori, the CEO and Co-Founder of Upsolver. I'd say I’m a data guy from a database background, and that's how I met Yoni.
Yoni
I'm Yoni, the CTO and the other Co-Founder of Upsolver. I'm also a data guy from a database background, but I'm more on the technical side of things.
Ravit
That brings me to another important question: the announcements you made today at the Chill Data Summit. I'm pretty sure the audience is keen to hear about them, so I'd love to hear more from you.
Ori
Sure, we had three major announcements. First of all, Upsolver now supports Iceberg. Upsolver is an ingestion platform, so we support Iceberg as a target, and because Iceberg is becoming the foundation of what we do, all data coming into Upsolver is going to be stored in Iceberg tables. You can create extra layers on top of that and then export to additional systems. Upsolver has always been built on a data lake, but now it's built on an Iceberg data lake. That is the first big change, and we have two additional announcements specific to Iceberg, which I’ll let Yoni talk about.
Yoni
So the next thing we're excited about is the decoupling of the data lake management services from the data ingestion services. It used to be that when you ingested data into the data lake with Upsolver, the data was not in a proprietary format; it was open Parquet, but only Upsolver could write to that table. If you added more data to that table some other way, Upsolver wouldn't know about it, and things would start breaking down.
Now, with the introduction of Iceberg, the ingestion layer is completely isolated. I can ingest data into Iceberg, even into a table that Upsolver didn't create, and it just works seamlessly.
On the flip side, Upsolver can maintain a table that we did not create or write the data to. So you can take Upsolver’s data management tools, which have long existed for Upsolver-managed tables, and apply them to any table you want. You can turn them on, and you can turn them off. With this fully decoupled version of Upsolver, you get ad-hoc table optimization, including monitoring and all the other features Upsolver usually provides. That is super exciting to me.
The last thing: when you're ingesting data, it's pretty easy, because you know exactly what you need to do to get data from point A to point B. If you're optimizing a data lake, what's in it for you? Why are you doing it? You might find that queries are slow, or that your costs are getting higher, so there are a lot of reasons to get into this, but the motivation is usually very fuzzy. So the third announcement is that we're releasing an open-source (OSS) Iceberg diagnostics tool that you can run on any Iceberg data lake.
It runs on your computer and, within a few minutes, gives you a report on how optimized, or not, your tables are: how much storage you can save, and how much per-query overhead time you can save, simply because the tables are not set up efficiently. The report compares the optimized numbers you'd get by running Upsolver’s optimization tool with the un-optimized numbers you're getting today. Obviously, the bigger the difference, the more value you're going to get. You can run the tool to understand how far your data lake is from optimal.
Ravit
Why now for Upsolver?
Yoni
Well, Iceberg is a table format, and it's extremely important to have a single, consolidated table format. It's good that the industry is consolidating around it, but it's only one part of the puzzle. You want a data lake that works, and getting to that point requires all of the infrastructure, plus maintenance, so that everything comes together. As data lakes get bigger, more data gets ingested, and data lakes become more central to businesses, the maintenance matters more. Upsolver gives you that very easily, so from our perspective, we're extremely excited about Iceberg. We think the service we provide on top of Iceberg becomes necessary as data lakes scale and become more productionized.
Ori
We should remember the vision, the idea of Iceberg is to make the data lake as easy and performant as a data warehouse. It is still not easy to do that, but Iceberg gives us the standard and I think that Upsolver can give you the implementation to deliver on that value.
Ravit
One more quick question for Yoni: how can you help the community with Iceberg, and what's your vision?
Yoni
So I think Upsolver has, since forever, been very focused on streaming data at large scale; that's what we do best. Iceberg doesn't have any inherent limitations that would prevent it from working with streaming data, but streaming isn't what Iceberg does best, and it's not the reason anyone wrote Iceberg. The people who use Iceberg today tend to be doing batch, so it's not obvious that it would work well for streaming, and as a platform that deals predominantly in streaming data, Upsolver ran into a lot of these limitations while adding support for it.
Also, going forward, we have customers that deal with huge amounts of data, so I feel that the experience of crossing those bridges, feeling them buckle, and reinforcing them as we go is going to be very important for trailblazing into ever faster and larger use cases.
So in that sense, we're very excited to contribute to open source, both to Iceberg and to query engines, to make sure these use cases are much better supported. The use cases themselves aren't pioneering, since a lot of people were already running them, but running them on top of Iceberg may well be.
Ori
If I can add something to that: there was a session today by Upsolver’s Chief Architect, Jason Fine, in which he described the open-source contributions Upsolver has been making lately. He showed a query that used to run for five and a half hours on Presto now running in 39 seconds, and the same on Trino. He also described how he's going to contribute a streaming API to Iceberg, showing an eightfold performance improvement.
As Yoni described in his session today, Iceberg is a chicken-and-egg situation when it comes to streaming. People don't use Iceberg for streaming because it's not built for streaming, so which gets solved first? I hope our contributions will make Iceberg better for streaming, on both the query engine side and the Iceberg side.
Ravit
I think that's a great point. For our audience: we've made sure the Chill Data Summit is also available on demand, so you won't miss out on all the fun we're having here! Obviously, in person is more fun; that's what I always tell people!
One last question for both of you: if people want to reach out to learn more about Upsolver and Iceberg, or anything around data, which is the best platform to reach out to you?
Ori
There is the Upsolver Slack channel, which is more for the community, but beyond that, there are a lot of architects at Upsolver: go talk to them and tell them about your use case. I think that's the best way to understand how we can help you.
Yoni
I'm deeply involved in all sorts of customer use cases, so I'm available to hear about interesting use cases and questions about what will and won't work, and things like that. You can find us on our website, upsolver.com, send us an email, or book a demo.
We're a developer-heavy company so we're really technical and through all these channels you are going to reach a solutions architect. I'm very much involved in the development team, so I'm definitely going to be there!
Ravit
Thanks for sharing this, and thanks for coming on The Ravit Show. It's been such a pleasure to host you both, Ori and Yoni, and I'm looking forward to seeing more developments from Upsolver in 2024. It's already a great start to the year, and I look forward to sharing more details with our community out there.
Try Upsolver for Free
If you’re new to Upsolver, why not start your free 14-day trial, or schedule your no-obligation demo with one of our in-house solutions architects who will be happy to show you around?