What the Heck is PuppyGraph?

Introduction

What the heck is PuppyGraph? That was the first thing I asked myself when I came across it in the summer of 2023. It came to my attention through my involvement with Apache Iceberg: the team wanted to add Iceberg support to PuppyGraph, but just what the heck is it?

This blog is going to be more of a hot take on PuppyGraph to get you thinking about how you might use it in your own projects. I have no affiliation with the company or project other than thinking it was pretty cool. Co-founder Weimo Liu recently (Feb 2024) gave a presentation at the Chill Data Summit that was interesting and, according to my friends who were there, well received.

What is PuppyGraph?

Simply put, PuppyGraph is a cloud-native graph data lakehouse that provides a graph analytics engine for your data. It addresses graph scalability through auto-sharding of data, with compute and storage kept separate, much like the lakehouse design. The result is a graph data warehouse, a data lake, and multiple data models on a single copy of your data. That means you can run some pretty cool graph queries on your data in any of the supported formats.

What can it connect to?

PuppyGraph has rapidly added support for various platforms, catalogs, and connection engines. Currently, we see:

  • Apache Iceberg
  • Apache Hudi
  • Delta Lake
  • MySQL
  • PostgreSQL
  • DuckDB
  • BigQuery
  • Redshift
  • LanceDB (coming soon)
  • JDBC Catalog
  • Data Lake Catalog (Hive Metastore, AWS Glue)

Their SaaS interface also gives you direct access to both a Gremlin and Cypher console to perform graph queries, in addition to a graph notebook, which uses Jupyter.

Using PuppyGraph

A Docker container is provided so you can get started on a local machine. You’ll need a schema in JSON format that describes your data layout to PuppyGraph. Once that schema is ingested and verified, away you go.
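
To make the idea of a schema more concrete, here is a minimal Python sketch of what such a mapping might look like. The field names (`vertices`, `edges`, `label`, `table`, `id_column`, `from`, `to`) and the table names are hypothetical and chosen only for illustration; they are not PuppyGraph's actual schema format, so check the documentation for the real layout. The sketch only shows the general idea: tables are mapped to vertex and edge types, and the mapping is sanity-checked before it is used.

```python
import json

# Hypothetical schema: field names and tables are illustrative assumptions,
# not PuppyGraph's real schema format.
schema = {
    "vertices": [
        {"label": "person", "table": "people", "id_column": "person_id"},
        {"label": "company", "table": "companies", "id_column": "company_id"},
    ],
    "edges": [
        {"label": "works_at", "table": "employment", "from": "person", "to": "company"},
        {"label": "knows", "table": "friendships", "from": "person", "to": "person"},
    ],
}


def validate_schema(schema):
    """Return a list of problems; an empty list means the mapping is consistent."""
    problems = []
    labels = [v["label"] for v in schema.get("vertices", [])]
    if len(labels) != len(set(labels)):
        problems.append("duplicate vertex label")
    known = set(labels)
    # Every edge endpoint must name a vertex label defined above.
    for edge in schema.get("edges", []):
        for end in ("from", "to"):
            if edge[end] not in known:
                problems.append(
                    f"edge '{edge['label']}' references unknown vertex '{edge[end]}'"
                )
    return problems


# Serialize the mapping to the kind of JSON document you would hand over.
schema_json = json.dumps(schema, indent=2)
print(validate_schema(schema))
```

Catching a dangling edge reference locally like this is cheaper than finding out when the schema is rejected at upload time.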

The integrated graph browser is pretty nifty. In addition to running queries, you can easily zoom in and out to see the clustering and attributes.

Zooming in further, we can see more of the details:

Clicking on a node will give us a pop-up of details:

This allows you to explore different vertices and edges easily. These static pictures don’t really convey how fast it is or how much fun it is to bounce around your data. I should have used some genealogical data for fun.

Because PuppyGraph uses the Gremlin and Cypher query languages, any third-party UI tool that speaks those languages should also be compatible. A real advantage here is that PuppyGraph works on the data where it lives and doesn’t make you copy it elsewhere. Without going into the particulars of a specific platform, this gives you a general idea of what features and functions are available.
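
As a rough illustration of treating relational rows as a graph without first copying them into a separate graph store, the Python sketch below walks edges directly over a list of rows. The table contents, column names, and the `out` helper are hypothetical. The Gremlin and Cypher strings show how the same question is typically phrased in those languages, assuming `person` vertices with a `name` property and `knows` edges. This is only a toy model of the idea, not a description of how PuppyGraph works internally.

```python
# Hypothetical rows, standing in for tables that already live in a database or lake.
people = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
    {"id": 4, "name": "dave"},
]
friendships = [  # each row is one directed "knows" edge
    {"src": 1, "dst": 2},
    {"src": 2, "dst": 3},
    {"src": 2, "dst": 4},
]

# The same question in the two query languages (labels and properties are assumed).
GREMLIN = "g.V().has('person', 'name', 'alice').out('knows').out('knows').values('name')"
CYPHER = "MATCH (a:person {name: 'alice'})-[:knows]->()-[:knows]->(c) RETURN c.name"


def out(vertex_ids, edge_rows):
    """Follow outgoing edges from the given vertex ids, reading the rows in place."""
    wanted = set(vertex_ids)
    return [row["dst"] for row in edge_rows if row["src"] in wanted]


def friends_of_friends(name):
    """Two-hop traversal: people known by the people that `name` knows."""
    start = [p["id"] for p in people if p["name"] == name]
    second_hop = out(out(start, friendships), friendships)
    names_by_id = {p["id"]: p["name"] for p in people}
    return sorted(names_by_id[i] for i in second_hop)


print(friends_of_friends("alice"))
```

The point of the sketch is that the "graph" is just a view over rows that stay where they are; only the traversal logic is new.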

Summary

Certainly, graph databases and their representations don’t apply as generally as a structured database does, but we are seeing more and more how these kinds of data representations are being used to model the real world. I saw no indication that this is an open-source project, and I didn’t find it on GitHub. There is no mention of pricing, so I’m not sure where they are going with all of this. The documentation isn’t amazing, but it seems to be enough to get started and try it out. Overall, this is a fun project to play with. I need to percolate on it more to see where I might use it, but I can envision some interesting use cases combining it with other self-contained projects like DuckDB and LanceDB.

Check out my other What the Heck is… articles at the links below:


  • What The Heck is Apache Polaris?
  • What the Heck is GPTScript?
  • Spotlight on Ask On Data
  • What the Heck is Proton?
  • What the Heck is Apache Paimon?
  • What the Heck is SDF?
  • What the Heck is LanceDB?
  • What the Heck is Apache SeaTunnel?
  • Branches & Tags: Comparing Iceberg, Hudi, and Delta Lake Tables
  • What the heck is GlareDB?
