What the Heck is PuppyGraph?

Shawn Gordon

Data geek and developer advocate supreme

Published: February 26, 2024

Introduction

What the heck is PuppyGraph? That was the first thing I asked myself when I came across it in the summer of 2023. My involvement with Apache Iceberg brought it to my attention: the PuppyGraph team wanted to add Iceberg support, but just what the heck is it?

This blog is going to be more of a hot take on PuppyGraph to get you thinking about how you might use it in your own projects. I have no affiliation with the company or project other than thinking it is pretty cool. Co-founder Weimo Liu recently (Feb 2024) gave a presentation at the Chill Data Summit that was interesting and well received, according to my friends who were there.

What is PuppyGraph?

Simply put, PuppyGraph is a cloud-native graph data lakehouse that provides a graph analytics engine for your data. It addresses graph scalability by auto-sharding the data and keeping compute and storage separate, much like the lakehouse design. The result is a graph data warehouse, a data lake, and multiple data models on a single copy of your data. That means you can run some pretty cool graph queries directly against your data in any of the supported formats.

What can it connect to?

PuppyGraph has rapidly added support for various platforms, catalogs, and connection engines. Currently, we see:

Apache Iceberg
Apache Hudi
Delta Lake
MySQL
PostgreSQL
DuckDB
BigQuery
Redshift
LanceDB (coming soon)
JDBC Catalog
Data Lake Catalog (Hive Metastore, AWS Glue)

Their SaaS interface also gives you direct access to both a Gremlin and Cypher console to perform graph queries, in addition to a graph notebook, which uses Jupyter.

Using PuppyGraph

A Docker container is provided so you can get started on a local machine. You’ll need a schema in JSON format that describes your data layout to PuppyGraph. Once the schema has been ingested and verified, away you go.
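To make the idea of a schema more concrete, here is a minimal Python sketch of how a JSON document could map relational tables to vertices and edges. The field names (`vertices`, `edges`, `table`, `id_column`, `from`, `to`) and the `people` and `friendships` tables are hypothetical and chosen only for illustration; PuppyGraph's real schema format is defined in its documentation and will differ.

```python
import json

# Hypothetical schema: NOT PuppyGraph's actual format, just the general shape
# of "which table is a vertex, which table is an edge".
schema = {
    "vertices": [
        {"label": "person", "table": "people", "id_column": "id"},
    ],
    "edges": [
        {
            "label": "knows",
            "table": "friendships",
            "from": {"label": "person", "column": "src_id"},
            "to": {"label": "person", "column": "dst_id"},
        },
    ],
}


def validate_schema(doc):
    """Return a list of problems; an empty list means every edge endpoint is a known vertex label."""
    problems = []
    labels = {v["label"] for v in doc.get("vertices", [])}
    for edge in doc.get("edges", []):
        for end in ("from", "to"):
            if edge[end]["label"] not in labels:
                problems.append(
                    f"edge {edge['label']!r}: unknown {end} label {edge[end]['label']!r}"
                )
    return problems


# Serialize to JSON text (what you would save to a file), then check it.
schema_json = json.dumps(schema, indent=2)
problems = validate_schema(json.loads(schema_json))
print(problems or "schema looks consistent")
```

The point of the sketch is that the schema only describes where the data already lives; nothing is copied into a separate graph store.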

The integrated graph browser is pretty nifty. You can easily zoom in and out to see the clustering and attributes, in addition to running queries.

Zooming in further, we can see more of the details.

Clicking on a node will give us a pop-up of details:

This allows you to explore different vertices and edges easily. These static pictures don’t really convey how fast it is or how much fun it is to bounce around your data. I should have used some genealogical data for fun.

Because PuppyGraph uses the Gremlin and Cypher query languages, third-party UI tools that speak those languages should also be compatible. A real advantage here is that PuppyGraph works on the data where it lives and doesn’t make you copy it elsewhere. Without going into the particulars of a specific platform, this gives you a general idea of what features and functions are available.
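As a rough sketch of what Gremlin compatibility means in practice, the snippet below sends a query through the open-source `gremlinpython` driver (`pip install gremlinpython`). The endpoint `ws://localhost:8182/gremlin` is the conventional Gremlin Server address and is an assumption here; check your own PuppyGraph deployment for the real host, port, and any credentials it requires.

```python
# Assumed endpoint; adjust to your own deployment.
GREMLIN_URL = "ws://localhost:8182/gremlin"

# A plain Gremlin query that counts all vertices; nothing PuppyGraph-specific.
VERTEX_COUNT = "g.V().count()"


def run_gremlin(query, url=GREMLIN_URL):
    """Submit a Gremlin query string and return the results as a list."""
    # Imported lazily so this file loads even without the driver installed.
    from gremlin_python.driver import client

    gremlin_client = client.Client(url, "g")
    try:
        return gremlin_client.submit(query).all().result()
    finally:
        gremlin_client.close()


# Example (needs a running Gremlin endpoint):
#   print(run_gremlin(VERTEX_COUNT))
```

Because the client only knows about the Gremlin protocol, the same few lines should work against any Gremlin-compatible endpoint, which is the compatibility argument made above.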

Summary

Graph databases and their representations certainly don’t apply as generically as structured databases do, but we are seeing more and more of these data representations being used to model the real world. I saw no indication that this is an open-source project, and I couldn’t find it on GitHub. There is no mention of pricing, so I’m not sure where the company is going with all of this. The documentation isn’t amazing, but it seems to be enough to get started and try it out. Overall, this is a fun project to play with. I need to percolate on it more to see where I might use it, but I can envision some interesting use cases combining it with other self-contained projects like DuckDB and LanceDB.

Check out my other What the Heck is… articles at the links below:

More articles by Shawn Gordon

What The Heck is Apache Polaris? (September 12, 2024)
What the Heck is GPTScript? (April 18, 2024)
Spotlight on Ask On Data (April 1, 2024)
What the Heck is Proton? (December 28, 2023)
What the Heck is Apache Paimon? (December 6, 2023)
What the Heck is SDF? (October 25, 2023)
What the Heck is LanceDB? (October 19, 2023)
What the Heck is Apache SeaTunnel? (October 16, 2023)
Branches & Tags: Comparing Iceberg, Hudi, and Delta Lake Tables (October 3, 2023)
What the heck is GlareDB? (September 20, 2023)
