DuckDB: A Serverless Analytics Option

Remesh Govind N M

VP Data Eng. | AWS Certified Architect | Software Delivery | Helping Startups / IT Driven companies with Data Integration, Big data, Mobile applications, iOS , Android, Cloud, Web

Published: May 24, 2023

After exploring options such as Apache Spark and Polars in earlier posts, I turn to DuckDB here.

DuckDB (#duckdb) is a lightweight, open-source analytical SQL database that provides fast query response times. It is designed to work on large data sets held largely in RAM on a single-node machine, which lets it return results quickly without any compromise on accuracy.

DuckDB is built for scenarios where fast performance on small to medium-sized datasets matters. Scaling up is an option, and these days it is far less of a challenge than it used to be. It is useful for applications like machine learning experimentation, web analytics, and data science research. While #pandas has improved with pyarrow as the engine, its memory requirements remain quite demanding even after several improvements.

Use Cases:

1. Data Science Research - DuckDB's agility suits scientific research needs, making it efficient for informal inquiry and quick query responses. This is thanks to the effort of Hannes Mühleisen's team in turning it around so fast. Read his article on ART here (off site).

2. Machine Learning Experimentation - DuckDB offers fast querying for the smaller datasets used for process testing.

3. Web Analytics – Its efficient storage mechanisms have made it a valuable tool in site traffic analytics providing stakeholders real-time metrics.

4. As an alternative to SQLite (#sqllite) – SQLite has several limitations for analytical workloads which can easily be overcome by #duckdb.

Some interesting options:

1. Scalability – It believes in scaling up rather than scaling out. Yes, I'm an #apachespark guy. See above for how the cost of scaling up can be rethought, and see the next paragraph.

2. Less Support – Honestly, I think this is a thing of the past. Yes, I'm debunking that myth here. You can even go serverless: MotherDuck does just that. As they like to say, it's "DUCKING AWESOME". You can see that on their home page. Catchy, huh!

3. dbt (#dbt) support. Sweet! Don't forget dbt uses Jinja templating, which is also used with Django, Flask and many more.

4. #duckdb even seems faster than #polars.

By the way, Polars just released 0.17.15. If you are keen to learn a bit more, click here. To see @ritchievink's post, see here.

On a closing note, one notable feature of DuckDB is its ability to handle complex analytical tasks with ease. The database uses columnar storage and vectorized execution to speed up query processing without sacrificing accuracy or scalability. This makes it well-suited for data engineering, data science tasks like machine learning and statistical analysis, as well as other types of big data analytics. And when it is serverless, why worry about scale? Overall, DuckDB offers a compelling option for those seeking a lightweight analytical database with advanced functionality.
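For readers new to the terms, columnar storage and vectorized execution can be illustrated with a toy sketch in plain Python. This is not how DuckDB is implemented; it only contrasts the two layouts. All names and data below (`rows`, `columns`, `batch_size`) are hypothetical.

```python
from array import array

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"page": "home", "visits": 120, "country": "IN"},
    {"page": "docs", "visits": 60, "country": "US"},
    {"page": "home", "visits": 80, "country": "UK"},
]

# Column-oriented layout: each field is stored as its own contiguous sequence.
columns = {
    "page": ["home", "docs", "home"],
    "visits": array("q", [120, 60, 80]),
    "country": ["IN", "US", "UK"],
}


def total_visits_row_store(records):
    """Visit every record one at a time, even though only one field is needed."""
    total = 0
    for record in records:
        total += record["visits"]
    return total


def total_visits_column_store(cols, batch_size=2):
    """Read only the 'visits' column, one batch (vector) of values at a time."""
    visits = cols["visits"]
    total = 0
    for start in range(0, len(visits), batch_size):
        total += sum(visits[start:start + batch_size])
    return total


row_total = total_visits_row_store(rows)
column_total = total_visits_column_store(columns)
print(row_total, column_total)
```

Both functions return the same total. The point of the second one is that an aggregate over one column never has to look at the other columns, and that work is done on batches of values rather than one record at a time.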

BTW, they are hiring: Email [email protected]
