The law of parallel processing
Fawad A. Qureshi
Global Field CTO @ Snowflake | LinkedIn Learning Instructor | Sustainability, Data Strategy, Business Transformation
Have you ever seen someone write bad code and then try to solve the resulting performance problem by throwing more hardware at it? Higher-level languages like Python and Java make it easier to write code, but they also make it easier to write bad code. They do not force programmers to think about performance the way lower-level languages like Assembly and C do, so programmers who use them can sometimes get lazy and write code that is far less efficient than it could be.
In a recent article, I discussed the following scalability graph:
Let's discuss how you get parallel efficiency gains. Amdahl's Law, formulated by former IBM engineer Gene Amdahl, states that a program's theoretical speedup is limited by the portion of the work that cannot be parallelized: no matter how much hardware you add, the serial fraction sets the ceiling.
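To see why the serial fraction matters so much, here is a minimal sketch in Python (my own illustration, not from the original article; the 95% parallel fraction and the processor counts are made-up values chosen only to show the shape of the curve):

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Theoretical speedup from Amdahl's Law:
    S = 1 / ((1 - p) + p / n), where p is the fraction of the work
    that can run in parallel and n is the number of processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Even with 95% of the work parallelized, the speedup plateaus quickly:
for n in (2, 8, 64, 1024):
    print(f"{n:>4} processors -> {amdahl_speedup(0.95, n):5.2f}x speedup")

# With p = 0.95 the ceiling is 1 / (1 - 0.95) = 20x, no matter how many
# processors (or how much extra hardware) you throw at the problem.
```

Notice how going from 64 to 1,024 processors barely moves the needle: that flat region is exactly the part of the scalability graph where adding hardware stops paying off.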
Why are some technologies more optimized than others? Let’s find out. Parallel processing is the single biggest driver of computing efficiency. Imagine you land at an airport and have to move your luggage to the taxi rank.
Let's examine this analogy in the world of parallel databases:
Here is an example of three different types of databases processing SQL (above picture credit: Daniel Graham). Imagine we have 100 TB of data to analyze.
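As a rough sketch of how a massively parallel (MPP) style system approaches such a scan, here is a simplified Python illustration (my own hypothetical example; the worker count and partition sizes are made up, and real engines obviously do far more than this). Each worker processes only its own partition, and the scans run in parallel; only the small final merge step runs serially.

```python
from concurrent.futures import ProcessPoolExecutor

TOTAL_TB = 100   # hypothetical 100 TB dataset
WORKERS = 10     # hypothetical number of parallel workers
PARTITION_TB = TOTAL_TB / WORKERS

def scan_partition(partition_id: int) -> float:
    """Each worker scans only its own 10 TB slice; the scans are
    independent of one another, so they can all run at the same time."""
    return PARTITION_TB  # stand-in for the partial result of the scan

if __name__ == "__main__":
    # Parallel part: the per-partition scans (the bulk of the work).
    with ProcessPoolExecutor(max_workers=WORKERS) as pool:
        partial_results = list(pool.map(scan_partition, range(WORKERS)))

    # Serial part: a single process merges the partial results.
    total_scanned = sum(partial_results)
    print(f"Scanned {total_scanned:.0f} TB using {WORKERS} workers")
```

The more of the query that can be pushed into the per-partition step, and the smaller the serial merge, the closer the system gets to the ideal speedup from the formula above.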
In short, the more parallel-efficient your system is, the better its throughput will be. A simple understanding of this principle can help you write better code: the maximum parallel efficiency of your code is always determined by the portion of it that cannot be parallelized.
If you liked this article, please subscribe to the FAQ on Data newsletter and/or follow Fawad Qureshi on LinkedIn.
Technical Marketing, Independent Consultant, DBA
One of those pictures looks familiar. Good explanations too... Thanks Fawad.
Data Warehouse Consultant
Great article, thanks
Data Solutions Architect, Singapore PEP
Thanks Fawad A. Qureshi for the article. It must be noted, for readers less familiar with the subject, that some computations are intrinsically serial in nature, or have an intrinsic serial fraction that can't be helped no matter how smart and mature the database optimizer is. I'd also like to point out that sometimes there are approximate algorithms that can be more efficient and yet be good enough for certain use cases. I recall that Vertica, for example, offered both an exact and an approximate "count distinct" with very different performance profiles.