The law of parallel processing
Fawad A. Qureshi
Global Field CTO @ Snowflake | LinkedIn Learning Instructor | Sustainability, Data Strategy, Business Transformation
Have you ever seen someone write bad code and then try to solve the resulting performance problem by throwing more hardware at it? Higher-level languages like Python and Java make it easier to write code, but they also make it easier to write bad code. They do not force programmers to think about performance the way lower-level languages like Assembly and C do, so programmers who use them can sometimes get lazy and write code that is far less efficient than it could be.
In a recent article, I discussed the following scalability graph:
Let's discuss how you get parallel efficiency gains. Amdahl's Law, formulated by former IBM engineer Gene Amdahl, states that a program's theoretical speedup is limited by the portion of the work that cannot be parallelized: no matter how much hardware you add, the serial fraction sets the ceiling.
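To see why the serial fraction matters so much, here is a minimal sketch in Python (my own illustration, not from the original article; the 95% parallel fraction and the processor counts are made-up values chosen only to show the shape of the curve):

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Theoretical speedup from Amdahl's Law:
    S = 1 / ((1 - p) + p / n), where p is the fraction of the work
    that can run in parallel and n is the number of processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Even with 95% of the work parallelized, the speedup plateaus quickly:
for n in (2, 8, 64, 1024):
    print(f"{n:>4} processors -> {amdahl_speedup(0.95, n):5.2f}x speedup")

# With p = 0.95 the ceiling is 1 / (1 - 0.95) = 20x, no matter how many
# processors (or how much extra hardware) you throw at the problem.
```

Notice how going from 64 to 1,024 processors barely moves the needle: that flat region is exactly the part of the scalability graph where adding hardware stops paying off.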
Why are some technologies more optimized than others? Let’s find out. Parallel processing is the single biggest driver of computing efficiency. Imagine you land at an airport and have to move your luggage to the taxi rank.
Let's examine this analogy in the world of parallel databases:
Here is an example of three different types of databases processing SQL (above picture credit: Daniel Graham). Imagine we have 100 TB of data to analyze.
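As a rough sketch of how a massively parallel (MPP) style system approaches such a scan, here is a simplified Python illustration (my own hypothetical example; the worker count and partition sizes are made up, and real engines obviously do far more than this). Each worker processes only its own partition, and the scans run in parallel; only the small final merge step runs serially.

```python
from concurrent.futures import ProcessPoolExecutor

TOTAL_TB = 100   # hypothetical 100 TB dataset
WORKERS = 10     # hypothetical number of parallel workers
PARTITION_TB = TOTAL_TB / WORKERS

def scan_partition(partition_id: int) -> float:
    """Each worker scans only its own 10 TB slice; the scans are
    independent of one another, so they can all run at the same time."""
    return PARTITION_TB  # stand-in for the partial result of the scan

if __name__ == "__main__":
    # Parallel part: the per-partition scans (the bulk of the work).
    with ProcessPoolExecutor(max_workers=WORKERS) as pool:
        partial_results = list(pool.map(scan_partition, range(WORKERS)))

    # Serial part: a single process merges the partial results.
    total_scanned = sum(partial_results)
    print(f"Scanned {total_scanned:.0f} TB using {WORKERS} workers")
```

The more of the query that can be pushed into the per-partition step, and the smaller the serial merge, the closer the system gets to the ideal speedup from the formula above.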
In short, the more parallel-efficient your system is, the better its throughput will be. A simple understanding of this principle can help you write better code: the maximum parallel efficiency of your code is always determined by the portion of it that cannot be parallelized.
If you liked this article, please subscribe to the FAQ on Data newsletter and/or follow Fawad Qureshi on LinkedIn.
Technical Marketing, Independent Consultant, DBA
One of those pictures looks familiar. Good explanations too... Thanks Fawad.
Data Warehouse Consultant
Great article, thanks
Data Solutions Architect, Singapore PEP
Thanks Fawad A. Qureshi for the article. It must be noted, for readers less familiar with the subject, that some computations are intrinsically serial in nature, or have an intrinsic serial fraction that can't be helped no matter how smart and mature the database optimizer is. I'd also like to point out that sometimes there are approximate algorithms that can be more efficient and yet be good enough for certain use cases. I recall that Vertica, for example, offered both an exact and an approximate "count distinct" with very different performance profiles.