True Concurrency: Parallel Picture Processing in C

Some of my peers find me a little bit of a masochist for this, but I love writing concurrent code. It is so satisfying to engineer a program or a system in such a way that adding more worker threads makes it run faster, without sacrificing the correctness of the program. There are still so many problems that could benefit from running on more cores, or from a redesign that makes the algorithm more efficient once parallelism comes into play. And that is not even mentioning all the algorithms that do not have a lock-free version yet.

This year, I was able to test some of my understanding of concurrent programming by working solo on a project called True Concurrency. The task was to write a program in C that would process multiple images in parallel. This program would take a continuous input stream of specific commands (e.g. load images/ducky.jpg ducky; rotate 180 ducky; invert ducky; save ducky output.jpg; exit) and run them. If multiple images are mentioned in the input commands, they should be processed in parallel, each picture's commands running in the order in which they arrived.

On top of that, we had to conduct a study of the fastest parallel algorithm for processing an image. We were asked to come up with 5 or more different algorithms and test them.

To finish both these tasks, we were given 5 days, from Monday to Friday. Once again, I want to mention that we had to write this in C almost from scratch (we were given the picture processing functions). Even now, I feel that it was a huge task to build, and 5 days was just crazy. Writing it became the first and last task I did wherever I went, whether at uni or at home.

Luckily, I finished it. And I did a pretty good job of it (87%, not bad), but I was lucky. Lucky it was the end of term and I had nothing else to do, and lucky I had written only C code for the 10 weeks prior to the project (as bad as it was, it will never beat Pintos).

So, without further ado, let me tell you what I did on the project:

True Concurrency

Part 1: Loopdy loop

And they called this the easy part. Here, we had to make the command line interpreter.

Our program would have to behave like a command on the terminal: once typed, it would continuously (in a loop) ask for commands to run. For this task, we were allowed to process the files sequentially.

I started by creating a way of loading and storing the pictures within the program's memory. For that, I made use of a piece of code I had written on the Monday, right before starting work on True Concurrency: a thread-safe linked list.

This list takes a finer-grained approach to inserting/finding/removing elements: it uses hand-over-hand locking to move from one element to the next. For the sake of keeping this post as short as possible, I will not go deep into the algorithm, but the sketch below shows the core idea. I also found a link that offers some insight into different flavours of thread-safe lists, if anyone is interested.
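Here is a minimal sketch of that hand-over-hand traversal, assuming one mutex per node (the names are illustrative, not lifted from my actual code):

#include <pthread.h>
#include <stddef.h>
#include <string.h>

typedef struct node {
    char *name;             /* picture name, used as the key */
    void *picture;          /* the picture data itself       */
    pthread_mutex_t lock;   /* one mutex per node            */
    struct node *next;
} node_t;

/* Find a node by name with hand-over-hand locking: always take the
 * next node's lock before releasing the current one, so no other
 * thread can unlink a node from under our feet mid-traversal.
 * Returns the node still LOCKED (the caller unlocks), or NULL. */
static node_t *list_find(node_t *head, const char *name) {
    if (head == NULL) return NULL;
    pthread_mutex_lock(&head->lock);
    node_t *curr = head;
    while (curr != NULL) {
        if (strcmp(curr->name, name) == 0)
            return curr;                        /* still locked */
        node_t *next = curr->next;
        if (next != NULL)
            pthread_mutex_lock(&next->lock);    /* grab the next lock...    */
        pthread_mutex_unlock(&curr->lock);      /* ...then drop the current */
        curr = next;
    }
    return NULL;
}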

So, I just pasted the linked list. Finished? Nope.

After testing loading, unloading and printing the names of the stored images, I started working on the execution loop. The program would wait for the user to provide some input. Then, the input would be parsed to check whether it matched any known command (you have got to love strtok_r). As soon as I could, I turned the strings into enums ("invert" -> INVERT) and the rest into a neat little struct I like to use in C, containing an array of arguments and the number of arguments it holds (C makes me miss .length methods).
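Something along these lines (a simplified sketch; the real command set and argument limits differ):

#include <string.h>

#define MAX_ARGS 4

typedef enum {
    CMD_LOAD, CMD_SAVE, CMD_INVERT, CMD_ROTATE, CMD_EXIT, CMD_UNKNOWN
} command_t;

typedef struct {
    command_t cmd;
    char *args[MAX_ARGS];
    int arg_count;          /* the .length C never gave us */
} parsed_line_t;

/* Tokenise one input line in place. strtok_r matters (rather than
 * plain strtok) because several threads may be parsing at once. */
static void parse_line(char *line, parsed_line_t *out) {
    char *save = NULL;
    char *word = strtok_r(line, " \n", &save);
    out->cmd = CMD_UNKNOWN;
    out->arg_count = 0;
    if (word == NULL) return;
    if      (strcmp(word, "load")   == 0) out->cmd = CMD_LOAD;
    else if (strcmp(word, "save")   == 0) out->cmd = CMD_SAVE;
    else if (strcmp(word, "invert") == 0) out->cmd = CMD_INVERT;
    else if (strcmp(word, "rotate") == 0) out->cmd = CMD_ROTATE;
    else if (strcmp(word, "exit")   == 0) out->cmd = CMD_EXIT;
    while (out->arg_count < MAX_ARGS &&
           (word = strtok_r(NULL, " \n", &save)) != NULL)
        out->args[out->arg_count++] = word;
}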

After getting all the good stuff from the parsed line, I would use a neat little trick I've learnt from the base code: using arrays of functions.

Let me explain. Imagine n functions to process a picture: "invert", "grayscale", "rotate" etc. Then, based on a string, you have to choose which function to call. What do you do?

  1. Compare the first word in the string with the known keywords. When they match, you know which function to call. OK, but ugly.
  2. Turn the string into an enum, then use a switch to find the right function based on the enum. Less ugly, but even more inefficient than 1.

Consider that I did not know anything about Finite State Machines at that point. If I had, I would have used one to convert the string into an enum. Then 2 would have been more efficient than 1, but still not perfect.

Well, C offered me a third option:

3. Turn the string into an enum, then get the function from an array of functions.

The enum is stored as an int, so it can be used as an array index. With correct ordering, this is the fastest option (especially when combined with an FSM) and still a very clean one (to be fair, it hinges on the ordering of the functions in the array matching the order of the enum's elements, but it is what it is).
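A sketch of the trick (the function and type names are mine for illustration; picture_t stands in for the provided picture type). As a bonus, C99 designated initialisers soften the ordering problem, since each slot is tied to its enum value explicitly:

typedef struct picture picture_t;   /* opaque, from the base code */

/* One function pointer per command, indexed by the enum value, so
 * dispatch is a single array lookup instead of a comparison chain. */
typedef void (*op_fn)(picture_t *pic, char **args, int arg_count);

static void do_load  (picture_t *pic, char **args, int arg_count);
static void do_save  (picture_t *pic, char **args, int arg_count);
static void do_invert(picture_t *pic, char **args, int arg_count);
static void do_rotate(picture_t *pic, char **args, int arg_count);

static const op_fn operations[] = {
    [CMD_LOAD]   = do_load,     /* designated initialisers keep the */
    [CMD_SAVE]   = do_save,     /* array in sync with the enum even */
    [CMD_INVERT] = do_invert,   /* if the enum is reordered later   */
    [CMD_ROTATE] = do_rotate,
};

/* Dispatch (CMD_EXIT and CMD_UNKNOWN are handled before this):     */
/*     operations[parsed.cmd](pic, parsed.args, parsed.arg_count);  */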

So that was it. After fixing some funny memory leaks in the base code, I had myself a fully functional program. I could process any image I wanted, however I wanted (as long as there was a function for it).

Only issue: sequential processing. Even for multiple pictures.

Part 2: Concurrency

Now came the tricky bit: how to make it concurrent? Retrospectively, I think I could have done it better. I'm still proud of what I did, but there must have been better, or at least more intuitive, ways. Well, here we go.

The algorithm I envisioned worked like this. For each line written, the program would dispatch a worker with it. A worker would be placed on the picture-storing data structure before the next one was dispatched, and would work its way along the linked list until it found the picture it was looking for. It would then process that picture, save the result in the same node of the list, unlock the node to let more workers at it, then kill itself and the thread it occupied (do not try this at your work office).
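In code, a worker might look roughly like this (a sketch building on the earlier snippets; for simplicity it assumes the picture's name is the command's first argument):

#include <pthread.h>
#include <stdlib.h>

/* Everything a worker needs travels in one heap-allocated struct,
 * since pthread_create passes only a single void pointer. */
typedef struct {
    parsed_line_t cmd;      /* the parsed command to execute */
    node_t *list_head;      /* the shared picture list       */
} job_t;

/* Worker body: walk the list (hand-over-hand) to the target picture,
 * process it while holding the node's lock, release, then let the
 * thread die. */
static void *worker(void *arg) {
    job_t *job = arg;
    node_t *n = list_find(job->list_head, job->cmd.args[0]);
    if (n != NULL) {
        operations[job->cmd.cmd](n->picture, job->cmd.args, job->cmd.arg_count);
        pthread_mutex_unlock(&n->lock);   /* list_find returned it locked */
    }
    free(job);
    return NULL;
}

/* Dispatch, detached so nobody has to join the corpse:
 *     pthread_t t;
 *     pthread_create(&t, NULL, worker, job);
 *     pthread_detach(t);                                            */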

There are some pros and some cons for this strategy.

Pros:

  1. Ensured Sequential Consistency (workers that need to work on the same picture have to wait in line)
  2. Pictures can be processed in parallel (multiple workers can work at the same time on different nodes in the linked list)

Cons:

  1. Instantiation of many threads (one thread per worker, one worker per command; even with only one command per image, 100 images means 100 threads, which may lead to heavy resource usage and slow the program down).
  2. Bottleneck creation (a worker processing the first node in the list will block the rest of the pictures from getting processed).

I guess there are both more pros and more cons, but these are the main ones. The first con can be fixed by using a thread/worker pool: the workers are not created and deleted per command, just reused once they finish their jobs. Far less resource-intensive in extreme scenarios; I sketch the idea below.
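I never got the pool built in the end, but a minimal version might look like this (a sketch with a fixed-size queue; a real one would also handle a full queue and a clean shutdown, and run_job is a hypothetical helper with the same body the per-command worker had):

#include <pthread.h>

#define POOL_SIZE 8
#define QUEUE_CAP 64

static void run_job(job_t *job);    /* hypothetical: the old worker body */

/* A fixed set of threads blocks on a condition variable and pulls
 * jobs off a shared ring buffer, instead of one thread per command. */
typedef struct {
    job_t *queue[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
} pool_t;

static void *pool_worker(void *arg) {
    pool_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0)
            pthread_cond_wait(&p->not_empty, &p->lock);
        job_t *job = p->queue[p->head];
        p->head = (p->head + 1) % QUEUE_CAP;
        p->count--;
        pthread_mutex_unlock(&p->lock);
        run_job(job);               /* reuse the thread: no create/destroy */
    }
    return NULL;
}

static void pool_submit(pool_t *p, job_t *job) {
    pthread_mutex_lock(&p->lock);
    p->queue[p->tail] = job;        /* assumes the queue is not full */
    p->tail = (p->tail + 1) % QUEUE_CAP;
    p->count++;
    pthread_cond_signal(&p->not_empty);
    pthread_mutex_unlock(&p->lock);
}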

Unfortunately, I did not get the thread pool done by the end of the 5 days. However, to fix the second issue, I did do something else: a concurrent hash map! (kinda)

It was surprisingly simple to turn the linked list into a hash map. What I did was create an array of concurrent linked lists. To insert an image into the hash map, I would hash the name of the picture and insert it into the linked list at the index given by the hash.
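A sketch of the structure, reusing the list from Part 1 (the hash function here is djb2, one of many reasonable choices for strings):

#define NUM_BUCKETS 16

/* The "hash map" is just an array of the thread-safe lists from
 * Part 1; the bucket index comes from hashing the picture's name. */
typedef struct {
    node_t *buckets[NUM_BUCKETS];
} pic_map_t;

/* djb2 string hash. */
static unsigned long hash_name(const char *name) {
    unsigned long h = 5381;
    for (; *name != '\0'; name++)
        h = h * 33 + (unsigned char)*name;
    return h;
}

/* Workers on different buckets never contend at all; workers on the
 * same bucket fall back to the list's hand-over-hand locking.
 * Like list_find, this returns the node still locked. */
static node_t *map_find(pic_map_t *map, const char *name) {
    return list_find(map->buckets[hash_name(name) % NUM_BUCKETS], name);
}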

Refactoring the rest of the code to accommodate the new data structure was a bit intense, but this part, alongside the first, was done in 2 days. I used the third day to build the hash map, refactor a lot (I'm an advocate of frequent and aggressive refactoring) and write a report on the thread safety of the project. I could share my needlessly large report here, if anyone is interested.

Now, all that remained was pretending I was a researcher for two days.

Part 3: Optimising Blurring

The algorithm we had to experiment on was blurring: a simple convolution that takes the average of a 3×3 square of 9 pixels and rewrites the pixel in the middle with that value.

Simple, but quite resource intensive.
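For reference, the sequential baseline looks something like this (a sketch over a single-channel image; the real code averages each colour channel separately, and get_pixel/set_pixel stand in for the provided picture API):

/* Assumed accessors from the provided picture library. */
int  get_pixel(const picture_t *pic, int x, int y);
void set_pixel(picture_t *pic, int x, int y, int value);

/* Sequential 3x3 box blur. Each output pixel is the average of its
 * 3x3 neighbourhood in the ORIGINAL image, so we read from an
 * untouched copy: blurring in place would feed already-blurred
 * pixels into later averages. Border pixels are skipped here for
 * simplicity. */
static void blur_sequential(picture_t *out, const picture_t *orig,
                            int width, int height) {
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            int sum = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    sum += get_pixel(orig, x + dx, y + dy);
            set_pixel(out, x, y, sum / 9);
        }
    }
}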

For this experiment, I made a program similar to the one from Part 1, keeping only the picture processing algorithm concurrent. The command line for the test would look like this:

./blur_opt_exprmt -p <picture> -i <iterations>

Here, I can provide the program with a picture and a number of iterations. The program will then make 5 copies of that picture and will blur each copy with one of the algorithms. For each blur, it records the time from the start until the end of the blurring method.
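The timing itself is straightforward with a monotonic clock; a sketch:

#include <time.h>

/* Time one blur run in milliseconds. CLOCK_MONOTONIC is immune to
 * wall-clock adjustments, which matters when measuring deltas. */
static double time_blur(void (*blur)(picture_t *), picture_t *pic) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    blur(pic);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) * 1e3
         + (end.tv_nsec - start.tv_nsec) / 1e6;
}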

This process happens for the given number of iterations. After the blurring finishes, the program asserts the correctness of the blurs by comparing them with a sequential blur (which is assumed to be correct). Then, the delta times for each blurring method are averaged, and the program outputs the results like this:

[Image: sample output of the blurring experiment]

I am happy to say that I have received full marks for this part. For more information, check out my paper on the experiment:

[Images: excerpts from my paper on the experiment]

And yes, I did the experiment on a picture of Cthulhu :)

[Image: the Cthulhu test picture]

Final thoughts

After 5 days of grinding on this project, I felt satisfied. It was so much work in so little time, but it was such an amazing learning experience.

I managed to practice so many skills at once: project design & management, clean coding, concurrent programming, debugging (gdb & valgrind) and even algorithm experimentation.

After the first term, having finished both Pintos and True Concurrency, I felt like I could do anything in C. Of course that's not true (yet), but that's the point of experience, I guess.

If you want to check out my code for the project, I have just moved it to a public repo on GitHub (do not worry, I did not finish it in one commit). I intend to make some more changes, adding all the optimisations I came up with while writing this article. Everything will appear in the README file in due time.

In the meantime, try out a project in the lowest-level language you know. It's going to take some time, you can count on that. But you will feel amazing once it's finished. And you'll brag about it wherever you go.

And make your code readable. Your future self will thank you bunches.

Now, time to decipher.

要查看或添加评论,请登录

Tiberiu Andrei Georgescu的更多文章

  • How to Win a Hackathon

    How to Win a Hackathon

    Did you ever participate in a hackathon? If you're an engineer, I cannot recommend the experience enough. It's a fun…

    3 条评论
  • WACC Compiler Project

    WACC Compiler Project

    & Semi-Book Review for "Engineering a Compiler" This year was tough. Next to courses that took a long time to prepare…

    2 条评论
  • First Job: Teaching Game Development

    First Job: Teaching Game Development

    The year before coming to Imperial was one of my more interesting ones. I had my baccalaureate to take, and a lot to…

    1 条评论
  • Chainhack24: My first hackathon, in Blockchain!

    Chainhack24: My first hackathon, in Blockchain!

    I always said that I would talk more about my first hackathon, in London. The event was called Chainhack24, and took…

  • Project: ARM11 & Raspberry Pi Smart Mirror

    Project: ARM11 & Raspberry Pi Smart Mirror

    During my first year of college, I have had multiple occasions to work in a team on a big scale project. One of the…

    1 条评论
  • Clean Code - Review

    Clean Code - Review

    I have recently finished reading this masterpiece of a book. I started reading it while working on a group programming…

  • #100DaysOfCode 22/100: Oh boy, here comes Computer Vision

    #100DaysOfCode 22/100: Oh boy, here comes Computer Vision

    Ok, this is the first time I'm using the article thing on LinkedIn, and it's for a DayOfCode! You may think that it's…

社区洞察

其他会员也浏览了