Concurrent Programming in Python. Main Concepts

Alexander Antonov

Senior Software Engineer | 14+ years in IT | Python | PHP

Published: May 27, 2024

Concurrent programming is a fundamental aspect of software development that allows multiple tasks to be in progress over the same period, though not necessarily executing at the same instant. It is a method of structuring applications as a composition of independent units that can be executed concurrently. This approach can significantly improve the efficiency and performance of applications, especially on multi-processor systems.

The core concepts are pretty similar to those in other languages, but Python has some specific features such as the GIL, AsyncIO and Greenlets, plus its own history of asynchronous programming development. To keep things simple, we will discuss the concepts in relation to Python.

I'll put a summary table here to help organize everything that follows:

Before we dive deep let's go over some definitions, which will help us better understand the essence of the subject matter.

Concurrency is not Parallelism

That is the title of a well-regarded talk by Rob Pike that clarifies the often-confused concepts of concurrency and parallelism in computing and shows how concurrency is implemented in the Go language. The distinction is essential, and we must learn to tell these two principles apart.

Concurrency - managing multiple tasks at the same time. It involves structuring a program to handle multiple tasks that can make progress independently, regardless of whether they are running simultaneously. In other words, concurrency is a way to structure programs to manage multiple tasks efficiently.

Parallelism - executing multiple tasks simultaneously. It specifically refers to leveraging multiple processors or cores to run tasks at the same time. In other words, parallelism is a way to make programs run faster by executing multiple tasks simultaneously.

To better understand the idea, you can watch Rob Pike's presentation. It is available online and is quite interesting.

Preemptive and Cooperative Multitasking

Preemptive and cooperative multitasking are two different approaches to managing how multiple tasks (or threads) share CPU resources in a computing environment.

In Preemptive Multitasking, the operating system (OS) is responsible for managing the execution of tasks. The OS can interrupt and suspend (or "preempt") a running task to give another task a turn to execute, ensuring that all tasks get a fair share of CPU time.

Key features:

OS-Controlled. The OS decides when a task is interrupted and when another task is given CPU time.
Time Slicing. The OS uses time slices to ensure tasks switch frequently enough to give the illusion of simultaneous execution.
Responsiveness. Preemptive multitasking generally provides better system responsiveness, as tasks that are ready to run can be scheduled promptly.
Fairness. Since the OS controls task switching, it can ensure fair distribution of CPU time among tasks, preventing any single task from monopolizing the CPU.

In Cooperative Multitasking, tasks voluntarily yield control of the CPU to allow other tasks to run. A task continues to run until it explicitly gives up control, typically by calling a yield function or reaching a point where it can no longer proceed until some condition is met (e.g., waiting for I/O).

Key features:

Task-Controlled. The running tasks themselves decide when to yield the CPU.
Potential for Unresponsiveness. If a task does not yield control appropriately, it can monopolize the CPU, leading to unresponsive systems.
Less Overhead. Since context switches happen less frequently and at known points, there is generally less overhead compared to preemptive multitasking.
Dependence on Well-Behaved Tasks. The system relies on tasks being well-behaved and regularly yielding control to ensure all tasks get a chance to run.
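
The cooperative model can be sketched in plain Python with generators. This is only a minimal illustration, not how any real scheduler is implemented: the names task and run_round_robin are made up for this example, and each yield marks the point where a task voluntarily hands control back.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: do one unit of work, then give up control."""
    for i in range(steps):
        log.append(f"{name} step {i}")
        yield  # voluntary yield point: the scheduler may now run another task


def run_round_robin(tasks):
    """Resume each generator-based task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)      # run the task until its next yield
        except StopIteration:
            continue           # the task is finished, drop it
        ready.append(current)  # not finished yet, send it to the back


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
# ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```

If a task ran a long loop without ever reaching a yield, no other task would get a turn, which is exactly the unresponsiveness risk listed above.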

What is the Event Loop?

An event loop is a programming construct that waits for and dispatches events or messages in a program. It is a fundamental component of asynchronous programming, enabling non-blocking I/O operations and efficient task management.

Key features:

Task Scheduling. The event loop schedules and manages multiple tasks, such as I/O operations, coroutines, and callbacks. It ensures that these tasks are executed in an orderly manner without blocking the main program flow.
Non-blocking I/O. It allows the program to perform I/O operations (like reading from a file or making a network request) without waiting for each operation to complete and blocking the main thread.
Concurrency. The event loop enables concurrency, meaning multiple tasks can be in progress at the same time, sharing the same thread of execution. This is achieved through cooperative multitasking, where tasks voluntarily yield control back to the event loop, allowing other tasks to run.
Coroutine Management. In many modern programming languages, the event loop manages coroutines, which are functions that can be paused and resumed. Coroutines are used to write asynchronous code in a sequential style, making it easier to read and maintain.
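
To make the idea less abstract, here is a deliberately tiny event loop that only knows how to run callbacks at a given time. ToyEventLoop and its methods are illustrative names chosen for this sketch (they merely echo the vocabulary used by asyncio), and a real loop would wait for I/O readiness instead of sleeping.

```python
import heapq
import itertools
import time


class ToyEventLoop:
    """Runs scheduled callbacks one at a time, in due-time order."""

    def __init__(self):
        self._timers = []              # heap of (when, seq, callback, args)
        self._seq = itertools.count()  # tie-breaker for equal due times

    def call_later(self, delay, callback, *args):
        when = time.monotonic() + delay
        heapq.heappush(self._timers, (when, next(self._seq), callback, args))

    def call_soon(self, callback, *args):
        self.call_later(0, callback, *args)

    def run(self):
        """Dispatch callbacks until nothing is left to do."""
        while self._timers:
            when, _, callback, args = heapq.heappop(self._timers)
            delay = when - time.monotonic()
            if delay > 0:
                time.sleep(delay)  # a real loop would wait on sockets here
            callback(*args)


events = []
loop = ToyEventLoop()
loop.call_later(0.05, events.append, "timer fired")
loop.call_soon(events.append, "first")
loop.call_soon(events.append, "second")
loop.run()
print(events)
```

Everything happens on one thread: the loop picks the next due callback, runs it to completion, and only then looks at the next one.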

What are the Coroutines?

Coroutines are a special type of function in programming that can be paused and resumed, allowing for asynchronous execution of code. They enable concurrent programming by allowing multiple tasks to be interleaved in a single thread without blocking it. This is particularly useful for I/O-bound and high-level structured network code.

So, in other words, regular functions run to completion once called, whereas coroutines can yield control back to the caller, allowing other code to run while they wait for something.
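
To make the pause and resume mechanics visible, the sketch below drives a coroutine by hand instead of handing it to an event loop. The pause helper and the greet coroutine are hypothetical names used only for illustration; in real code an event loop performs the send() calls for you.

```python
import types


@types.coroutine
def pause():
    # A bare yield in a generator-based coroutine suspends the whole chain
    # of awaiting coroutines and hands the value to whoever is driving it.
    yield "paused"


async def greet(steps):
    steps.append("before pause")
    await pause()            # execution stops here until someone resumes it
    steps.append("after pause")
    return "done"


steps = []
coro = greet(steps)          # calling the function runs nothing yet
signal = coro.send(None)     # run until the first suspension point
steps.append("caller runs while the coroutine is paused")

try:
    coro.send(None)          # resume; the coroutine now runs to completion
except StopIteration as stop:
    result = stop.value      # the coroutine's return value

print(signal, steps, result)
```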

What is the GIL?

The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple native threads from executing Python code simultaneously. This means that in a multi-threaded Python program, even if you have multiple CPU cores, only one thread can execute Python code at a time.

The GIL is specific to CPython (the reference implementation of Python) and is uncommon in other programming languages. It makes memory management and garbage collection easier to implement correctly and brings some performance benefits for single-threaded programs. Other implementations, such as Jython and IronPython, do not have a GIL, and under PEP 703 the GIL becomes optional in CPython, starting with an experimental free-threaded build in Python 3.13. So, things could change significantly in the near future.
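
If you want to check what your own interpreter does, the following sketch may help. It assumes CPython: sys._is_gil_enabled() only exists from Python 3.13 onwards, so the gil_enabled helper (a name made up for this example) falls back to True on older versions, where the GIL is always on.

```python
import sys
import sysconfig


def gil_enabled():
    """Return True if the GIL is active in this interpreter (CPython assumed)."""
    check = getattr(sys, "_is_gil_enabled", None)  # added in CPython 3.13
    if check is None:
        return True  # older CPython versions always run with the GIL
    return check()


# Truthy only for builds compiled with free-threading support.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print(f"free-threaded build: {free_threaded_build}")
print(f"GIL enabled right now: {gil_enabled()}")
```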

Multiprocessing vs Multithreading vs Async

Now, let’s move on and tackle the main topic. Python offers several paradigms for handling concurrent execution: multiprocessing, multithreading and async. Each approach has its strengths and use cases, and understanding the differences between them can help you choose the right tool for your specific needs.

1. Multiprocessing

Multiprocessing involves running multiple processes simultaneously, with each process having its own Python interpreter and memory space. This is particularly useful for CPU-bound tasks, where the main bottleneck is the CPU rather than I/O operations.

Key features:

Isolation. Each process runs independently with its own interpreter and memory space, which sidesteps the Global Interpreter Lock (GIL) at the cost of higher memory consumption.
Parallelism. True parallelism is achieved by utilizing multiple CPU cores.
Robustness. Crashes in one process do not affect other processes.

Example using Python multiprocessing module:

from multiprocessing import Process, current_process


def worker(num):
   print(f"Worker: {num}, PID: {current_process().pid}")


if __name__ == "__main__":
   processes = []
   for i in range(5):
       p = Process(target=worker, args=(i,))
       p.start()
       processes.append(p)

   for p in processes:
       p.join()

As a simpler solution, you can use ProcessPoolExecutor from the concurrent.futures module. This module provides a high-level interface for asynchronously executing functions using threads or processes.

Benefits:

Simplified API. Easier to use compared to directly managing threads or processes.
Resource Management. Automatically manages the lifecycle of threads/processes, reducing boilerplate code.
Scalability. Easy to switch between threads and processes by changing the executor.

Example using concurrent.futures and ProcessPoolExecutor:

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import current_process
from threading import get_ident
import time


def worker(num):
   time.sleep(3)
   return f"Worker: {num}, PID: {current_process().pid}, Thread ID: {get_ident()}"


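if __name__ == "__main__":
   # The guard is needed on platforms that spawn child processes
   # (Windows, macOS), where each child re-imports this module.
   with ProcessPoolExecutor(max_workers=5) as executor:
       futures = [executor.submit(worker, i) for i in range(5)]
       results = [future.result() for future in futures]

   print(results)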

2. Multithreading

Multithreading involves running multiple threads within the same process, sharing the same memory space. It is suitable for I/O-bound tasks, where the main bottleneck is waiting for I/O operations to complete.

The main thing to remember for now is that, in CPython, threads cannot execute Python code in parallel because of the GIL. Threads can still be very useful for tasks that spend most of their time waiting rather than modifying the interpreter's state. These are usually I/O-bound operations such as network requests and access to the file system. Such operations can release the GIL while waiting for a result, allowing other threads to start executing.
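
A quick way to see this effect is to time several threads that each block for a while. In the sketch below, time.sleep stands in for a real I/O call (it releases the GIL while waiting), and fake_io and run_in_threads are names invented for the example.

```python
import time
from threading import Thread


def fake_io(delay):
    time.sleep(delay)  # blocking wait; the GIL is released while sleeping


def run_in_threads(count, delay):
    """Start `count` waiting threads and return the total wall-clock time."""
    threads = [Thread(target=fake_io, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads(5, 0.2)
print(f"5 waits of 0.2 s took {elapsed:.2f} s in total")
```

Run one after another, the five waits would need about a second; with threads the total should stay close to a single wait, because the waits overlap.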

Key features:

Shared Memory. Threads share the same memory space, which makes communication between threads faster.
GIL Constraint. Due to Python's Global Interpreter Lock (GIL), only one thread can execute Python code at a time, limiting true parallelism.
Lightweight. Threads are lighter and consume fewer resources compared to processes.
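
Shared memory is convenient, but it also means that threads can step on each other's updates, and the GIL alone does not make compound operations such as value += 1 atomic. A common way to handle this is a lock, as in the following sketch (Counter and hammer are illustrative names):

```python
from threading import Lock, Thread


class Counter:
    """A counter that several threads can safely increment."""

    def __init__(self):
        self.value = 0
        self._lock = Lock()

    def increment(self):
        # The read-modify-write below must not be interleaved with
        # another thread's update, so it runs under the lock.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```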

Example using Python threading module:

from threading import Thread, get_ident


def worker(num):
   print(f"Worker: {num}, Thread ID: {get_ident()}")


threads = []
for i in range(5):
   t = Thread(target=worker, args=(i,))
   t.start()
   threads.append(t)

for t in threads:
   t.join()

And again you can use the concurrent.futures module, but now with ThreadPoolExecutor. Just swap ProcessPoolExecutor with ThreadPoolExecutor in the previous example and you're good to go.

Example using concurrent.futures and ThreadPoolExecutor:

from concurrent.futures import ThreadPoolExecutor
from multiprocessing import current_process
from threading import get_ident
import time


def worker(num):
   time.sleep(3)
   return f"Worker: {num}, PID: {current_process().pid}, Thread ID: {get_ident()}"


with ThreadPoolExecutor(max_workers=5) as executor:
   futures = [executor.submit(worker, i) for i in range(5)]
   results = [future.result() for future in futures]

print(results)

3. Async

Asynchronous programming is a paradigm that allows a program to perform tasks concurrently, without waiting for each task to complete before starting the next one. This is particularly useful for I/O-bound operations, such as network requests, file I/O, or database queries, where waiting for a response can significantly delay the execution of the program.

This method is the most lightweight and can handle thousands of independent tasks (requests) concurrently. Switching between asynchronous tasks is managed by the code itself (usually with the help of an event loop) rather than by the OS. Combined with lower memory consumption, this can be more efficient and controllable than multiprocessing or multithreading for some types of tasks.

Actually here we should talk about two different approaches to achieving async programming, each with its own mechanisms, advantages, and trade-offs. Let's start with the more popular one.

AsyncIO

AsyncIO is a library in Python that provides support for asynchronous programming. It allows you to write code using the async/await syntax, which can perform tasks concurrently without using multiple threads or processes.

AsyncIO uses an event loop to manage tasks asynchronously, allowing for non-blocking operations and efficient I/O handling. Tasks are defined as coroutines that can yield control back to the Event Loop, enabling other tasks to run concurrently.

Key Features:

Event Loop. Central to AsyncIO is the event loop, which manages and schedules asynchronous tasks.
Coroutines. Functions defined with async def and designed to be paused and resumed. They use await to yield control back to the event loop, allowing other tasks to run.
Non-blocking. Uses non-blocking I/O to improve efficiency.
Single-threaded. Runs on a single thread, avoiding issues related to GIL.
Cooperative multitasking. A running task must explicitly yield control.
Performance. Efficient for I/O-bound and high-level structured network code. Avoids the overhead of thread context switching and the Global Interpreter Lock (GIL) contention.
Explicit syntax. Uses async and await keywords, making the asynchronous flow clear in the code.
Integration. You have to use specific libraries that support AsyncIO, such as aiohttp for HTTP, aiomysql for MySQL, etc.
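
Because multitasking here is cooperative, a single blocking call stalls every other task on the loop. The sketch below tries to make that visible by measuring the longest pause seen by a small ticker coroutine. All function names are made up for the example, time.sleep stands in for any blocking library call, and asyncio.to_thread requires Python 3.9 or newer.

```python
import asyncio
import time


async def ticker(ticks):
    """Record a timestamp roughly every 50 ms, if the loop lets us run."""
    for _ in range(4):
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.05)


async def blocking_worker():
    time.sleep(0.3)  # never yields: the whole event loop is stuck here


async def offloaded_worker():
    # Run the blocking call in a worker thread and await its completion,
    # so the event loop stays free for other tasks.
    await asyncio.to_thread(time.sleep, 0.3)


async def longest_gap(worker):
    """Run the ticker next to a worker and return the longest tick gap."""
    ticks = []
    await asyncio.gather(ticker(ticks), worker())
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


blocked_gap = asyncio.run(longest_gap(blocking_worker))
smooth_gap = asyncio.run(longest_gap(offloaded_worker))
print(f"longest gap with a blocking call: {blocked_gap:.2f} s")
print(f"longest gap with to_thread:       {smooth_gap:.2f} s")
```

This is also why the Integration point matters: a synchronous HTTP or database client behaves like blocking_worker unless it is moved off the loop or replaced with an AsyncIO-aware library.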

Example:

import asyncio
import time


async def fetch_data_async():
   await asyncio.sleep(2)
   return "Data fetched"


async def main():
   start = time.time()
   result = await fetch_data_async()
   end = time.time()

   print(result)
   print(f"Time taken: {end - start} seconds")


asyncio.run(main())

In this example, fetch_data_async() is an asynchronous function that simulates a delay using await asyncio.sleep(2). The main function awaits the result. While it waits, the event loop is free to run other tasks, although this small example has only one.

To execute asynchronous functions, you need an Event Loop, which manages the execution of coroutines. The asyncio.run() function is commonly used to run the main coroutine.

One of the primary benefits of asynchronous programming is the ability to run multiple tasks concurrently. The AsyncIO module provides several ways to achieve this, including asyncio.gather() and asyncio.create_task().

Example with asyncio.gather():

import asyncio
import time


async def fetch_data_1():
   await asyncio.sleep(1)
   return "Data 1 fetched"


async def fetch_data_2():
   await asyncio.sleep(2)
   return "Data 2 fetched"


async def main():
   start = time.time()
   results = await asyncio.gather(fetch_data_1(), fetch_data_2())
   end = time.time()

   print(results)
   print(f"Time taken: {end - start} seconds")


asyncio.run(main())

Here, fetch_data_1() and fetch_data_2() run concurrently, reducing the total wait time to 2 seconds instead of 3 seconds.
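
The same call works for far more than two coroutines. As a rough sketch, the following runs a thousand simulated requests at once; handle_request is a hypothetical stand-in for real network code, and the 0.1 second sleep is an arbitrary value.

```python
import asyncio
import time


async def handle_request(request_id):
    await asyncio.sleep(0.1)  # stand-in for a network round trip
    return request_id


async def main(count):
    start = time.perf_counter()
    # gather() returns results in the order the awaitables were passed in
    results = await asyncio.gather(*(handle_request(i) for i in range(count)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main(1000))
print(f"{len(results)} simulated requests finished in {elapsed:.2f} s")
```

All of the waits overlap on a single thread, so the total time should stay far below the 100 seconds that running them one by one would take.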

Example with asyncio.create_task:

import asyncio
import time


async def fetch_data_1():
   await asyncio.sleep(1)
   return "Data 1 fetched"


async def fetch_data_2():
   await asyncio.sleep(2)
   return "Data 2 fetched"


async def main():
   start = time.time()
   task1 = asyncio.create_task(fetch_data_1())
   task2 = asyncio.create_task(fetch_data_2())

   await task1
   await task2
   end = time.time()

   print(task1.result(), task2.result())
   print(f"Time taken: {end - start} seconds")


asyncio.run(main())

Greenlets

Greenlets are lightweight in-process coroutines provided by the greenlet library or through frameworks like gevent. They allow for concurrent programming by enabling cooperative multitasking with explicit yielding.

Key Features:

Micro-Threads. Greenlets are small independent pseudo-threads that run within a single native thread and are created and managed explicitly by the programmer.
Switching. Switches between greenlets are explicit but do not require the async and await syntax.
Concurrency Model. Also uses cooperative multitasking, but often relies on monkey patching standard libraries to make them non-blocking.
Performance. Efficient for I/O-bound tasks and concurrency but requires careful management to avoid blocking the main native thread.
No special syntax. Greenlets switch context explicitly, often by gevent.sleep() or greenlet.switch().
Integration. Requires monkey patching to work with existing synchronous code, which can lead to subtle bugs when monkey-patched libraries don’t work as expected.

Example:

from greenlet import greenlet


def task1():
   print("Task 1 start")
   gr2.switch()
   print("Task 1 end")


def task2():
   print("Task 2 start")
   gr1.switch()
   print("Task 2 end")


gr1 = greenlet(task1)
gr2 = greenlet(task2)

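# Start task1; control bounces task1 -> task2 -> task1.
# When task1 finishes, control returns to the main program,
# so "Task 2 end" is never printed.
gr1.switch()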

Example with gevent, which abstracts greenlets with an event loop:

import gevent
from gevent import monkey; monkey.patch_all()


def task1():
   print("Task 1 start")
   gevent.sleep(1)
   print("Task 1 end")


def task2():
   print("Task 2 start")
   gevent.sleep(1)
   print("Task 2 end")


g1 = gevent.spawn(task1)
g2 = gevent.spawn(task2)

gevent.joinall([g1, g2])

To wrap up the Async section: both AsyncIO and Greenlets provide mechanisms for achieving concurrency in Python, but they are suited for different scenarios and have different trade-offs.

AsyncIO is more explicit and integrated with the Python language, making it suitable for modern async applications.

Greenlets offer a more implicit approach that can be easier to integrate with existing synchronous code but comes with potential complications related to monkey patching and context management.

Conclusion

So, as you can see, there are several ways to achieve concurrency in Python. Choosing between multiprocessing, multithreading, and AsyncIO depends on the nature of your task and your performance requirements. Multiprocessing is ideal for CPU-bound tasks requiring true parallelism, while multithreading is suitable for I/O-bound tasks where threads can efficiently share resources. AsyncIO is also a good fit for I/O-bound tasks, especially when you need to run thousands of concurrent tasks (requests) with minimal memory consumption and switching overhead.

We did not delve deeply into each topic because our goal was to provide a general overview of the main concepts of concurrent programming and try to bring everything together. Each topic is quite complex and extensive, requiring separate thoughtful and thorough study.

And to get a complete picture of the main concepts of concurrent programming in Python, we need to go a level higher and understand the fundamental differences between ASGI and WSGI and how sync/async web servers and frameworks work. I believe we'll cover this in the next article.
