Concurrent Programming in Python. Main Concepts
Concurrent programming is a fundamental aspect of software development that allows multiple tasks to make progress during overlapping periods of time, though not necessarily simultaneously. It is a way to structure an application as a composition of independent units that can be executed concurrently. This approach can significantly improve the efficiency and performance of applications, especially on multi-processor systems.
The core concepts are similar to those in other languages, but Python has some specifics, such as the GIL, AsyncIO, greenlets and its own history of asynchronous programming, so to keep things simple we will discuss the concepts in relation to Python.
I'll put a summary table here to help you keep the information that follows organized:
Before we dive deep let's go over some definitions, which will help us better understand the essence of the subject matter.
Concurrency is not Parallelism
That is the title of a well-regarded talk by Rob Pike that clarifies the often-confused concepts of concurrency and parallelism and shows how concurrency is implemented in the Go language. The distinction is essential, so we need to learn to tell the two principles apart.
Concurrency - managing multiple tasks at the same time. It involves structuring a program to handle multiple tasks that can make progress independently, regardless of whether they are running simultaneously. In other words, concurrency is a way to structure programs to manage multiple tasks efficiently.
Parallelism - executing multiple tasks simultaneously. It specifically refers to leveraging multiple processors or cores to run tasks at the same time. In other words, parallelism is a way to make programs run faster by executing multiple tasks simultaneously.
To better understand the idea, you can watch Rob Pike's presentation. It is available online and is quite interesting.
Preemptive and Cooperative Multitasking
Preemptive and cooperative multitasking are two different approaches to managing how multiple tasks (or threads) share CPU resources in a computing environment.
In Preemptive Multitasking, the operating system (OS) is responsible for managing the execution of tasks. The OS can interrupt and suspend (or "preempt") a running task to give another task a turn to execute, ensuring that all tasks get a fair share of CPU time.
Key features:
In Cooperative Multitasking, tasks voluntarily yield control of the CPU to allow other tasks to run. A task continues to run until it explicitly gives up control, typically by calling a yield function or reaching a point where it can no longer proceed until some condition is met (e.g., waiting for I/O).
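To make the cooperative model more concrete, here is a minimal sketch that uses plain Python generators as tasks. The names task and run_round_robin are hypothetical and exist only for this illustration; a real scheduler would also handle I/O readiness and errors. Each task runs until it reaches yield, and only then does the scheduler get a chance to resume another task.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one step of work, then yields control."""
    for step in range(steps):
        log.append(f"{name} step {step}")
        yield  # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    """Hypothetical toy scheduler: resume each task in turn until all finish."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run the task until its next yield
            queue.append(current)  # not finished yet: back of the line
        except StopIteration:
            pass                   # task finished: drop it


log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
# ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```

Note that a task that never yields would starve all the others, which is the main risk of cooperative multitasking.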
Key features:
What is the Event Loop?
An event loop is a programming construct that waits for and dispatches events or messages in a program. It is a fundamental component of asynchronous programming, enabling non-blocking I/O operations and efficient task management.
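The "wait for events, then dispatch them" idea can be sketched in a few lines. This is a toy model, not how asyncio is implemented: the ToyEventLoop class and its call_later method are hypothetical names, and the clock is simulated so the example does no real waiting.

```python
import heapq
import itertools


class ToyEventLoop:
    """Hypothetical minimal event loop: a queue of callbacks plus a dispatcher."""

    def __init__(self):
        self._scheduled = []               # heap of (due_time, sequence, callback, args)
        self._counter = itertools.count()  # tie-breaker keeps insertion order stable
        self.now = 0.0                     # simulated clock (assumption: no real waiting)

    def call_later(self, delay, callback, *args):
        """Register a callback to be dispatched `delay` simulated seconds from now."""
        due = self.now + delay
        heapq.heappush(self._scheduled, (due, next(self._counter), callback, args))

    def run(self):
        """Take the next due event, dispatch it, and repeat until none are left."""
        while self._scheduled:
            due, _, callback, args = heapq.heappop(self._scheduled)
            self.now = due
            callback(*args)


events = []
loop = ToyEventLoop()
loop.call_later(2, events.append, "timer fired at t=2")
loop.call_later(1, events.append, "timer fired at t=1")
loop.call_later(1, events.append, "second event at t=1")
loop.run()
print(events)
# ['timer fired at t=1', 'second event at t=1', 'timer fired at t=2']
```

A production loop additionally waits on sockets and other I/O sources, but the dispatch cycle has the same shape.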
Key features:
What are Coroutines?
Coroutines are a special type of function that can be paused and resumed, allowing for asynchronous execution of code. They enable concurrent programming by allowing multiple tasks to be interleaved in a single thread without blocking it. This is particularly useful for I/O-bound and high-level structured network code.
In other words, a regular function runs to completion once called, whereas a coroutine can yield control back to its caller while it waits for something, allowing other code to run in the meantime.
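The pause-and-resume behavior can be observed by driving a coroutine by hand, without any event loop. This is a sketch for illustration only: pause is a hypothetical helper built with types.coroutine, and in real code an event loop calls send() for you.

```python
import types


@types.coroutine
def pause():
    """Hypothetical helper: suspend the coroutine and hand a marker to the caller."""
    yield "paused"


async def greet(log):
    log.append("started")
    await pause()            # execution stops here until someone resumes us
    log.append("resumed")
    return "done"


log = []
coro = greet(log)            # calling a coroutine function does not run its body
signal = coro.send(None)     # run until the first suspension point
print(signal, log)           # paused ['started']

try:
    coro.send(None)          # resume exactly where the coroutine stopped
except StopIteration as finished:
    result = finished.value  # the coroutine's return value travels in StopIteration

print(result, log)           # done ['started', 'resumed']
```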
What is the GIL?
The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple native threads from executing Python code simultaneously. This means that in a multi-threaded Python program, even if you have multiple CPU cores, only one thread can execute Python code at a time.
The GIL is specific to CPython (the reference implementation of Python) and is uncommon in other programming languages. It makes memory management and garbage collection easier to implement correctly and brings some performance benefits for single-threaded programs. Other implementations, such as Jython and IronPython, do not have a GIL, and PEP 703 makes the GIL optional in CPython: Python 3.13 ships an experimental free-threaded build. So, things could change significantly in the near future.
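If you want to check which situation your own interpreter is in, the following sketch may help. It relies on sys._is_gil_enabled(), which exists only in Python 3.13 and later, so the code falls back to assuming the GIL is on when the function is missing. The gil_status helper is a hypothetical name for this example.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this interpreter is currently running with the GIL."""
    # Py_GIL_DISABLED is 1 in the build configuration of free-threaded builds;
    # it is 0 or missing (None) otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in Python 3.13.
    # Assumption: on older versions the GIL is always enabled.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


print(gil_status())
```

On a standard CPython build this reports that the GIL is enabled; the result differs only on a free-threaded build.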
Multiprocessing vs Multithreading vs Async
Now, let’s move on and tackle the main topic. Python offers several paradigms for handling concurrent execution: multiprocessing, multithreading and async. Each approach has its strengths and use cases, and understanding the differences between them can help you choose the right tool for your specific needs.
1. Multiprocessing
Multiprocessing involves running multiple processes simultaneously, with each process having its own Python interpreter and memory space. This is particularly useful for CPU-bound tasks, where the main bottleneck is the CPU rather than I/O operations.
Key features:
Example using Python multiprocessing module:
from multiprocessing import Process, current_process

def worker(num):
    print(f"Worker: {num}, PID: {current_process().pid}")

if __name__ == "__main__":
    processes = []
    for i in range(5):
        p = Process(target=worker, args=(i,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
As a simpler solution, you can use ProcessPoolExecutor from the concurrent.futures module. This module provides a high-level interface for asynchronously executing functions using threads or processes.
Benefits:
Example using concurrent.futures and ProcessPoolExecutor:
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import current_process
from threading import get_ident
import time

def worker(num):
    time.sleep(3)
    return f"Worker: {num}, PID: {current_process().pid}, Thread ID: {get_ident()}"

# The guard is needed where worker processes are started with "spawn" (e.g. Windows, macOS)
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(worker, i) for i in range(5)]
        results = [future.result() for future in futures]
    print(results)
2. Multithreading
Multithreading involves running multiple threads within the same process, sharing the same memory space. It is suitable for I/O-bound tasks, where the main bottleneck is waiting for I/O operations to complete.
The main thing to remember for now is that threads cannot execute Python code in parallel because of the GIL (we are talking about CPython, of course). Threads can still be very useful for tasks that do not need to modify the interpreter's state. These are usually I/O-bound operations such as network requests and file system access. Such operations can release the GIL while waiting for a result, allowing other threads to run.
Key features:
Example using Python threading module:
from threading import Thread, get_ident

def worker(num):
    print(f"Worker: {num}, Thread ID: {get_ident()}")

threads = []
for i in range(5):
    t = Thread(target=worker, args=(i,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()
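The point about I/O-bound work releasing the GIL can be checked with a rough timing sketch. Here time.sleep() stands in for a network or disk wait (an assumption made for the sake of a self-contained example), and fake_io and timed_run are hypothetical helper names. If the waits overlap, five threads that each wait 0.2 seconds should finish in roughly 0.2 seconds rather than 1 second; exact timings will vary by machine.

```python
import time
from threading import Thread


def fake_io(delay):
    """Stand-in for a network or disk wait; time.sleep() releases the GIL."""
    time.sleep(delay)


def timed_run(n_threads, delay):
    """Start n_threads waiting threads and return the total wall-clock time."""
    start = time.perf_counter()
    threads = [Thread(target=fake_io, args=(delay,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = timed_run(n_threads=5, delay=0.2)
print(f"5 threads x 0.2 s of waiting took {elapsed:.2f} s")
```

Replacing the sleep with a pure-Python CPU-heavy loop would typically remove the benefit on a GIL-enabled build, because only one thread can execute Python bytecode at a time.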
And again you can use the concurrent.futures module, but now with ThreadPoolExecutor. Just swap ProcessPoolExecutor with ThreadPoolExecutor in the previous example and you're good to go.
Example using concurrent.futures and ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import current_process
from threading import get_ident
import time

def worker(num):
    time.sleep(3)
    return f"Worker: {num}, PID: {current_process().pid}, Thread ID: {get_ident()}"

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(worker, i) for i in range(5)]
    results = [future.result() for future in futures]
print(results)
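When every call uses the same function, Executor.map() can be a shorter alternative to collecting futures from submit() by hand. The sketch below uses a trivial, hypothetical square function only to show the call shape.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields results in the order of the inputs, not in completion order.
    squares = list(executor.map(square, range(5)))

print(squares)  # [0, 1, 4, 9, 16]
```

submit() remains the better fit when you need the individual Future objects, for example to handle each task's exception separately.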
3. Async
Asynchronous programming is a paradigm that allows a program to perform tasks concurrently, without waiting for each task to complete before starting the next one. This is particularly useful for I/O-bound operations, such as network requests, file I/O, or database queries, where waiting for a response can significantly delay the execution of the program.
This method is the most lightweight and can be used to handle thousands of independent tasks (requests) concurrently. Switching between asynchronous tasks is managed by the code itself (usually with the help of an event loop) rather than by the OS. Combined with lower memory consumption, this can make async a more efficient and controllable solution than multiprocessing or multithreading for some types of tasks.
There are two different approaches to async programming in Python, each with its own mechanisms, advantages, and trade-offs. Let's start with the more popular one.
AsyncIO
AsyncIO is a library in Python that provides support for asynchronous programming. It allows you to write code using the async/await syntax, which can perform tasks concurrently without using multiple threads or processes.
AsyncIO uses an event loop to manage tasks asynchronously, allowing for non-blocking operations and efficient I/O handling. Tasks are defined as coroutines that can yield control back to the Event Loop, enabling other tasks to run concurrently.
Key Features:
Example:
import asyncio
import time

async def fetch_data_async():
    await asyncio.sleep(2)
    return "Data fetched"

async def main():
    start = time.time()
    result = await fetch_data_async()
    end = time.time()
    print(result)
    print(f"Time taken: {end - start} seconds")

asyncio.run(main())
In this example, fetch_data_async() is a coroutine that simulates a delay with await asyncio.sleep(2). The main() coroutine awaits its result; while main() is suspended, the event loop is free to run other tasks, so the wait does not block the whole program.
To execute coroutines, you need an event loop, which manages their execution. The asyncio.run() function is commonly used to start the loop and run the main coroutine.
One of the primary benefits of asynchronous programming is the ability to run multiple tasks concurrently. The AsyncIO module provides several ways to achieve this, including asyncio.gather() and asyncio.create_task().
Example with asyncio.gather():
import asyncio
import time

async def fetch_data_1():
    await asyncio.sleep(1)
    return "Data 1 fetched"

async def fetch_data_2():
    await asyncio.sleep(2)
    return "Data 2 fetched"

async def main():
    start = time.time()
    results = await asyncio.gather(fetch_data_1(), fetch_data_2())
    end = time.time()
    print(results)
    print(f"Time taken: {end - start} seconds")

asyncio.run(main())
Here, fetch_data_1() and fetch_data_2() run concurrently, reducing the total wait time to 2 seconds instead of 3 seconds.
Example with asyncio.create_task:
import asyncio
import time

async def fetch_data_1():
    await asyncio.sleep(1)
    return "Data 1 fetched"

async def fetch_data_2():
    await asyncio.sleep(2)
    return "Data 2 fetched"

async def main():
    start = time.time()
    task1 = asyncio.create_task(fetch_data_1())
    task2 = asyncio.create_task(fetch_data_2())
    await task1
    await task2
    end = time.time()
    print(task1.result(), task2.result())
    print(f"Time taken: {end - start} seconds")

asyncio.run(main())
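The earlier claim that async can handle thousands of independent tasks is easy to try out. In the sketch below, handle_request and serve are hypothetical names, and asyncio.sleep(0.1) stands in for a real network call. All 1000 coroutines wait at the same time in a single thread, so the total time should stay close to a single wait rather than 100 seconds; exact timings depend on the machine.

```python
import asyncio
import time


async def handle_request(request_id):
    """Simulated request: the sleep stands in for waiting on the network."""
    await asyncio.sleep(0.1)
    return request_id


async def serve(n_requests):
    """Run n_requests simulated requests concurrently on one event loop."""
    start = time.perf_counter()
    # gather() returns results in the order the awaitables were passed in.
    results = await asyncio.gather(*(handle_request(i) for i in range(n_requests)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(serve(1000))
print(f"{len(results)} tasks finished in {elapsed:.2f} s")
```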
Greenlets
Greenlets are lightweight in-process coroutines provided by the greenlet library or through frameworks like gevent. They allow for concurrent programming by enabling cooperative multitasking with explicit yielding.
Key Features:
Example:
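from greenlet import greenlet

def task1():
    print("Task 1 start")
    gr2.switch()
    print("Task 1 end")

def task2():
    print("Task 2 start")
    gr1.switch()
    print("Task 2 end")

gr1 = greenlet(task1)
gr2 = greenlet(task2)
gr1.switch()
# "Task 2 end" is never printed: when task1 returns, control goes back to the
# parent (main) greenlet and gr2 is never resumed.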
Example with gevent, which abstracts greenlets with an event loop:
import gevent
from gevent import monkey; monkey.patch_all()

def task1():
    print("Task 1 start")
    gevent.sleep(1)
    print("Task 1 end")

def task2():
    print("Task 2 start")
    gevent.sleep(1)
    print("Task 2 end")

g1 = gevent.spawn(task1)
g2 = gevent.spawn(task2)
gevent.joinall([g1, g2])
To sum up the Async section: both AsyncIO and greenlets provide mechanisms for achieving concurrency in Python, but they suit different scenarios and have different trade-offs.
AsyncIO is more explicit and integrated with the Python language, making it suitable for modern async applications.
Greenlets offer a more implicit approach that can be easier to integrate with existing synchronous code but comes with potential complications related to monkey patching and context management.
Conclusion
So, as you can see, there are several ways to achieve concurrency in Python. Choosing between multiprocessing, multithreading, and AsyncIO depends on the nature of your task and your performance requirements. Multiprocessing is ideal for CPU-bound tasks requiring true parallelism, while multithreading is suitable for I/O-bound tasks where threads can efficiently share resources. AsyncIO is also well suited to I/O-bound tasks, especially when you need to run thousands of concurrent tasks (requests) with minimal memory consumption and switching overhead.
We did not delve deeply into each topic because our goal was to provide a general overview of the main concepts of concurrent programming and try to bring everything together. Each topic is quite complex and extensive, requiring separate thoughtful and thorough study.
And to get a complete picture of the main concepts of concurrent programming in Python, we need to go a level higher and understand the fundamental differences between ASGI and WSGI and how sync/async web servers and frameworks work. I believe we'll cover this in the next article.