Concurrent Programming in Python. Main Concepts

Concurrent programming is a fundamental aspect of software development that allows multiple tasks to make progress during overlapping periods of time, though not necessarily simultaneously. It is a way to structure an application as a composition of independent units that can be executed concurrently. This approach can significantly improve the efficiency and performance of applications, especially on multi-processor systems.

The core concepts are similar across programming languages, but Python has some specific features, such as the GIL, AsyncIO, and greenlets, as well as its own history of asynchronous programming. To keep things simple, we will discuss these concepts in the context of Python.

Here is a summary table to help you organize all the information that follows:

Before we dive in, let's go over some definitions that will help us better understand the essence of the subject.

Concurrency is not Parallelism

That is the title of a well-regarded talk by Rob Pike that clarifies the often-confused concepts of concurrency and parallelism and explains how concurrency is implemented in the Go language. The distinction is essential, and we must learn to tell these two principles apart.

Concurrency - managing multiple tasks at the same time. It involves structuring a program so that multiple tasks can make progress independently, regardless of whether they actually run simultaneously. In other words, concurrency is a way to structure programs to manage multiple tasks efficiently.

Parallelism - executing multiple tasks simultaneously. It specifically refers to leveraging multiple processors or cores to run tasks at the same time. In other words, parallelism is a way to make programs run faster by executing multiple tasks simultaneously.
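
To make the distinction concrete, here is a minimal sketch of concurrency without parallelism. It uses asyncio, which is covered later. Two tasks make progress in turns, yet everything runs on a single OS thread. The worker() and main() names are illustrative. The asyncio.sleep(0) call is used only to hand control back to the scheduler.

```python
import asyncio
import threading


async def worker(name: str, steps: int, log: list) -> None:
    for step in range(steps):
        # Record which task ran, which step it was on, and which OS thread ran it
        log.append((name, step, threading.get_ident()))
        await asyncio.sleep(0)  # yield control so the other task can make progress


async def main() -> list:
    log: list = []
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


log = asyncio.run(main())
print([(name, step) for name, step, _ in log])
print("threads used:", len({thread_id for _, _, thread_id in log}))
```

The steps come out interleaved as A0, B0, A1, B1, A2, B2, and only one thread is used. Both tasks are in progress at the same time, but they never run simultaneously.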

To better understand the idea, you can watch Rob Pike's presentation. It is available online and is quite interesting.

Preemptive and Cooperative Multitasking

Preemptive and cooperative multitasking are two different approaches to managing how multiple tasks (or threads) share CPU resources in a computing environment.

In Preemptive Multitasking, the operating system (OS) is responsible for managing the execution of tasks. The OS can interrupt and suspend (or "preempt") a running task to give another task a turn to execute, ensuring that all tasks get a fair share of CPU time.

Key features:

  • OS-Controlled. The OS decides when a task is interrupted and when another task is given CPU time.
  • Time Slicing. The OS uses time slices to ensure tasks switch frequently enough to give the illusion of simultaneous execution.
  • Responsiveness. Preemptive multitasking generally provides better system responsiveness, as tasks that are ready to run can be scheduled promptly.
  • Fairness. Since the OS controls task switching, it can ensure fair distribution of CPU time among tasks, preventing any single task from monopolizing the CPU.
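
Here is a rough way to observe preemption from Python. Two threads run tight loops that never yield explicitly, yet both make progress because execution is switched between them from the outside (by the OS scheduler, together with CPython's periodic GIL hand-off). This is only a sketch: the busy_loop() helper and the 0.2-second duration are arbitrary choices.

```python
import threading
import time


def busy_loop(counters: list, index: int, stop: threading.Event) -> None:
    # No sleep, no yield: this loop never gives up the CPU voluntarily
    while not stop.is_set():
        counters[index] += 1


counters = [0, 0]
stop = threading.Event()
threads = [threading.Thread(target=busy_loop, args=(counters, i, stop)) for i in range(2)]

for t in threads:
    t.start()
time.sleep(0.2)  # let both threads compete for the CPU for a moment
stop.set()
for t in threads:
    t.join()

# Both counters advanced even though neither function ever yields control itself
print(counters)
```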

In Cooperative Multitasking, tasks voluntarily yield control of the CPU to allow other tasks to run. A task continues to run until it explicitly gives up control, typically by calling a yield function or reaching a point where it can no longer proceed until some condition is met (e.g., waiting for I/O).

Key features:

  • Task-Controlled. The running tasks themselves decide when to yield the CPU.
  • Potential for Unresponsiveness. If a task does not yield control appropriately, it can monopolize the CPU, leading to unresponsive systems.
  • Less Overhead. Since context switches happen less frequently and at known points, there is generally less overhead compared to preemptive multitasking.
  • Dependence on Well-Behaved Tasks. The system relies on tasks being well-behaved and regularly yielding control to ensure all tasks get a chance to run.
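
Cooperative multitasking can be sketched with plain generators. Each task runs until it reaches a yield, where it voluntarily hands control back to a simple round-robin scheduler. The task() and run() names are hypothetical. Real schedulers, such as the asyncio event loop, are far more elaborate.

```python
from collections import deque


def task(name: str, steps: int, log: list):
    for step in range(steps):
        log.append(f"{name}{step}")
        yield  # voluntarily give control back to the scheduler


def run(tasks) -> None:
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)       # run the task until its next yield
        except StopIteration:
            continue            # the task finished; drop it
        ready.append(current)   # the task yielded; put it at the back of the queue


log: list = []
run([task("A", 2, log), task("B", 3, log)])
print(log)  # ['A0', 'B0', 'A1', 'B1', 'B2']
```

If one task ran a long loop without yield, run() would never get control back. That is exactly the potential for unresponsiveness described above.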

What is the Event Loop?

An event loop is a programming construct that waits for and dispatches events or messages in a program. It is a fundamental component of asynchronous programming, enabling non-blocking I/O operations and efficient task management.

Key features:

  • Task Scheduling. The event loop schedules and manages multiple tasks, such as I/O operations, coroutines, and callbacks. It ensures that these tasks are executed in an orderly manner without blocking the main program flow.
  • Non-blocking I/O. It allows the program to perform I/O operations (like reading from a file or making a network request) without waiting for each operation to complete and blocking the main thread.
  • Concurrency. The event loop enables concurrency, meaning multiple tasks can be in progress at the same time, sharing the same thread of execution. This is achieved through cooperative multitasking, where tasks voluntarily yield control back to the event loop, allowing other tasks to run.
  • Coroutine Management. In many modern programming languages, the event loop manages coroutines, which are functions that can be paused and resumed. Coroutines are used to write asynchronous code in a sequential style, making it easier to read and maintain.
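
Python's asyncio exposes its event loop directly, so the dispatching described above can be observed with plain callbacks. This minimal sketch uses the loop.call_soon(), loop.call_later(), loop.run_forever() and loop.stop() methods. The delays are arbitrary.

```python
import asyncio


def run_loop() -> list:
    loop = asyncio.new_event_loop()
    events: list = []
    try:
        # Timers are dispatched only once their delay has elapsed
        loop.call_later(0.05, events.append, "timer after 50 ms")
        # call_soon() callbacks run on the next loop iteration, in FIFO order
        loop.call_soon(events.append, "callback 1")
        loop.call_soon(events.append, "callback 2")
        # Stop the loop after everything above has had a chance to run
        loop.call_later(0.1, loop.stop)
        loop.run_forever()
    finally:
        loop.close()
    return events


print(run_loop())  # ['callback 1', 'callback 2', 'timer after 50 ms']
```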

What are Coroutines?

Coroutines are a special type of function in programming that can be paused and resumed, allowing for asynchronous execution of code. They enable concurrent programming by allowing multiple tasks to be interleaved in a single thread without blocking it. This is particularly useful for I/O-bound and high-level structured network code.

In other words, a regular function runs to completion once called, whereas a coroutine can yield control back to its caller, allowing other code to run while the coroutine waits for something (for example, a network response).
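
To see the pause/resume mechanics without an event loop, a coroutine can be driven by hand with its send() method. The Suspend awaitable below is a hypothetical helper written for this sketch. It suspends the coroutine exactly once, which is roughly what happens when a coroutine waits for I/O.

```python
class Suspend:
    """Awaitable that pauses the awaiting coroutine exactly once (illustrative helper)."""

    def __await__(self):
        # Yielding from __await__ suspends the whole coroutine;
        # the yielded value is handed to whoever called send()
        yield "paused"


async def fetch_greeting(log: list) -> str:
    log.append("started")
    await Suspend()          # control leaves the coroutine here...
    log.append("resumed")    # ...and comes back here on the next send()
    return "hello"


log: list = []
coro = fetch_greeting(log)   # calling a coroutine function runs none of its body yet

signal = coro.send(None)     # run until the first suspension point
print(signal, log)           # paused ['started']

try:
    coro.send(None)          # resume; this time the coroutine runs to completion
except StopIteration as finished:
    result = finished.value  # the return value travels inside StopIteration

print(result, log)           # hello ['started', 'resumed']
```

An event loop does essentially this on your behalf for many coroutines at once. It resumes each one when whatever it was waiting for is ready.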

What is the GIL?

The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects by preventing multiple native threads from executing Python bytecode simultaneously. This means that in a multi-threaded Python program, even if you have multiple CPU cores, only one thread can execute Python code at a time.

The GIL is specific to CPython (the reference implementation of Python) and is not common in other programming languages. It makes memory management and garbage collection easier to implement correctly and brings some performance benefits for single-threaded programs. Other Python implementations, such as Jython and IronPython, do not have a GIL. PEP 703 makes the GIL optional in CPython, and Python 3.13 ships an experimental free-threaded build that can run without it. So things could change significantly in the near future.
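
You can check which situation you are in at runtime. CPython 3.13 added sys._is_gil_enabled(); the leading underscore marks it as an implementation detail. sys.getswitchinterval() reports how often the interpreter asks the running thread to release the GIL. The sketch below assumes CPython. Its fallback to True on older versions rests on the assumption that those builds always have a GIL.

```python
import sys
import sysconfig


def gil_status() -> dict:
    # sys._is_gil_enabled() exists only on CPython 3.13+; older CPython always has a GIL
    is_enabled = getattr(sys, "_is_gil_enabled", None)
    return {
        "version": sys.version.split()[0],
        # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise
        "free_threaded_build": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        "gil_enabled": True if is_enabled is None else is_enabled(),
        # Seconds between forced GIL hand-offs (0.005 by default)
        "switch_interval": sys.getswitchinterval(),
    }


print(gil_status())
```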

Multiprocessing vs Multithreading vs Async

Now, let’s move on and tackle the main topic. Python offers several paradigms for handling concurrent execution: multiprocessing, multithreading and async. Each approach has its strengths and use cases, and understanding the differences between them can help you choose the right tool for your specific needs.

1. Multiprocessing

Multiprocessing involves running multiple processes simultaneously, with each process having its own Python interpreter and memory space. This is particularly useful for CPU-bound tasks, where the main bottleneck is the CPU rather than I/O operations.

Key features:

  • Isolation. Each process runs independently with its own memory space and its own interpreter (and therefore its own GIL). This avoids GIL contention at the cost of higher memory consumption.
  • Parallelism. True parallelism is achieved by utilizing multiple CPU cores.
  • Robustness. Crashes in one process do not affect other processes.
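
The isolation and robustness points can be checked directly. A child process that modifies a module-level list changes only its own copy, so the parent's list stays empty. A child that crashes only sets a non-zero exit code, and the parent keeps running. This is a minimal sketch, and the function names are illustrative.

```python
from multiprocessing import Process

shared = []  # every process gets its own copy of this module-level list


def append_in_child() -> None:
    shared.append("added by child")
    print(f"child sees: {shared}")


def crash_in_child() -> None:
    raise RuntimeError("simulated crash")  # only this child process dies


def run_demo():
    workers = [Process(target=append_in_child), Process(target=crash_in_child)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    # The parent's list is untouched, and the parent survives the crash
    return list(shared), [p.exitcode for p in workers]


if __name__ == "__main__":
    parent_view, exit_codes = run_demo()
    print(f"parent sees: {parent_view}, exit codes: {exit_codes}")
```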

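Example using Python's multiprocessing module:

from multiprocessing import Process, current_process


def worker(num):
    # Each worker runs in its own process, so the PID differs between workers
    print(f"Worker: {num}, PID: {current_process().pid}")


if __name__ == "__main__":  # required when child processes are started with "spawn"
    processes = []
    for i in range(5):
        p = Process(target=worker, args=(i,))
        p.start()
        processes.append(p)

    # Wait for every worker process to finish
    for p in processes:
        p.join()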

As a simpler alternative, you can use ProcessPoolExecutor from the concurrent.futures module, which provides a high-level interface for asynchronously executing functions using threads or processes.

Benefits:

  • Simplified API. Easier to use compared to directly managing threads or processes.
  • Resource Management. Automatically manages the lifecycle of threads/processes, reducing boilerplate code.
  • Scalability. Easy to switch between threads and processes by changing the executor.
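
Here is the scalability point in practice. A helper that receives the executor class can switch between threads and processes without touching the rest of the code. This sketch uses executor.map() rather than submit(). run_with() and square() are illustrative names. square() is defined at module level so that worker processes can import it.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def square(n: int) -> int:
    return n * n


def run_with(executor_cls: type[Executor], items) -> list:
    # Only this argument changes when switching between threads and processes
    with executor_cls(max_workers=4) as executor:
        # map() yields results in input order, regardless of completion order
        return list(executor.map(square, items))


if __name__ == "__main__":
    print(run_with(ThreadPoolExecutor, range(5)))   # [0, 1, 4, 9, 16]
    print(run_with(ProcessPoolExecutor, range(5)))  # [0, 1, 4, 9, 16]
```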

Example using concurrent.futures and ProcessPoolExecutor:

from concurrent.futures import ProcessPoolExecutor
from multiprocessing import current_process
from threading import get_ident
import time


def worker(num):
    time.sleep(3)  # simulate a slow task
    return f"Worker: {num}, PID: {current_process().pid}, Thread ID: {get_ident()}"


if __name__ == "__main__":  # required when child processes are started with "spawn" (Windows, macOS)
    with ProcessPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(worker, i) for i in range(5)]
        # result() blocks until the corresponding worker has finished
        results = [future.result() for future in futures]

    print(results)

2. Multithreading

Multithreading involves running multiple threads within the same process, sharing the same memory space. It is suitable for I/O-bound tasks, where the main bottleneck is waiting for I/O operations to complete.

The main thing to remember for now is that, because of the GIL, threads cannot execute Python code in parallel (we are talking about CPython, of course). Threads can still be very useful for tasks that spend most of their time waiting rather than running Python code. These are usually I/O-bound operations such as network requests and file system access. Such operations release the GIL while waiting for a result, allowing other threads to run in the meantime.
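
A quick way to see the GIL being released during blocking calls: five threads each wait 0.2 seconds, and the waits overlap instead of adding up. In this sketch, time.sleep() stands in for real I/O such as a socket read. That substitution is an assumption, though time.sleep() does release the GIL just as blocking I/O does.

```python
import threading
import time


def fake_io(delay: float) -> None:
    # time.sleep() releases the GIL while waiting, like a blocking network or file call
    time.sleep(delay)


def run_threads(count: int, delay: float) -> float:
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_threads(count=5, delay=0.2)
# Run one after another, this would take about 1.0 s; with threads the waits overlap
print(f"5 waits of 0.2 s took {elapsed:.2f} s")
```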

Key features:

  • Shared Memory. Threads share the same memory space, which makes communication between threads faster.
  • GIL Constraint. Due to Python's Global Interpreter Lock (GIL), only one thread can execute Python code at a time, limiting true parallelism.
  • Lightweight. Threads are lighter and consume fewer resources compared to processes.
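
Because threads share memory, they can update the same objects directly. Updates that take several steps still need synchronization even with the GIL, because a thread can be switched out in the middle of counter += 1. Here is a minimal sketch with threading.Lock; the iteration counts are arbitrary.

```python
import threading

counter = 0
lock = threading.Lock()


def increment(times: int) -> None:
    global counter
    for _ in range(times):
        # "counter += 1" is a read-modify-write; the lock makes it atomic across threads
        with lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```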

Example using Python's threading module:

from threading import Thread, get_ident


def worker(num):
    # All threads live in the same process; get_ident() returns the current thread's identifier
    print(f"Worker: {num}, Thread ID: {get_ident()}")


threads = []
for i in range(5):
    t = Thread(target=worker, args=(i,))
    t.start()
    threads.append(t)

# Wait for every thread to finish
for t in threads:
    t.join()

Again, you can use the concurrent.futures module, this time with ThreadPoolExecutor. Just replace ProcessPoolExecutor with ThreadPoolExecutor in the previous example and you're good to go.

Example using concurrent.futures and ThreadPoolExecutor:

from concurrent.futures import ThreadPoolExecutor
from multiprocessing import current_process
from threading import get_ident
import time


def worker(num):
    time.sleep(3)  # time.sleep() releases the GIL, so the five waits overlap
    return f"Worker: {num}, PID: {current_process().pid}, Thread ID: {get_ident()}"


with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(worker, i) for i in range(5)]
    results = [future.result() for future in futures]

# Every result shows the same PID: all workers are threads of one process
print(results)

3. Async

Asynchronous programming is a paradigm that allows a program to perform tasks concurrently, without waiting for each task to complete before starting the next one. This is particularly useful for I/O-bound operations, such as network requests, file I/O, or database queries, where waiting for a response can significantly delay the execution of the program.

This is the most lightweight approach, and it can handle thousands of independent tasks (requests) concurrently. Switching between asynchronous tasks is managed by the program itself (usually by an event loop) rather than by the OS. Combined with lower memory consumption, this can make async a more efficient and controllable solution than multiprocessing or multithreading for some types of tasks.

There are actually two different approaches to async programming in Python, each with its own mechanisms, advantages, and trade-offs. Let's start with the more popular one.

AsyncIO

AsyncIO (the asyncio module) is part of Python's standard library and provides support for asynchronous programming. It allows you to write code using the async/await syntax, which can perform tasks concurrently without using multiple threads or processes.

AsyncIO uses an event loop to manage tasks asynchronously, allowing for non-blocking operations and efficient I/O handling. Tasks are defined as coroutines that can yield control back to the Event Loop, enabling other tasks to run concurrently.

Key Features:

  • Event Loop. Central to AsyncIO is the event loop, which manages and schedules asynchronous tasks.
  • Coroutines. Functions defined with async def and designed to be paused and resumed. They use await to yield control back to the event loop, allowing other tasks to run.
  • Non-blocking. Uses non-blocking I/O to improve efficiency.
  • Single-threaded. Runs on a single thread, so tasks never compete with each other for the GIL.
  • Cooperative multitasking. A running task must explicitly yield control.
  • Performance. Efficient for I/O-bound and high-level structured network code. Avoids the overhead of thread context switching and GIL contention.
  • Explicit syntax. Uses async and await keywords, making the asynchronous flow clear in the code.
  • Integration. You have to use specific libraries that support AsyncIO, such as aiohttp for HTTP, aiomysql for MySQL, etc.
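
Because multitasking here is cooperative, a coroutine that calls a blocking function never yields, and every other task waits. The sketch below contrasts time.sleep() inside a coroutine with the same call moved to a worker thread via asyncio.to_thread() (Python 3.9+). The timings are approximate, and the helper names are illustrative.

```python
import asyncio
import time


async def blocking_wait() -> None:
    time.sleep(0.2)  # a regular blocking call: the event loop is stuck until it returns


async def offloaded_wait() -> None:
    # asyncio.to_thread() runs the blocking call in a worker thread and awaits the result
    await asyncio.to_thread(time.sleep, 0.2)


async def measure(make_coro, count: int = 3) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(make_coro() for _ in range(count)))
    return time.perf_counter() - start


async def main() -> tuple:
    return await measure(blocking_wait), await measure(offloaded_wait)


blocking, offloaded = asyncio.run(main())
print(f"blocking: {blocking:.2f} s, offloaded: {offloaded:.2f} s")  # roughly 0.6 s vs 0.2 s
```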

Example:

import asyncio
import time


async def fetch_data_async():
    await asyncio.sleep(2)
    return "Data fetched"


async def main():
    start = time.time()
    result = await fetch_data_async()
    end = time.time()

    print(result)
    print(f"Time taken: {end - start} seconds")


asyncio.run(main())

In this example, fetch_data_async() is a coroutine that simulates a delay with await asyncio.sleep(2). The main coroutine awaits its result. While fetch_data_async() is suspended, the event loop is free to run other tasks (there are none here), so awaiting does not block the program the way time.sleep() would.

To execute asynchronous functions, you need an event loop, which manages the execution of coroutines. The asyncio.run() function is the usual entry point. It creates an event loop, runs the main coroutine until it completes, and then closes the loop.

One of the primary benefits of asynchronous programming is the ability to run multiple tasks concurrently. The AsyncIO module provides several ways to achieve this, including asyncio.gather() and asyncio.create_task().

Example with asyncio.gather():

import asyncio
import time


async def fetch_data_1():
    await asyncio.sleep(1)
    return "Data 1 fetched"


async def fetch_data_2():
    await asyncio.sleep(2)
    return "Data 2 fetched"


async def main():
    start = time.time()
    results = await asyncio.gather(fetch_data_1(), fetch_data_2())
    end = time.time()

    print(results)
    print(f"Time taken: {end - start} seconds")


asyncio.run(main())

Here, fetch_data_1() and fetch_data_2() run concurrently, so the total wait time is about 2 seconds (the longer of the two delays) instead of 3 seconds.
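
One detail about asyncio.gather() is worth knowing. Results come back in the order the awaitables were passed, not the order in which they finish. When you need results as they arrive, asyncio.as_completed() is the standard-library alternative. Here is a small sketch; the names and delays are made up.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name


async def main() -> tuple:
    # gather() returns results in the order the awaitables were passed in
    by_position = await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))

    # as_completed() hands results back as soon as each task finishes
    tasks = [asyncio.create_task(fetch("slow", 0.2)), asyncio.create_task(fetch("fast", 0.1))]
    by_completion = [await next_done for next_done in asyncio.as_completed(tasks)]
    return by_position, by_completion


by_position, by_completion = asyncio.run(main())
print(by_position)    # ['slow', 'fast']
print(by_completion)  # ['fast', 'slow']
```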

Example with asyncio.create_task():

import asyncio
import time


async def fetch_data_1():
    await asyncio.sleep(1)
    return "Data 1 fetched"


async def fetch_data_2():
    await asyncio.sleep(2)
    return "Data 2 fetched"


async def main():
    start = time.time()
    # create_task() schedules both coroutines immediately; they start running
    # as soon as main() hands control back to the event loop
    task1 = asyncio.create_task(fetch_data_1())
    task2 = asyncio.create_task(fetch_data_2())

    # Awaiting task1 first does not delay task2: both are already in progress
    await task1
    await task2
    end = time.time()

    print(task1.result(), task2.result())
    print(f"Time taken: {end - start} seconds")


asyncio.run(main())

Greenlets

Greenlets are lightweight in-process coroutines provided by the greenlet library or through frameworks like gevent. They allow for concurrent programming by enabling cooperative multitasking with explicit yielding.

Key Features:

  • Micro-Threads. Greenlets are small, independent pseudo-threads that run inside a single native thread and are created and managed explicitly by the programmer.
  • Switching. Switches between greenlets are explicit but do not require the async and await syntax.
  • Concurrency Model. Also uses cooperative multitasking, but often relies on monkey patching standard libraries to make them non-blocking.
  • Performance. Efficient for I/O-bound tasks and concurrency but requires careful management to avoid blocking the main native thread.
  • No special syntax. Greenlets switch context explicitly, often by gevent.sleep() or greenlet.switch().
  • Integration. Requires monkey patching to work with existing synchronous code, which can lead to subtle bugs when monkey-patched libraries don’t behave as expected.

Example:

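from greenlet import greenlet


def task1():
    print("Task 1 start")
    gr2.switch()  # explicitly jump to task2
    print("Task 1 end")


def task2():
    print("Task 2 start")
    gr1.switch()  # jump back into task1, right after its gr2.switch() call
    print("Task 2 end")  # never printed: nothing switches back to gr2


gr1 = greenlet(task1)
gr2 = greenlet(task2)

# Prints "Task 1 start", "Task 2 start", "Task 1 end"; when task1 returns,
# control goes back to the main greenlet and gr2 stays suspended
gr1.switch()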

Example with gevent, which abstracts greenlets with an event loop:

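from gevent import monkey

monkey.patch_all()  # patch the standard library as early as possible, before other imports

import gevent


def task1():
    print("Task 1 start")
    gevent.sleep(1)  # yields to gevent's event loop instead of blocking
    print("Task 1 end")


def task2():
    print("Task 2 start")
    gevent.sleep(1)
    print("Task 2 end")


g1 = gevent.spawn(task1)
g2 = gevent.spawn(task2)

# Both greenlets sleep concurrently, so this takes about 1 second, not 2
gevent.joinall([g1, g2])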

To sum up the Async section: both AsyncIO and greenlets provide mechanisms for achieving concurrency in Python, but they suit different scenarios and come with different trade-offs.

AsyncIO is more explicit and is integrated into the language and its standard library, making it a good fit for modern async applications.

Greenlets offer a more implicit approach that can be easier to integrate with existing synchronous code but comes with potential complications related to monkey patching and context management.

Conclusion

So, as you can see, there are several ways to achieve concurrency in Python. Choosing between multiprocessing, multithreading, and AsyncIO depends on the nature of your task and your performance requirements. Multiprocessing is ideal for CPU-bound tasks that need true parallelism, while multithreading suits I/O-bound tasks where threads can efficiently share resources. AsyncIO is also a great fit for I/O-bound tasks, especially when you need to run thousands of concurrent tasks (requests) with minimal memory consumption and switching overhead.

We did not delve deeply into each topic because our goal was to provide a general overview of the main concepts of concurrent programming and to bring everything together. Each topic is complex and extensive enough to deserve its own thorough study.

To get a complete picture of concurrent programming in Python, we need to go one level higher and understand the fundamental differences between ASGI and WSGI, and how sync and async web servers and frameworks work. I believe we'll cover this in the next article.

Author: Alexander Antonov
