Python Concurrency Demystified

16 Aug 2015

For problems whose performance can be improved by using multiple processes or threads, Python provides convenient interfaces (the threading and multiprocessing modules) to handle them. However, the behavior of Python threads may not be what you expect if you are new to Python but have multithreading experience in languages like C/C++.

The thing is, although Python threads are real OS threads, they are synchronized by the infamous Global Interpreter Lock (GIL): any Python thread needs to acquire the GIL before it can execute code, which means that at any moment only one thread can own the lock and hence actually run, even if the threads are dispatched onto multiple CPU cores.

This behavior has a significant implication: multithreaded Python programs don't give you a performance boost on CPU-bound tasks. In fact, they can perform even worse than the single-threaded version of the same program, because Python threads cannot run concurrently; instead, they run serially and interleaved, and locking and context switching add even more overhead (both are expensive OS operations). To understand in detail how the GIL affects the behavior of Python threads, please read David Beazley's research results, especially his presentation.
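You can observe this yourself with a minimal sketch: a pure-Python counting loop run once on one thread, then split across two threads. The function name and iteration count below are just illustrative; the point is that the threaded version gains nothing from the second core because the GIL serializes the work.

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop; the running thread holds the GIL throughout
    while n > 0:
        n -= 1

N = 10_000_000

# Single-threaded baseline
start = time.perf_counter()
count_down(N)
sequential = time.perf_counter() - start

# The same total work split across two threads; the GIL lets only
# one of them execute Python code at any given moment
start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N // 2,))
t2 = threading.Thread(target=count_down, args=(N // 2,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

On a multi-core machine you would hope the threaded run takes roughly half the time; in practice it takes about as long as the sequential run, and often longer due to lock contention and context switching.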

Having said that, claiming threading in Python never helps performance would not be entirely fair. In some cases it does help: threads can reduce blocking I/O waiting time by initiating each I/O operation almost immediately one after another, so most of the I/O waiting time of each thread is overlapped. In fact, that is how multithreading in Python is usually used: for I/O-bound tasks.
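Here is a minimal sketch of that overlap, using time.sleep as a stand-in for a blocking I/O call (sleep, like real blocking I/O, releases the GIL while waiting; the delay and thread count are arbitrary illustrative values):

```python
import threading
import time

def fake_io(delay):
    # time.sleep releases the GIL, standing in for a blocking I/O call
    time.sleep(delay)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Five 0.2s waits overlap, so the total is close to 0.2s, not 1.0s
print(f"elapsed: {elapsed:.2f}s")
```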

To achieve real parallelism in Python, the salvation is using multiple processes. Each process has its own GIL and its own memory, neither of which is shared between processes, so Python processes are unaffected by the GIL and can run in parallel as expected. If you want to parallelize CPU-bound tasks, multiprocessing is the natural and better way to go in Python. To coordinate cooperation and communication among processes, it is recommended to use message passing instead of a shared memory model, because shared memory is easy to get wrong. A common pattern for passing messages among processes is a message queue, and the multiprocessing module provides a Queue class to do just that. Of course, you have other choices of message queue/broker implementations like 0mq, RQ, etc.

Here are examples of using Python multiprocessing to process data:

  1. Thumper
  2. 0mq vs python Queue
  3. Intro to Parallel Programming

In conclusion, processes are more heavyweight than threads, but they are a more general solution to Python concurrency: they can be used for both CPU-bound and I/O-bound tasks. This is NOT to say you should never use multithreading in Python; rather, you need to learn WHEN and HOW to use it. Multiprocessing, on the other hand, is easier to use correctly but has a higher memory footprint. For large-scale distributed web applications, you will need to distribute tasks to different processes on different machines anyway; threads can only live within a single process on one machine.