cyberangles blog

Python Multiprocessing: Why Using `Manager.list` Instead of a Real List Slows Down Calculations

Python’s multiprocessing module is a powerful tool for parallelizing CPU-bound tasks, allowing you to bypass the Global Interpreter Lock (GIL) by spawning separate processes. However, one common challenge in multiprocessing is sharing data between processes. Unlike threads, processes do not share memory space by default—each process has its own isolated memory. To work around this, Python provides multiprocessing.Manager, which lets you create shared objects like Manager.list, a proxy for a list that can be accessed by multiple processes.

But here’s the catch: while Manager.list solves the shared state problem, it often introduces significant performance overhead compared to using a "real" list (i.e., a list in a single process’s memory). In this blog, we’ll explore why Manager.list slows down calculations, unpack the technical reasons behind its overhead, and compare it to alternatives like local lists with message passing.

2026-02

Table of Contents#

  1. Understanding Multiprocessing and Shared State
  2. Real Lists in Multiprocessing: Not Truly Shared
  3. Enter Manager.list: A Shared List Proxy
  4. Why Manager.list Slows Down Calculations
  5. Performance Comparison: Manager.list vs. Real Lists
  6. When to Use Manager.list (and When Not To)
  7. Best Practices for Shared Data in Multiprocessing
  8. Conclusion

1. Understanding Multiprocessing and Shared State#

To parallelize tasks, multiprocessing spawns separate processes, each with its own Python interpreter, memory space, and GIL. This isolation avoids GIL bottlenecks but makes sharing data between processes non-trivial. For example:

  • If you pass a list to a child process, the child gets its own copy of the list: the argument is pickled and sent to the child under the spawn start method, or duplicated by the operating system's copy-on-write under fork. Either way, changes made in the child won’t reflect in the parent.
  • To share data, you need mechanisms like shared memory, message passing (e.g., Queue/Pipe), or proxies (e.g., Manager.list).

2. Real Lists in Multiprocessing: Not Truly Shared#

A "real" list in Python lives in the memory space of the process that creates it. When you pass a real list to a child process, the child receives a copy (pickled under the spawn start method, or duplicated by the operating system's copy-on-write under fork). Modifications in the child process affect only its local copy, not the parent’s list.

Example: Real List in Multiprocessing#

import multiprocessing
 
def append_to_list(lst, value):
    lst.append(value)  # Modify the child's copy of the list
 
if __name__ == "__main__":
    real_list = []  # List in the parent process's memory
    p = multiprocessing.Process(target=append_to_list, args=(real_list, 42))
    p.start()
    p.join()
 
    print(real_list)  # Output: [] (no change—child modified its own copy)

Why? The child process gets a copy of real_list, so appending 42 only affects the child’s local version. The parent’s real_list remains empty. Real lists are fast but not shared across processes.

3. Enter Manager.list: A Shared List Proxy#

To share a list between processes, multiprocessing.Manager provides a solution. The Manager spawns a dedicated "manager server" process that hosts shared objects (like lists, dicts, or queues). Other processes access these objects via proxies—lightweight objects that forward method calls to the manager server.

Manager.list is such a proxy: it acts like a list, but every operation (e.g., append(), pop(), __getitem__) is routed to the manager server. The server executes the operation on the "real" list it hosts and returns the result to the calling process.

Example: Using Manager.list#

import multiprocessing
 
def append_to_shared_list(shared_lst, value):
    shared_lst.append(value)  # Proxy forwards append() to the manager server
 
if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_list = manager.list()  # Proxy to a list in the manager server
        p = multiprocessing.Process(target=append_to_shared_list, args=(shared_list, 42))
        p.start()
        p.join()
 
        print(list(shared_list))  # Output: [42] (change reflected across processes)

Here, shared_list is a proxy. When append(42) is called, the child process sends a request to the manager server, which appends 42 to the actual list. The parent process then sees the update.

4. Why Manager.list Slows Down Calculations#

Manager.list solves the shared state problem, but its convenience comes with significant performance overhead. Let’s break down the key reasons:

4.1 Inter-Process Communication (IPC) Overhead#

Every operation on Manager.list (e.g., append, len, [i]) requires inter-process communication (IPC) between the calling process and the manager server. Unlike a real list—where operations are in-memory and nearly instantaneous—Manager.list operations involve:

  1. The calling process sending a message (e.g., "append(42)") to the manager server.
  2. The manager server receiving the message, executing the operation on the shared list.
  3. The server sending a response (e.g., "success" or the result of len()) back to the caller.

This round-trip IPC adds latency, especially for frequent operations (e.g., appending 10,000 elements in a loop).

4.2 Synchronization and Locking#

The manager server must ensure thread safety for shared objects. Even if your code doesn’t explicitly use locks, the manager internally uses locks to serialize access to Manager.list when multiple processes call methods concurrently. For example:

  • If two processes call shared_list.append(42) at the same time, the manager server will process one request first, then the other, to avoid race conditions.

This synchronization causes blocking: processes may wait idle while the manager server handles other requests, slowing down parallel execution.

4.3 Serialization Overhead#

Data passed between processes (and the manager server) must be serialized (converted to a byte stream) and deserialized (converted back to Python objects). Python uses the pickle module for this.

For Manager.list, even simple operations involve pickling:

  • When you call shared_list.append(42), the value 42 is pickled and sent to the manager server.
  • When you read shared_list[0], the manager server pickles the value and sends it back.

Pickling is fast for small data, but repeated serialization/deserialization (e.g., appending millions of elements) adds up. For large objects (e.g., lists of numpy arrays), this overhead becomes prohibitive.
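You can get a feel for this cost with pickle alone, outside of multiprocessing entirely. The sketch below contrasts many small pickles (the Manager.list per-operation pattern) with one large pickle (the send-once pattern):

```python
import pickle
import time

small = 42
large = list(range(1_000_000))

# Many tiny pickles: cheap individually, but the per-call cost accumulates
start = time.perf_counter()
for _ in range(10_000):
    pickle.dumps(small)
small_time = time.perf_counter() - start

# One large pickle: a single call, but proportional to the payload size
start = time.perf_counter()
payload = pickle.dumps(large)
large_time = time.perf_counter() - start

print(f"10,000 small pickles: {small_time:.4f}s")
print(f"1 large pickle ({len(payload)} bytes): {large_time:.4f}s")
```

Every Manager.list operation pays a small-pickle cost on top of its IPC round-trip, which is why per-element access patterns scale so poorly.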

5. Performance Comparison: Manager.list vs. Real Lists#

To quantify the overhead of Manager.list, let’s compare two scenarios:

  1. Using Manager.list: Multiple processes append to a shared Manager.list.
  2. Using Local Lists + Message Passing: Each process appends to a local list, then sends the local list to the parent via a Queue for aggregation.

Test Setup#

We’ll spawn 4 processes, each appending 10,000 elements to a list. We’ll measure the total time for both approaches.

Code: Manager.list vs. Local Lists with Queue#

import multiprocessing
import time
 
# Scenario 1: Append to Manager.list
def worker_manager(shared_list, num_elements):
    for i in range(num_elements):
        shared_list.append(i)  # Each append requires IPC to manager
 
# Scenario 2: Append to local list, then send to parent via Queue
def worker_local(queue, num_elements):
    local_list = []
    for i in range(num_elements):
        local_list.append(i)  # No IPC—local to the process
    queue.put(local_list)  # Send once at the end
 
if __name__ == "__main__":
    num_processes = 4
    elements_per_process = 10000
 
    # Test 1: Manager.list
    start = time.time()
    with multiprocessing.Manager() as manager:
        shared_list = manager.list()
        processes = [
            multiprocessing.Process(
                target=worker_manager,
                args=(shared_list, elements_per_process)
            ) for _ in range(num_processes)
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        total_manager = list(shared_list)  # Convert proxy to real list
    manager_time = time.time() - start
 
    # Test 2: Local lists + Queue
    start = time.time()
    queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(
            target=worker_local,
            args=(queue, elements_per_process)
        ) for _ in range(num_processes)
    ]
    for p in processes:
        p.start()
    total_local = []
    for _ in range(num_processes):
        total_local.extend(queue.get())  # Aggregate results in parent
    for p in processes:
        p.join()
    queue_time = time.time() - start
 
    # Verify correctness and print results
    assert len(total_manager) == len(total_local) == num_processes * elements_per_process
    print(f"Manager.list time: {manager_time:.4f}s")
    print(f"Local lists + Queue time: {queue_time:.4f}s")

Results#

On a typical machine, you’ll see output like:

Manager.list time: 2.1452s  
Local lists + Queue time: 0.0123s  

Why the huge difference?

  • Manager.list incurs IPC, locking, and pickling overhead for every append (40,000 operations total).
  • Local lists with Queue avoid per-operation overhead: appends are in-memory (fast), and data is sent to the parent only once per process (4 total put() calls).

6. When to Use Manager.list (and When Not To)#

Manager.list is not "bad"—it’s a tool for specific use cases. Use it when:

  • You need random access to a shared list (e.g., reading/writing arbitrary indices from multiple processes).
  • The number of operations on the list is small (e.g., occasional appends/updates).
  • Simplicity matters more than raw speed (e.g., prototyping).

Avoid Manager.list when:

  • You’re doing frequent operations (e.g., appending millions of elements).
  • You only need to aggregate results from processes (use local lists + Queue instead).
  • You’re working with large data (serialization overhead becomes crippling).

7. Best Practices for Shared Data in Multiprocessing#

To minimize overhead, follow these guidelines:

1. Avoid Shared State Altogether#

The fastest way to share data is to not share it. Design workflows where processes work on independent chunks of data and return results to the parent via Queue or Pipe.

2. Use Shared Memory for Large Homogeneous Data#

For large arrays (e.g., numpy arrays), use multiprocessing.Array or multiprocessing.shared_memory (Python 3.8+). These store data in a shared memory block that all processes map directly—no manager server and no per-operation pickling.

Example with shared_memory:

from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory
import numpy as np
 
def modify_shared_array(shm_name, shape, dtype):
    shm = SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr[:] = np.random.rand(*shape)  # Modify shared array directly
    shm.close()
 
if __name__ == "__main__":
    shape = (1000, 1000)
    dtype = np.float64
    shm = SharedMemory(create=True, size=np.prod(shape) * np.dtype(dtype).itemsize)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    p = Process(target=modify_shared_array, args=(shm.name, shape, dtype))
    p.start()
    p.join()
    print(arr.sum())  # Access modified array in parent
    shm.close()
    shm.unlink()

3. Use Queue for Result Aggregation#

For collecting results from processes, Queue is faster than Manager.list. As shown earlier, appending to a local list and sending it once via Queue avoids per-operation IPC.

8. Conclusion#

Manager.list is a convenient proxy for sharing lists between processes, but its performance suffers from inter-process communication, synchronization, and serialization overhead. For frequent operations or large datasets, it is far slower than using local lists with message passing (e.g., Queue).

The key takeaway: shared state in multiprocessing is expensive. Whenever possible, design your workflows to avoid shared data. When you must share data, use tools like shared_memory for large arrays or Queue for result aggregation instead of Manager.list.
