Async/Await is a Plague: Part 1 Roots
  • Python
  • Concurrency

Part 1: The Roots

Preface: The Silent Contagion in Our Code

This is the first installment in a series of articles exploring a fundamental flaw in modern software architecture: the async-await pattern.

For years, we’ve been told that async-await is a triumph of language design: clean, elegant syntactic sugar that gives us high-performance, non-blocking I/O without the headache of nested callbacks. But look closer at any mature codebase, and you’ll find a different story. async-await acts like a biological pathogen. The moment you introduce it to a single lower-level utility, it forces a cascade of modifications all the way up your call stack. Over this series I’ll argue that it partitions your language into two incompatible worlds, erodes composability, and quietly infects everything it touches.

Over the course of this series, we are going to dissect this plague. We will examine how it ruins clean architecture, why it forces developers into a "two-colored function" trap, and what alternatives exist to cure our codebases.

But to understand why this pattern is so destructive, we first have to understand why we were desperate enough to invite it into our standard libraries in the first place. We have to look at the roots.


The Concurrency Crisis: Threads are Heavy

To understand how the plague of async-await spread so thoroughly, we first have to understand the environment that bred it. It wasn’t born out of a desire for elegant abstractions; it was born out of economic and hardware desperation.

Go back a decade or two, and backend concurrency was straightforward: one thread per request. When a user hit your server, the operating system spawned or assigned a thread. If that request needed to fetch data from a database, the thread sat there and waited.

Cognitively, this was beautiful. Your code ran sequentially. You could read a stack trace from top to bottom, and it mapped perfectly to the human brain's linear execution model.

But operationally, it was a disaster for high-throughput web applications.

Threads are an operating system abstraction, and they don't come cheap. Each thread reserves a non-trivial chunk of address space for its stack (the default cap is often 1MB to 8MB, even if only a fraction is ever resident) and, worse, switching between threads requires a context switch. When you have tens of thousands of concurrent users, the CPU spends more time playing traffic cop, saving thread states, flushing CPU caches, and swapping memory contexts, than actually running your application logic.

But the memory and context-switch overhead is only half the story, and arguably not even the damning half. The deeper problem is a mismatch of tools to workload. A thread is a preemptive, general-purpose execution context: it carries a full call stack, register state, and a scheduler slot, all of which exist so the OS can run CPU work and forcibly interrupt it. That machinery is exactly what you want for a video encoder or a physics simulation that actually saturates a core. It is wildly disproportionate for code whose entire job is to wait.

And waiting is precisely what request-handling code does. Consider a typical endpoint:

def handle_request(request: Request) -> Response:
    # Illustrative only: db, cache, http, and render stand in for real clients.
    user = db.query(request.user_id)  # ~5 ms blocked on the database
    profile = cache.get(user.key)     # ~1 ms blocked on the network
    resp = http.get(profile.avatar)   # ~50 ms blocked on a remote API
    return render(user, profile, resp)

Add up the numbers: this handler is blocked on I/O for roughly 56 milliseconds and spends well under a millisecond actually executing Python. For more than 98% of its lifetime, the thread is parked, holding its multi-megabyte stack, occupying a slot in the OS scheduler, counting against the kernel's thread limit, and doing nothing but waiting for a socket to become readable. You have committed a heavyweight, preemptible CPU vehicle to a job that is almost entirely idle.

Now multiply that by ten thousand concurrent requests. You don't need ten thousand things running; at any instant only a handful are actually on a CPU. You need ten thousand things waiting, cheaply. Yet the one-thread-per-request model forces you to allocate ten thousand full execution contexts (gigabytes of stack reservation and a scheduler the OS was never designed to push that far) to represent what is, fundamentally, ten thousand entries in a list of "wake me when my data arrives." The thread is overkill not because threads are bad, but because you've spent a tool built for computation on a problem that is pure coordination.
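To make that arithmetic concrete, here is a back-of-envelope sketch. Every input is an assumption chosen for illustration: the 8 MiB stack reservation is the upper end of the range quoted earlier, and 0.5 ms stands in for "well under a millisecond" of actual Python execution. Note that the stack figure is reserved address space, not resident memory.

```python
# Back-of-envelope numbers for the one-thread-per-request model.
# All inputs are illustrative assumptions, not measurements.

IO_WAIT_MS = 5 + 1 + 50        # database + cache + remote API, from the handler above
CPU_MS = 0.5                   # assumed: "well under a millisecond" of Python
STACK_RESERVATION_MIB = 8      # assumed: upper end of the 1MB to 8MB range
CONCURRENT_REQUESTS = 10_000

# Fraction of a request's lifetime spent parked on I/O.
idle_fraction = IO_WAIT_MS / (IO_WAIT_MS + CPU_MS)

# Address space reserved for stacks if every request owns a thread.
reserved_gib = CONCURRENT_REQUESTS * STACK_RESERVATION_MIB / 1024

# Average number of requests that are actually computing at any instant.
busy_on_average = CONCURRENT_REQUESTS * (1 - idle_fraction)

print(f"blocked on I/O:             {idle_fraction:.1%}")
print(f"stack address space:        {reserved_gib:.1f} GiB reserved")
print(f"requests actually on a CPU: about {busy_on_average:.0f}")
```

Under these assumptions, well over 99% of the ten thousand execution contexts exist only to wait.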

Hardware had hit a wall, but web traffic hadn’t. We needed a way to handle massive I/O without the crushing overhead of OS threads.

It's worth pausing here, because a skeptic will already be objecting: why not just make threads cheaper? That is exactly what goroutines, and more recently Java's virtual threads (Project Loom), set out to do: millions of lightweight threads, scheduled in user space, with ordinary blocking code. Hold that thought. The existence of those models is the strongest argument that async-await was a choice, not a necessity, and we'll put it on trial later in the series. For now, it's enough to understand the desperation that made async-await feel inevitable at the time.

Moving Scheduling Up the Stack: The Event Loop and Asynchronous I/O

The solution was to bypass OS threads for concurrency entirely, moving the scheduling into the application layer via Non-blocking I/O and an Event Loop.

Instead of a thousand threads waiting on a thousand slow database queries, you have exactly one thread (or one per CPU core) running a continuous loop. When a task starts an I/O operation, the engine registers a callback, fires off the request, and immediately moves on to the next task in line. When the database finally responds, the event loop picks up the callback and runs it.

(This single-threaded picture is the classic Node.js model. Other runtimes (Rust's Tokio, C#'s task scheduler) run async tasks across a multi-threaded work-stealing pool. The key idea is the same in both: tasks suspend and resume cooperatively rather than blocking an OS thread.)

Suddenly, a single machine could handle hundreds of thousands of concurrent connections. It was an architectural miracle for raw performance. But it left one very practical question wide open: how do you actually write a task for this loop?

The loop needs each task to run for a little while, voluntarily pause when it hits something slow, and let itself be resumed later: the same "fire off the request, then step aside" behaviour described above. The first answer the industry reached for was the callback: hand the loop a function to call "when the data is ready." Anyone who has maintained such code knows where that road leads: your linear logic shredded into a pyramid of nested closures, with loops and try-catch useless across the boundaries.
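For readers who have never had to live with it, here is a minimal sketch of the callback style applied to the three-step handler from earlier. Everything in it is hypothetical: db_query, cache_get, and http_get are stand-ins that complete immediately with canned data, and the pending queue is a toy substitute for a real event loop.

```python
from collections import deque
from typing import Callable

# A toy callback queue standing in for an event loop (hypothetical).
pending: deque[Callable[[], None]] = deque()

def db_query(user_id: int, on_done: Callable[[dict], None]) -> None:
    # Pretend the database answers later: schedule the callback instead of returning.
    pending.append(lambda: on_done({"id": user_id, "key": f"profile:{user_id}"}))

def cache_get(key: str, on_done: Callable[[dict], None]) -> None:
    pending.append(lambda: on_done({"avatar": f"https://example.invalid/{key}.png"}))

def http_get(url: str, on_done: Callable[[bytes], None]) -> None:
    pending.append(lambda: on_done(b"<image bytes>"))

def handle_request(user_id: int, respond: Callable[[str], None]) -> None:
    # Each step can only continue inside the callback of the previous one,
    # so three sequential operations become three levels of nesting.
    def got_user(user: dict) -> None:
        def got_profile(profile: dict) -> None:
            def got_avatar(avatar: bytes) -> None:
                respond(f"user {user['id']}, avatar {len(avatar)} bytes")
            http_get(profile["avatar"], got_avatar)
        cache_get(user["key"], got_profile)
    db_query(user_id, got_user)

responses: list[str] = []
handle_request(42, responses.append)
while pending:              # the "event loop": run callbacks until none remain
    pending.popleft()()
print(responses)            # ['user 42, avatar 13 bytes']
```

Notice that handle_request returns before any work has happened, and that a try block wrapped around db_query(...) would catch nothing raised inside got_avatar, because that code runs later from the loop.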

But there's a second answer, and Python ships it right in the language. It turns out Python already has a construct whose entire purpose is to pause a function in place, preserve all of its local state, and resume it later exactly where it left off. It's called the generator, and once you see it clearly, you can build a working event loop out of it in an afternoon. Let's do exactly that.

The Pause Button: Python Generators

Most people first meet generators as a memory-efficient way to produce sequences lazily. That framing, while true, completely undersells them. The thing that matters for us is not that a generator yields values; it's that a generator can suspend itself.

Look at what yield actually does:

from collections.abc import Generator

def countdown(n: int) -> Generator[int, None, None]:
    print("starting")
    while n > 0:
        yield n        # pause here, hand control (and a value) back to the caller
        n -= 1         # ...resume *here* the next time we're stepped
    print("done")

Side note: reading the Generator type hint. A generator is described by Generator[YieldType, SendType, ReturnType]: the first parameter is the type of the values it hands out with yield, the second is the type you can push back into it with .send() (we won't use it here, so it stays None), and the third is the type it produces with a final return. So Generator[int, None, None] reads as "yields ints, accepts nothing sent in, returns nothing." Two of those slots come alive later: the first, when our tasks start yielding I/O requests instead of plain values, and the third, when one task finishes by returning a result that another task waits on. Hold on to both.

Calling countdown(3) does not run a single line of the body. It hands you back a paused computation, an object sitting frozen at the very top, waiting. You drive it forward one step at a time with next():

>>> c = countdown(3)
>>> next(c)
starting
3
>>> next(c)      # resumes on the line *after* the yield, with n still == 3
2
>>> next(c)
1
>>> next(c)      # falls off the end of the function
done
Traceback (most recent call last):
  ...
StopIteration

Sit with what just happened. Between two calls to next(), the function was frozen mid-execution. Its local variable n, its position inside the while loop, all of it was preserved, in place, and then picked up again on demand. A generator object is a resumable stack frame you can hold in a variable.
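You don't have to take the "resumable stack frame" claim on faith. The sketch below pokes at a paused generator with the standard inspect module. One caveat: gi_frame and its f_locals are CPython implementation details, so treat this as a peek under the hood rather than an API to build on. The countdown here is the same function as above, minus the prints.

```python
import inspect
from collections.abc import Generator

def countdown(n: int) -> Generator[int, None, None]:
    while n > 0:
        yield n
        n -= 1

c = countdown(3)
state_before = inspect.getgeneratorstate(c)   # body has not started running
first = next(c)                               # run up to the first yield
state_paused = inspect.getgeneratorstate(c)   # frozen at the yield
saved_n = c.gi_frame.f_locals["n"]            # the local variable, still alive in the frame
remaining = list(c)                           # drive it the rest of the way
state_after = inspect.getgeneratorstate(c)
frame_after = c.gi_frame                      # the frame is released once the generator ends

print(state_before, state_paused, state_after)   # GEN_CREATED GEN_SUSPENDED GEN_CLOSED
print(first, saved_n, remaining, frame_after)    # 3 3 [2, 1] None
```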

That should sound familiar. A few sections ago we said that ten thousand waiting requests are really just "ten thousand entries in a list of wake me when my data arrives." A frozen generator is precisely such an entry: a suspended piece of work, parked cheaply on the heap, costing a small object instead of a multi-megabyte OS thread. The generator is the lightweight "waiting thing" we were looking for all along.
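As a rough sanity check on "parked cheaply", the sketch below creates ten thousand suspended generators and holds them in a list. The waiting_request function is hypothetical, and sys.getsizeof is a shallow, version-dependent measurement, so read the printed number as an order of magnitude, not a benchmark.

```python
import inspect
import sys
from collections.abc import Generator

def waiting_request(request_id: int) -> Generator[str, None, int]:
    yield "waiting for data"     # parked here until someone resumes us
    return request_id

parked = []
for request_id in range(10_000):
    gen = waiting_request(request_id)
    next(gen)                    # advance to the yield, i.e. "start waiting"
    parked.append(gen)

all_suspended = all(
    inspect.getgeneratorstate(g) == inspect.GEN_SUSPENDED for g in parked
)
approx_bytes_each = sys.getsizeof(parked[0])   # shallow size; varies by Python version

# Wake exactly one of them, the way a loop would when its data arrives.
try:
    next(parked[123])
except StopIteration as stop:
    woken_result = stop.value

print(f"{len(parked)} parked tasks, all suspended: {all_suspended}")
print(f"roughly {approx_bytes_each} bytes per generator object (shallow)")
print(f"woke task with result {woken_result}")
```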

A Generator Is a Task

If yield can pause a function, then it can serve as a voluntary suspension point, a place where a task says, "I've done a chunk of work; pause me and let someone else run." Write two such tasks:

from collections.abc import Generator

def task(name: str, steps: int) -> Generator[None, None, None]:
    for i in range(steps):
        print(f"{name}: step {i}")
        yield                       # "I'll pause here, run something else"

Now we can interleave them by hand, simply by choosing whose turn it is to be resumed:

>>> a = task("A", 2)
>>> b = task("B", 2)
>>> next(a); next(b); next(a); next(b)
A: step 0
B: step 0
A: step 1
B: step 1

Two independent computations, advancing in lockstep, on a single thread, with no OS involvement whatsoever. This is cooperative multitasking: unlike preemptive OS threads, which the kernel can interrupt at any instant, these tasks are never interrupted. They run until they choose to yield. Cooperation is the whole bargain: as long as every task hands control back promptly, a single thread can juggle thousands of them.

Of course, advancing tasks by hand isn't a system. We need something to hold the tasks and decide whose turn is next. We need a scheduler.

A Scheduler in a Dozen Lines

A scheduler is almost embarrassingly simple: keep a queue of tasks, pop one, run it until its next yield, and, if it isn't finished, put it back at the end of the line. Round and round.

from collections import deque
from collections.abc import Generator

class Scheduler:
    def __init__(self) -> None:
        self.ready: deque[Generator[None, None, None]] = deque()

    def add(self, task: Generator[None, None, None]) -> None:
        self.ready.append(task)

    def run(self) -> None:
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task)              # run until the next yield
            except StopIteration:
                continue                # task finished, drop it
            self.ready.append(task)     # not done, requeue it

>>> s = Scheduler()
>>> s.add(task("A", 3))
>>> s.add(task("B", 2))
>>> s.run()
A: step 0
B: step 0
A: step 1
B: step 1
A: step 2

That's a complete cooperative scheduler. It runs many tasks concurrently on one thread, and the only "magic" is next() and a queue. But it's still missing the one thing the whole article has been building toward: these tasks only ever pause to be immediately rescheduled. None of them knows how to wait for the outside world.

Teaching Tasks to Wait: the Event Loop

Here is the missing piece. When a task hits a slow socket, we don't want the scheduler to busy-loop, re-running it over and over to ask "ready yet? ready yet?" We want the task to say what it's waiting for and then stand aside completely until that thing is actually ready.
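To put a number on the "ready yet?" problem, here is a small sketch using the same round-robin idea as the scheduler above. The names are hypothetical: poller is a task that keeps asking, slow_source plays the role of the outside world taking 1,000 scheduler turns to deliver, and the flag dictionary stands in for a socket becoming readable.

```python
from collections import deque
from collections.abc import Generator

def poller(flag: dict[str, bool], stats: dict[str, int]) -> Generator[None, None, None]:
    while not flag["ready"]:
        stats["wasted_wakeups"] += 1   # woken up only to find nothing to do
        yield

def slow_source(flag: dict[str, bool], turns: int) -> Generator[None, None, None]:
    for _ in range(turns):
        yield
    flag["ready"] = True               # the "data" finally arrives

flag = {"ready": False}
stats = {"wasted_wakeups": 0}
ready = deque([poller(flag, stats), slow_source(flag, 1000)])

# The same round-robin policy as Scheduler.run, inlined to keep the sketch short.
while ready:
    task = ready.popleft()
    try:
        next(task)
    except StopIteration:
        continue
    ready.append(task)

print(stats["wasted_wakeups"])         # 1001
```

One waiting task burns a thousand pointless resumptions here. With thousands of tasks all polling, the single thread would spend nearly all its time asking and almost none of it working.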

So we make yield carry a message. Instead of yielding nothing, a task yields a small request describing its blocking need:

yield ("read", sock)     # "park me until this socket is readable"
yield ("write", sock)    # "park me until this socket is writable"

The scheduler now does two jobs. Tasks that are runnable sit in the ready queue as before. Tasks that are waiting on I/O get parked in a side table, keyed by the socket they're blocked on. And when the ready queue empties, the loop hands that whole side table to the operating system's select call, a single syscall that blocks until any of those sockets becomes ready, then tells us which ones. Those tasks get moved back to the ready queue, and the wheel turns again.
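Before wiring select into a loop, it may help to see the call on its own. This sketch uses socket.socketpair() to get two connected sockets without opening a port; the variable names are illustrative. The fourth argument to select.select is a timeout in seconds, and omitting it (as the loop does) means "block until something is ready."

```python
import select
import socket

ours, theirs = socket.socketpair()     # two connected sockets, no server needed

# Nothing has been sent yet: polling with a timeout of 0 reports no readable sockets.
readable_before, _, _ = select.select([ours], [], [], 0)

# A socket with room in its send buffer is writable straight away.
_, writable_now, _ = select.select([], [ours], [], 0)

theirs.sendall(b"ping")

# Now the kernel reports that `ours` has data waiting.
readable_after, _, _ = select.select([ours], [], [], 1.0)
data = ours.recv(1024)

print(readable_before == [], writable_now == [ours], readable_after == [ours], data)

ours.close()
theirs.close()
```

The loop below applies this same call to its two side tables, passing the dictionaries directly so that select iterates over their socket keys.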

import select
import socket
from collections import deque
from collections.abc import Generator
from typing import Any

# A task is a generator that yields ("read"|"write", socket) requests and
# eventually *returns* a result of type R for another task to await. The yield
# slot is fixed (every task speaks the same protocol to the loop); the return
# slot is generic, so read_message becomes Task[bytes], echo becomes Task[None].
# The type statement used here requires Python 3.12 or newer.
type Task[R] = Generator[tuple[str, socket.socket], None, R]

class Loop:
    def __init__(self) -> None:
        # The loop drives tasks but ignores whatever they return, so it stores
        # them as Task[Any]; individual tasks pin R down to their real result.
        self.ready: deque[Task[Any]] = deque()
        self.read_waiting: dict[socket.socket, Task[Any]] = {}   # socket -> task waiting to read
        self.write_waiting: dict[socket.socket, Task[Any]] = {}  # socket -> task waiting to write

    def add_task(self, task: Task[Any]) -> None:
        self.ready.append(task)

    def run(self) -> None:
        while self.ready or self.read_waiting or self.write_waiting:
            if not self.ready:
                # Nothing runnable: ask the OS to wake us when I/O is ready.
                readers, writers, _ = select.select(
                    self.read_waiting, self.write_waiting, []
                )
                for sock in readers:
                    self.ready.append(self.read_waiting.pop(sock))
                for sock in writers:
                    self.ready.append(self.write_waiting.pop(sock))

            task = self.ready.popleft()
            try:
                op = next(task)             # run until the next yield
            except StopIteration:
                continue

            kind, sock = op                 # the task told us what it needs
            if kind == "read":
                self.read_waiting[sock] = task
            elif kind == "write":
                self.write_waiting[sock] = task

That select.select call is the heart of every event loop ever written. (Real runtimes reach for epoll or kqueue instead, which scale to far more sockets, but the idea is identical.) One thread, one blocking syscall, and the kernel itself tells you exactly which of your thousands of parked tasks deserves to wake up.
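If you want the epoll or kqueue behaviour without writing platform-specific code, the standard library's selectors module picks the best available mechanism for you. The sketch below is not how our Loop is written; it only shows, under the assumption of two socketpair connections, how the "side table" idea maps onto that module: the data argument plays the role of the parked task.

```python
import selectors
import socket

sel = selectors.DefaultSelector()      # epoll, kqueue, or select, whichever is available

a_ours, a_theirs = socket.socketpair()
b_ours, b_theirs = socket.socketpair()

# Park two "tasks" (plain strings here) on the sockets they are waiting for.
sel.register(a_ours, selectors.EVENT_READ, data="task A")
sel.register(b_ours, selectors.EVENT_READ, data="task B")

b_theirs.sendall(b"wake up")           # only B's socket becomes readable

# select() returns (key, events) pairs; key.data is whatever we registered.
woken = [key.data for key, _events in sel.select(timeout=1.0)]
print(woken)                           # ['task B']

sel.close()
for sock in (a_ours, a_theirs, b_ours, b_theirs):
    sock.close()
```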

Now we can write a genuinely concurrent network server as ordinary, linear, top-to-bottom code, with no callbacks and no pyramid:

import socket

def echo(sock: socket.socket) -> Task[None]:
    while True:
        yield ("read", sock)            # wait for the client to send something
        data = sock.recv(1024)
        if not data:                    # client hung up
            sock.close()
            return
        yield ("write", sock)           # wait until we can send
        sock.sendall(data)

def server(loop: Loop, host: str, port: int) -> Task[None]:
    listen = socket.socket()
    listen.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen.bind((host, port))
    listen.listen()
    listen.setblocking(False)
    while True:
        yield ("read", listen)          # wait for a new connection
        client, _ = listen.accept()
        client.setblocking(False)
        loop.add_task(echo(client))     # spawn a task to handle this client

loop = Loop()
loop.add_task(server(loop, "127.0.0.1", 8000))
loop.run()

Run that, point a few telnet localhost 8000 sessions at it, and they will all be served at once, concurrently, by a single thread that is never blocked and never spawns a thread per client. We have, in well under a hundred lines, rebuilt the core of Node.js, Twisted, and asyncio. Each echo task reads as straight-line imperative code, yet thousands of them coexist on one core, each one nothing more than a frozen generator parked in a dictionary, waiting for its socket.
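If you'd rather check the loop without telnet or an open port, here is a self-contained sketch that drives echo over a socket.socketpair(). It repeats the Loop and echo definitions from above so it runs on its own, with one deliberate difference: Task is a plain alias rather than the generic type statement, an assumption made only so the sketch also runs on Python versions before 3.12.

```python
import select
import socket
from collections import deque
from collections.abc import Generator
from typing import Any

# Plain alias (not the article's generic `type Task[R]`) for pre-3.12 compatibility.
Task = Generator[tuple[str, socket.socket], None, Any]

class Loop:
    def __init__(self) -> None:
        self.ready: deque[Task] = deque()
        self.read_waiting: dict[socket.socket, Task] = {}
        self.write_waiting: dict[socket.socket, Task] = {}

    def add_task(self, task: Task) -> None:
        self.ready.append(task)

    def run(self) -> None:
        while self.ready or self.read_waiting or self.write_waiting:
            if not self.ready:
                readers, writers, _ = select.select(
                    self.read_waiting, self.write_waiting, []
                )
                for sock in readers:
                    self.ready.append(self.read_waiting.pop(sock))
                for sock in writers:
                    self.ready.append(self.write_waiting.pop(sock))
            task = self.ready.popleft()
            try:
                kind, sock = next(task)
            except StopIteration:
                continue
            if kind == "read":
                self.read_waiting[sock] = task
            elif kind == "write":
                self.write_waiting[sock] = task

def echo(sock: socket.socket) -> Task:
    while True:
        yield ("read", sock)
        data = sock.recv(1024)
        if not data:                    # peer closed its sending side
            sock.close()
            return
        yield ("write", sock)
        sock.sendall(data)

server_side, client_side = socket.socketpair()
server_side.setblocking(False)
client_side.sendall(b"hello")
client_side.shutdown(socket.SHUT_WR)    # signal "no more data" so echo eventually exits

loop = Loop()
loop.add_task(echo(server_side))
loop.run()                              # returns once echo has seen end-of-stream

echoed = client_side.recv(1024)
client_side.close()
print(echoed)                           # b'hello'
```

Because the client half-closes before the loop starts, the echo task parks three times (read, write, read), sees end-of-stream, and finishes, at which point the loop has nothing left to wait for and returns.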

Awaiting Another Task: yield from

So far each task has been a single flat generator. But real code is built from layers: a request handler calls a function that reads a full message, which calls a function that reads some bytes. The moment any of those lower layers needs to touch a socket, it has to suspend, and a bare yield can only suspend the innermost generator. How does a suspension deep in a helper reach all the way up to our Loop?

The answer is a second form of the keyword: yield from. Watch us factor the "read until you have a whole message" logic out of echo into its own task:

def read_message(sock: socket.socket) -> Task[bytes]:
    buffer = b""
    while b"END" not in buffer:         # read until the client sends the terminator
        yield ("read", sock)            # suspend to the loop, exactly as before
        chunk = sock.recv(1024)
        if not chunk:                   # client hung up mid-message
            return buffer
        buffer += chunk
    return buffer                       # <-- a real return *value*

Before we move on, look closely at what read_message does, because the detail is the whole point. A TCP socket is a raw byte stream, not a tidy sequence of messages: one recv might return half of what the client sent, or two messages glued together. So we impose our own framing (here, the dead-simple rule that a message ends at the literal marker END) and loop, appending chunks to buffer until that marker appears. The consequence that matters: a single call to read_message may suspend many times. If the client's message dribbles in across five packets, read_message yields ("read", sock) five separate times, parking and resuming on each one, before it ever reaches a return.

That is what makes it a real test of delegation rather than a toy. The caller we are about to write, shout, contains no read loop, doesn't count packets, and doesn't know whether the message took one read or fifty. It writes one line, message = yield from read_message(sock), and yield from patiently drives the sub-task through however many suspensions it needs, forwarding every one up to the Loop, until the sub-task finally returns a complete message.

Now look at read_message's type: Task[bytes], which is just Generator[tuple[str, socket.socket], None, bytes] with the return slot filled in. Where echo was a Task[None], this one returns something. That is the slot we told you to hold on to, and it's exactly why Task was made generic in its result. Here is the higher-level task that consumes it:

def shout(sock: socket.socket) -> Task[None]:
    while True:
        message = yield from read_message(sock)   # await the sub-task's result
        if b"EXIT" in message:                     # client asked to hang up
            sock.close()
            return
        yield ("write", sock)
        sock.sendall(message.upper())

That single line message = yield from read_message(sock) does two distinct things, and both are essential:

  1. It forwards suspensions, all of them. Every ("read", sock) that read_message yields, however many that turns out to be, passes straight through shout and up to the Loop, untouched. The loop never even knows a helper is involved; it just sees read requests and parks the task as always. Suspension propagates across the function boundary automatically.
  2. It captures the return value. When read_message finally hits return buffer, that value becomes the value of the yield from expression. message is bound to the bytes the sub-task produced.

(These two behaviours are only the visible tip of yield from. It also wires up the full two-way channel between the delegating generator and the sub-generator: forwarding values pushed in with .send(), propagating exceptions and .throw(), and surfacing return values via StopIteration. For the complete, rigorous treatment, see Luciano Ramalho's Fluent Python, "Using yield from".)
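Both behaviours are easy to observe in isolation, with no sockets involved. In this minimal sketch, inner and outer are hypothetical names: inner suspends twice and then returns a value, and outer delegates to it with yield from. We drive outer by hand with next(), exactly as the Loop would.

```python
from collections.abc import Generator

def inner() -> Generator[str, None, int]:
    yield "inner: first suspension"
    yield "inner: second suspension"
    return 42                              # travels out via StopIteration.value

def outer() -> Generator[str, None, str]:
    result = yield from inner()            # forwards inner's yields, captures its return
    yield f"outer: got {result}"
    return "outer finished"

gen = outer()
seen = []
try:
    while True:
        seen.append(next(gen))             # the driver only ever talks to `outer`
except StopIteration as stop:
    final = stop.value                     # outer's own return value

print(seen)
# ['inner: first suspension', 'inner: second suspension', 'outer: got 42']
print(final)                               # outer finished
```

The driver never touches inner directly, yet it receives both of inner's suspensions, and inner's return value never reaches the driver at all: it is consumed by the yield from expression inside outer.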

Forward suspensions, return a result. That is awaiting. message = yield from read_message(sock) is, mechanically, the same thing modern Python writes as message = await read_message(sock). The bare yield ("read", sock) is the primitive suspension, awaiting the loop itself, while yield from is how one task awaits another and collects what it returns.
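The equivalence can be checked directly: a native coroutine can be driven by hand with .send(None), just as we drove generators with next(). In the sketch below, Suspend, read_message_async, and shout_async are hypothetical names, and the string "sock-1" stands in for a real socket. Suspend plays the part of the bare yield: its __await__ method is a generator that hands one request to whoever is driving.

```python
from collections.abc import Generator

class Suspend:
    """Awaitable that passes a single request up to the driver (hypothetical helper)."""

    def __init__(self, request: tuple[str, str]) -> None:
        self.request = request

    def __await__(self) -> Generator[tuple[str, str], None, None]:
        yield self.request                 # the primitive suspension, as in our tasks

async def read_message_async() -> bytes:
    await Suspend(("read", "sock-1"))      # pretend the message needs two reads
    await Suspend(("read", "sock-1"))
    return b"helloEND"

async def shout_async() -> bytes:
    message = await read_message_async()   # same role as `yield from read_message(sock)`
    return message.upper()

coro = shout_async()
requests = []
try:
    while True:
        requests.append(coro.send(None))   # drive the coroutine like a generator
except StopIteration as stop:
    result = stop.value

print(requests)    # [('read', 'sock-1'), ('read', 'sock-1')]
print(result)      # b'HELLOEND'
```

No event loop from the standard library is involved here. The suspensions surface at the driver and the return value arrives through StopIteration, which is the same machinery our Loop relies on.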

The generic result slot now earns its keep. A task is no longer a thing that only yields and exits; it is a thing that can hand back a value, and that value's type varies from task to task: bytes for read_message, None for echo, a parsed request or an int elsewhere. The yield protocol is fixed (every task speaks ("read"|"write", socket) to the loop), but the result type is a parameter. That asymmetry is exactly type Task[R] = Generator[tuple[str, socket.socket], None, R]: pin R to bytes and you get a task you can yield from to obtain bytes, with the type checker tracking it the whole way. The loop, which never inspects the result, simply works in Task[Any].

Here is the whole thing in one piece: the scheduler, the generic Task, a sub-task that returns a value, and a parent task that awaits it with yield from. Run it, point a telnet localhost 8000 at it, and type something ending in END (send EXITEND to disconnect).

import select
import socket
from collections import deque
from collections.abc import Generator
from typing import Any

# A task yields ("read"|"write", socket) requests and returns a result of type R.
# The type statement used here requires Python 3.12 or newer.
type Task[R] = Generator[tuple[str, socket.socket], None, R]


class Loop:
    def __init__(self) -> None:
        # The loop ignores each task's result, so it stores them as Task[Any].
        self.ready: deque[Task[Any]] = deque()
        self.read_waiting: dict[socket.socket, Task[Any]] = {}
        self.write_waiting: dict[socket.socket, Task[Any]] = {}

    def add_task(self, task: Task[Any]) -> None:
        self.ready.append(task)

    def run(self) -> None:
        while self.ready or self.read_waiting or self.write_waiting:
            if not self.ready:
                readers, writers, _ = select.select(
                    self.read_waiting, self.write_waiting, []
                )
                for sock in readers:
                    self.ready.append(self.read_waiting.pop(sock))
                for sock in writers:
                    self.ready.append(self.write_waiting.pop(sock))

            task = self.ready.popleft()
            try:
                op = next(task)
            except StopIteration:
                continue

            kind, sock = op
            if kind == "read":
                self.read_waiting[sock] = task
            elif kind == "write":
                self.write_waiting[sock] = task


# A sub-task that RETURNS a value: Task[bytes], not Task[None].
def read_message(sock: socket.socket) -> Task[bytes]:
    buffer = b""
    while b"END" not in buffer:
        yield ("read", sock)            # suspend to the loop, exactly as before
        chunk = sock.recv(1024)
        if not chunk:                   # client hung up mid-message
            return buffer
        buffer += chunk
    return buffer                       # a real return *value*


def shout(sock: socket.socket) -> Task[None]:
    while True:
        message = yield from read_message(sock)   # await the sub-task's result
        if b"EXIT" in message:                     # client asked to hang up
            sock.close()
            return
        yield ("write", sock)
        sock.sendall(message.upper())


def server(loop: Loop, host: str, port: int) -> Task[None]:
    listen = socket.socket()
    listen.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen.bind((host, port))
    listen.listen()
    listen.setblocking(False)
    while True:
        yield ("read", listen)
        client, _ = listen.accept()
        client.setblocking(False)
        loop.add_task(shout(client))


if __name__ == "__main__":
    loop = Loop()
    loop.add_task(server(loop, "127.0.0.1", 8000))
    print("shout server listening on 127.0.0.1:8000 (Ctrl-C to stop)")
    loop.run()

The Catch Hiding in Plain Sight

First, let yourself be seduced, because everyone was, and the seduction is the story. Read shout aloud: it is just a loop. Read a message, shout it back, repeat. It reads exactly like the blocking, one-thread-per-connection code we spent the first half of this article mourning, top to bottom, imperative, linear, the shape the human brain actually wants to read. And yet underneath that innocent loop sits a fully non-blocking, single-threaded server fielding thousands of connections at once. No callbacks. No visible state machine. No scheduler bookkeeping smeared through your business logic. You wrote what looks like sequential code, and you got industrial-strength concurrency for it.

And that is the quiet bombshell of this whole exercise: what we just built is async/await. Strip away the ceremony and the keywords are our two yields in nicer clothing, the coroutine is our generator, the standard library's event loop is our Loop with timers and epoll bolted on. For years, asyncio coroutines were literally generators driven by yield from. You now understand the engine under every async def you will ever write, and that was the entire point of looking at the roots.
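For comparison, here is roughly what shout looks like on the standard library's own loop. This is a sketch, not a drop-in replacement: it uses asyncio's stream API, where readuntil does the buffering our read_message did by hand, and the demo function with its throwaway client is an assumption added so the example runs end to end without telnet.

```python
import asyncio

async def shout(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    try:
        while True:
            message = await reader.readuntil(b"END")   # suspends as often as needed
            if b"EXIT" in message:                      # client asked to hang up
                break
            writer.write(message.upper())
            await writer.drain()                        # wait until the data can be sent
    except (asyncio.IncompleteReadError, ConnectionError):
        pass                                            # client hung up mid-message
    finally:
        writer.close()

async def demo() -> bytes:
    # Port 0 asks the OS for any free port, so the sketch cannot collide with 8000.
    server = await asyncio.start_server(shout, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"helloEND")
        await writer.drain()
        reply = await reader.readexactly(len(b"HELLOEND"))
        writer.write(b"EXITEND")                        # ask the server to disconnect us
        await writer.drain()
        writer.close()
        await writer.wait_closed()
    return reply

# The timeout is only a safety net for the demo.
reply = asyncio.run(asyncio.wait_for(demo(), timeout=5))
print(reply)                                            # b'HELLOEND'
```

Put the two versions of shout side by side and the mapping is line for line: yield from became await, the generator became async def, and our Loop became asyncio's.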

So hold on to how beautiful it all looks right now, because that first impression is the whole trick, and it is the last time this pattern will look so innocent.

In the installments that follow, I am going to show you exactly how this beautiful solution to concurrency turns into a plague on your codebase.