Comprehensions
List, dict, set, and generator comprehensions: concise collection building, generator laziness, and the walrus operator for filter-and-bind.
A comprehension builds a collection from an iterable in a single expression. Python has four forms: list ([...]), dict ({k: v ...}), set ({v ...}), and generator ((...)). The first three build their collection eagerly, while a generator expression produces its items on demand. They replace many for-loop-then-append patterns and express the intent more directly.
List, dict, and set comprehensions all follow the same shape: [expression for item in iterable if condition]. The if part is optional. Each form produces a different collection type.
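To make the for-loop-then-append equivalence concrete, here is a minimal sketch that builds the same list both ways. The `words` list and the length threshold are made up for illustration.

```python
words = ["apple", "kiwi", "fig", "banana"]

# Loop-then-append version
lengths_loop = []
for w in words:
    if len(w) > 3:
        lengths_loop.append(len(w))

# Equivalent list comprehension: expression, loop, optional filter
lengths_comp = [len(w) for w in words if len(w) > 3]

print(lengths_loop == lengths_comp)  # True
print(lengths_comp)                  # [5, 4, 6]
```

The comprehension reads in the same order as the loop body (what to keep, where it comes from, when to keep it), which is why the translation is usually mechanical.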
xs = [1, 2, 3, 4, 5]
d = {"a": 1, "b": 2, "c": 3}
# List comprehension
doubled = [x * 2 for x in xs]
doubled # [2, 4, 6, 8, 10]
# With a filter
evens = [x for x in xs if x % 2 == 0]
evens # [2, 4]
# Dict comprehension
scaled = {k: v * 2 for k, v in d.items()}
scaled # {"a": 2, "b": 4, "c": 6}
# Set comprehension (duplicates removed automatically)
last_digits = {x % 10 for x in [11, 21, 31, 22]}
last_digits # {1, 2}
A generator expression looks like a list comprehension but uses parentheses instead of square brackets. It is lazy: values are produced one at a time as the consumer asks for them, so the whole sequence is never held in memory at once. This makes generator expressions ideal when feeding functions that iterate once, like sum(), any(), all(), or max().
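As a rough illustration of what "lazy" means in practice, the sketch below compares the size of a list with the size of a generator object, then shows that a generator can only be consumed once. The names are illustrative, and the size comparison assumes CPython, where `sys.getsizeof` reports the container's own footprint.

```python
import sys

squares_list = [n * n for n in range(10_000)]
squares_gen = (n * n for n in range(10_000))

# The list stores a reference to every element; the generator object
# only stores its current state, so it stays small.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# A generator is exhausted after one pass.
first_pass = sum(squares_gen)
second_pass = sum(squares_gen)  # nothing left to iterate over
print(second_pass)  # 0
```

That single-pass behavior is the trade-off for the memory savings: if you need to iterate the values more than once, build a list instead.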
xs = range(1_000_000)
# List comprehension: builds the entire list in memory first
total_list = sum([x * 2 for x in xs])
# Generator expression: produces values lazily, uses constant memory
total_gen = sum(x * 2 for x in xs)
# Both give the same result, but the generator uses far less memory.
# When passing a generator expression as the sole argument to a
# function, you can drop the extra parentheses:
total_gen = sum(x * 2 for x in xs) # equivalent to sum((x * 2 for x in xs))
any(x > 999_990 for x in xs) # True (stops at the first match, 999_991)
all(x >= 0 for x in xs) # True
The walrus operator (:=, added in Python 3.8) assigns a value inside an expression. In a comprehension it lets you call an expensive function once and both filter and use the result in a single pass - no need for a temporary list or a two-step filter-then-transform.
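To see the "call once" claim directly, here is a small sketch that counts calls. `slow_square` is a hypothetical stand-in for an expensive function, and the `calls` counter exists only for the demonstration.

```python
calls = 0

def slow_square(n):
    """Hypothetical stand-in for an expensive computation."""
    global calls
    calls += 1
    return n * n

nums = [1, 2, 3, 4, 5]

# Call once per item: bind the result to `sq`, filter on it, keep it
big_squares = [sq for n in nums if (sq := slow_square(n)) > 5]

print(big_squares)  # [9, 16, 25]
print(calls)        # 5
```

Without the walrus, writing `slow_square(n)` in both the filter and the output expression would call it again for every item that passes the filter.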
import re
logs = [
    "ERROR: disk full",
    "INFO: started",
    "ERROR: timeout",
    "DEBUG: ping",
]
# Without walrus: re.search runs twice for every matching line
errors = [
    re.search(r"ERROR: (.+)", line).group(1)
    for line in logs
    if re.search(r"ERROR: (.+)", line)
]
# With walrus: call once, bind to `m`, filter on it, use it
messages = [
    m.group(1)
    for line in logs
    if (m := re.search(r"ERROR: (.+)", line)) is not None
]
messages # ["disk full", "timeout"]
In production
A generator expression (x * 2 for x in xs) is lazy and constant-memory - reach for it over a list comprehension when feeding sum(), any(), all(), or any consumer that iterates once. Two levels of nesting in a comprehension are readable; three are not - extract the inner level to a named function or a plain for loop. The walrus operator (:=) filters and binds in one pass, which is its clearest use case; overusing it in complex expressions hurts readability more than it helps.
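A short sketch of the nesting guideline, using made-up data: two `for` clauses still read naturally, and a third level is handled by extracting the inner part into a named helper. `flatten` is a hypothetical helper here, not a standard-library function.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Two levels: reads as "for each row, for each value in that row"
flat = [value for row in matrix for value in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# For a third level, name the inner step instead of adding another `for`
def flatten(rows):
    return [value for row in rows for value in row]

cube = [matrix, matrix]
flat_cube = [value for layer in cube for value in flatten(layer)]
print(len(flat_cube))  # 18
```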