Greedy vs lazy vs possessive — the three flavors of "more than one"
Every quantifier has three modes. Most people only know one. The other two solve real problems.
The basic quantifiers
Five symbols control repetition in regex:
?— zero or one (optional)*— zero or more+— one or more{n}— exactly n{n,m}— between n and m (m is optional, defaults to infinity)
That's the basics. But each of these has three modes — greedy, lazy, and possessive — and they behave very differently.
Greedy: the default
By default, every quantifier is greedy. It matches as much as possible, then backs off only if the rest of the pattern fails.
Example. Apply ".*" to the string "foo" and "bar":
Input: "foo" and "bar"
Pattern: ".*"
Result: "foo" and "bar" ← matches the WHOLE thing
Why? The .* is greedy. It first consumes everything after the opening quote, all the way to the end of the string. Then the regex engine tries to match the closing " — fails. So .* backs off one character and tries again. Still no closing " at that position. Keeps backing off until it finally lands on the last " in "bar". Match succeeds.
So ".*" matched too much. That's greed.
Lazy: minimum match
Add a ? after any quantifier to make it lazy. Lazy means "match as little as possible":
Input: "foo" and "bar"
Pattern: ".*?"
Result: "foo" ← stops at the first closing quote
Lazy quantifiers start with zero matches and grow only as needed. They're the right tool when you want the smallest possible match — typically when parsing delimited content.
The lazy versions of each quantifier:
??— lazy zero-or-one*?— lazy zero-or-more+?— lazy one-or-more{n,m}?— lazy bounded
The catch: lazy isn't always what you want
People reach for lazy when they have a greedy-matching-too-much problem. Sometimes that's the right fix. Other times, the real fix is to be more specific about what you're matching.
Compare these two ways to extract content between quotes:
Pattern A: ".*?" lazy, generally works
Pattern B: "[^"]*" match anything except a quote
Pattern B is usually better. It's clearer about intent, it doesn't require backtracking, and it's typically faster. The general rule: negated character classes beat lazy quantifiers whenever possible.
Possessive: never backtrack
Possessive quantifiers, written with a trailing +, match like greedy ones but refuse to give back what they matched. If the rest of the pattern fails, the regex fails outright — no backtracking.
Greedy: \d+5 matches "12345" (5 digits, backs off to find the 5)
Possessive: \d++5 fails on "12345" (digits consume all 5, won't give one back)
Why would you want this? Two reasons:
- Performance. Backtracking is expensive. Possessive quantifiers skip it entirely.
- Preventing ReDoS. Catastrophic backtracking happens when the engine tries an exponential number of paths. Possessive quantifiers prevent it.
Possessive versions:
?+— possessive optional*+— possessive zero-or-more++— possessive one-or-more{n,m}+— possessive bounded
Flavor support
| Flavor | Greedy | Lazy | Possessive |
|---|---|---|---|
| JavaScript | ✓ | ✓ | ✗ |
| Python (stdlib re) | ✓ | ✓ | 3.11+ |
| PCRE | ✓ | ✓ | ✓ |
| Java | ✓ | ✓ | ✓ |
Backtracking and how regex engines work
Most regex engines (NFA-based, used by JS, Python, PCRE) work via backtracking. When a match fails, the engine tries alternatives — different quantifier amounts, different alternation branches — until one succeeds or all paths are exhausted.
For most patterns this is fine. The problem is patterns where the number of backtracking paths grows exponentially with input length. The classic example:
Pattern: ^(a+)+$
Input: aaaaaaaaaaaaaaaaaaaaX
The nested quantifiers (a+)+ create overlapping ways to assign characters to the inner and outer plus. With 20 a's and a non-matching X at the end, the engine tries about a million combinations before giving up. With 30 a's, a billion.
This is ReDoS (regular expression denial of service). Untrusted input + a vulnerable regex = your server hangs.
Patterns to watch for
(a+)+,(a*)*— quantifier on a group that can match overlapping content(a|a)+— alternation with overlapping branches, then a quantifier(a|aa)+— same idea, multiple ways to consume the same characters
How to fix
- Use atomic groups
(?>...)or possessive quantifiers to prevent the inner backtracking. PCRE and Java only. - Use a negated class or unambiguous alternation.
([^a]+)+isn't pathological because the inner group can't overlap with itself. - Use a non-backtracking engine. Go's regexp (RE2), Rust's regex crate, and Hyperscan all use linear-time algorithms by avoiding the features that cause backtracking.
The takeaway
Greedy is the default and usually fine. Lazy is the right answer when you want minimum, but a negated character class is often clearer. Possessive eliminates backtracking entirely — use it for performance and to prevent ReDoS in PCRE/Java code.
Most regex bugs come from greedy matching more than expected. The fix is usually not reaching for lazy reflexively — it's thinking about what you're actually trying to match and making that explicit.