Regex performance
A poorly-shaped regex can take minutes on input a well-shaped one handles in microseconds. Here's why, and what to look for.
The backtracking problem
Most regex engines (JavaScript V8, Python re, Java, PCRE) use a backtracking NFA. When a path through the pattern fails, the engine reverts to the last decision point and tries another path. For most patterns, this is fast — but some pattern shapes have exponentially many paths.
The classic example is (a+)+$ against aaaa...X:
- Engine matches all the a's, hits X, fails the $ anchor.
- Backs up one a, tries to start a new
a+group there, fails again. - Tries every possible way to partition the a's between the inner and outer +.
- For n a's followed by a mismatch, that's 2^n attempts.
30 a's + X takes ~1 second. 35 a's + X takes ~30 seconds. 40 a's + X is unusable.
The danger patterns
Anything with nested quantifiers where the inner can also match across the outer's boundary:
(a+)+ nested + on same char
(a*)* nested *
(.*)* catch-all nested
(\w+)+
(.+)+
(a|aa)+ overlapping alternatives
(a|a?)+ alternation includes optional
([^x]*)+ negated class then nested
The pattern ^(\w+)*$ on "aaaaaaaaaaaa!" is the same bug. So is ^(a+b?)+$.
How to detect it
The explainer's Benchmark tab has a ReDoS growth test that runs the pattern against increasing input sizes. If time grows linearly with size, you're safe. If it grows super-linearly (exponent > 1.5), you have a problem.
Manual test: try the pattern against your input followed by a character that forces failure. If it hangs for more than a second on 30-50 characters, you have catastrophic backtracking.
How to fix it
Make alternatives mutually exclusive. (a|aa)+ is ambiguous — the engine has to try both. (aa|a)+ isn't better. Rewrite as a+.
Use atomic groups when available. An atomic group (?>...) commits to the first match and refuses to backtrack into it. (?>a+)+$ on "aaaaX" fails fast because the engine can't back out of the inner group. JavaScript got this in ES2024; Python re doesn't have it (but the regex module does).
Use possessive quantifiers when available. a++, a*+, a?+ never give back what they matched. Same idea as atomic groups, more concise. Java, PCRE, .NET, Ruby support these; JavaScript got them in ES2024; Python's built-in re doesn't.
Anchor when possible. ^pattern$ is much faster than unanchored on long inputs, because the engine only tries position 0.
Be specific. [^"]* is slower than the unrolled equivalent because of how greedy/lazy interaction works. For "everything until a quote", "[^"]*" is fast and safe.
Real-world ReDoS
This isn't just academic. Several major outages have been caused by ReDoS in production:
- Stack Overflow's 2016 outage was a regex on user input that backtracked exponentially.
- Cloudflare's 2019 global outage came from a regex in a WAF rule.
- Multiple npm package vulnerabilities (CVE-2020-7593 and many others) are ReDoS in input-validation regexes.
If your regex runs on user-supplied input, run a ReDoS check before deploying.
When you can't change the regex
If you're stuck with a vulnerable pattern in a third-party library, options include: time-limit the match (Node has no built-in but you can use a worker with a timeout), switch the engine to RE2 (Google's linear-time engine — re2-wasm for JS, re2 module for Python), or pre-validate the input length before matching.