Download Cheat sheet PDF 12 pages · syntax, editors, patterns, Unicode, performance, debugging
Guide

Regex performance

A poorly-shaped regex can take minutes on input a well-shaped one handles in microseconds. Here's why, and what to look for.

The backtracking problem

Most regex engines (JavaScript V8, Python re, Java, PCRE) use a backtracking NFA. When a path through the pattern fails, the engine reverts to the last decision point and tries another path. For most patterns, this is fast — but some pattern shapes have exponentially many paths.

The classic example is (a+)+$ against aaaa...X:

  • Engine matches all the a's, hits X, fails the $ anchor.
  • Backs up one a, tries to start a new a+ group there, fails again.
  • Tries every possible way to partition the a's between the inner and outer +.
  • For n a's followed by a mismatch, that's 2^n attempts.

30 a's + X takes ~1 second. 35 a's + X takes ~30 seconds. 40 a's + X is unusable.

The danger patterns

Anything with nested quantifiers where the inner can also match across the outer's boundary:

(a+)+        nested + on same char
(a*)*        nested *
(.*)*        catch-all nested
(\w+)+
(.+)+
(a|aa)+      overlapping alternatives
(a|a?)+      alternation includes optional
([^x]*)+     negated class then nested

The pattern ^(\w+)*$ on "aaaaaaaaaaaa!" is the same bug. So is ^(a+b?)+$.

How to detect it

The explainer's Benchmark tab has a ReDoS growth test that runs the pattern against increasing input sizes. If time grows linearly with size, you're safe. If it grows super-linearly (exponent > 1.5), you have a problem.

Manual test: try the pattern against your input followed by a character that forces failure. If it hangs for more than a second on 30-50 characters, you have catastrophic backtracking.

How to fix it

Make alternatives mutually exclusive. (a|aa)+ is ambiguous — the engine has to try both. (aa|a)+ isn't better. Rewrite as a+.

Use atomic groups when available. An atomic group (?>...) commits to the first match and refuses to backtrack into it. (?>a+)+$ on "aaaaX" fails fast because the engine can't back out of the inner group. JavaScript got this in ES2024; Python re doesn't have it (but the regex module does).

Use possessive quantifiers when available. a++, a*+, a?+ never give back what they matched. Same idea as atomic groups, more concise. Java, PCRE, .NET, Ruby support these; JavaScript got them in ES2024; Python's built-in re doesn't.

Anchor when possible. ^pattern$ is much faster than unanchored on long inputs, because the engine only tries position 0.

Be specific. [^"]* is slower than the unrolled equivalent because of how greedy/lazy interaction works. For "everything until a quote", "[^"]*" is fast and safe.

Real-world ReDoS

This isn't just academic. Several major outages have been caused by ReDoS in production:

  • Stack Overflow's 2016 outage was a regex on user input that backtracked exponentially.
  • Cloudflare's 2019 global outage came from a regex in a WAF rule.
  • Multiple npm package vulnerabilities (CVE-2020-7593 and many others) are ReDoS in input-validation regexes.

If your regex runs on user-supplied input, run a ReDoS check before deploying.

When you can't change the regex

If you're stuck with a vulnerable pattern in a third-party library, options include: time-limit the match (Node has no built-in but you can use a worker with a timeout), switch the engine to RE2 (Google's linear-time engine — re2-wasm for JS, re2 module for Python), or pre-validate the input length before matching.


← Back to guides