
The 10 mistakes that ruin most regex

Most "the regex doesn't work" bugs are one of these ten things. The fixes are usually small.

1. Unescaped dots

The dot . matches any character (except newline by default). If you actually want a literal dot — in a filename, IP address, version number — escape it.

Wrong:   ^\d+.\d+$          matches "1.2", BUT ALSO "1x2", "1!2"
Right:   ^\d+\.\d+$         literal dot only
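To see the difference, run both patterns against the same inputs. This is a minimal JavaScript sketch; the names `loose` and `strict` are illustrative and not part of any library.

```javascript
// Unescaped dot: "." stands for any single character except a newline.
const loose = /^\d+.\d+$/;
// Escaped dot: "\." matches only a literal period.
const strict = /^\d+\.\d+$/;

const samples = ["1.2", "1x2", "1!2"];
const results = samples.map((s) => ({
  input: s,
  loose: loose.test(s),   // true for all three samples
  strict: strict.test(s), // true only for "1.2"
}));

console.log(results);
```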

2. Missing anchors in validation

Without ^ and $, a regex succeeds if the pattern occurs anywhere in the string. For form validation, that lets invalid input through.

Wrong:   /\d{5}/                 "abc12345xyz" passes validation
Right:   /^\d{5}$/                only "12345" passes

Always anchor regex used for validation. Use the explainer at the top of this site if you're not sure whether a pattern is anchored — it tells you explicitly.
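As a sketch of how this plays out in a validator, assume a hypothetical form field that should hold exactly five digits. Both function names below are made up for the illustration.

```javascript
// Hypothetical validators for a five-digit code; names are illustrative.
function isFiveDigitsUnanchored(value) {
  // Succeeds if five consecutive digits appear anywhere in the value.
  return /\d{5}/.test(value);
}

function isFiveDigits(value) {
  // The whole value must be exactly five digits.
  return /^\d{5}$/.test(value);
}

console.log(isFiveDigitsUnanchored("abc12345xyz")); // true (the bug)
console.log(isFiveDigits("abc12345xyz"));           // false
console.log(isFiveDigits("12345"));                 // true
```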

3. Catastrophic backtracking

Patterns with overlapping quantifiers can blow up on input that almost matches but fails at the end:

Bad:     (a+)+b      exponential backtracking on "aaaaaaaaaaaaaaaaaaaaa" (no b to finish the match)
Good:    a+b         same matches, no exponential blowup

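Avoid nested quantifiers over overlapping patterns. If you need them, use atomic groups (?>...) or possessive quantifiers such as a++ where the flavor supports them (PCRE, Java, Python 3.11+), or rethink the structure. JavaScript has neither.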
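A rough way to observe the effect is to time both patterns on a near-miss input. The sketch below is JavaScript; `timeTest` is a hypothetical helper, the input is kept short on purpose, and the timings depend on the engine and machine, so treat the numbers as indicative only.

```javascript
// Hypothetical helper: report whether the pattern matched and how long it took (ms).
function timeTest(re, input) {
  const start = Date.now();
  const matched = re.test(input);
  return { matched, ms: Date.now() - start };
}

const nested = /(a+)+b/; // nested quantifiers over the same characters
const flat = /a+b/;      // finds the same matches without the nesting

// Near-miss input: many a's and no trailing b. In a backtracking engine the
// work for the nested pattern can roughly double with each extra "a".
const nearMiss = "a".repeat(20);

console.log(timeTest(nested, nearMiss)); // matched: false, usually the slower of the two
console.log(timeTest(flat, nearMiss));   // matched: false
console.log(nested.test("aaab"), flat.test("aaab")); // true true
```

Raising the repeat count by a few characters at a time shows the trend without locking up the process.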

4. Greedy match grabbing too much

By default, quantifiers are greedy: they match as much as possible, then give characters back only if the rest of the pattern fails.

Pattern:  <.+>
Input:    <b>bold</b> and <i>italic</i>
Match:    <b>bold</b> and <i>italic</i>     ← the whole thing!

Fix with a lazy quantifier or — better — a negated class:

Lazy:     <.+?>                  each tag individually
Better:   <[^>]+>                 same result here, and the class can never run past a >
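The three variants can be compared directly in JavaScript. This is a small sketch using the sample input from above; the variable names are illustrative.

```javascript
const input = "<b>bold</b> and <i>italic</i>";

const greedy = input.match(/<.+>/g);     // one match spanning the whole string
const lazy = input.match(/<.+?>/g);      // each tag separately
const negated = input.match(/<[^>]+>/g); // each tag separately

console.log(greedy);  // [ '<b>bold</b> and <i>italic</i>' ]
console.log(lazy);    // [ '<b>', '</b>', '<i>', '</i>' ]
console.log(negated); // [ '<b>', '</b>', '<i>', '</i>' ]
```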

5. Dot doesn't cross newlines

By default, . matches any character except a newline. For multi-line content, this trips people up.

Pattern:  start.*end
Input:    "start\nmiddle\nend"
Match:    nothing (without the dotall flag)

Fix with the dotall flag:

  • JavaScript: /start.*end/s (s flag, since ES2018)
  • Python: re.compile(r"start.*end", re.DOTALL)
  • PCRE: /s modifier

Or use [\s\S] instead of . — that explicitly matches everything including newlines.
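A short JavaScript check of the three options, using the input from the example above (the `s` flag needs an ES2018-capable runtime):

```javascript
const text = "start\nmiddle\nend";

const plain = /start.*end/.test(text);        // dot stops at the first newline
const dotAll = /start.*end/s.test(text);      // s flag lets dot match newlines
const charClass = /start[\s\S]*end/.test(text); // class matches any character

console.log(plain);     // false
console.log(dotAll);    // true
console.log(charClass); // true
```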

6. Forgetting to escape regex metacharacters in literals

If your "search term" comes from user input or a variable, regex metacharacters in it cause bugs:

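// User searches for "$5.00"
const broken = new RegExp(userInput);               // BROKEN: $ and . are treated as metacharacters
const fixed = new RegExp(escapeRegex(userInput));   // fix: escape the input first

Most languages have an escape helper (Python's re.escape, for example). Newer JavaScript engines provide RegExp.escape; where it isn't available, a one-liner works: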

function escapeRegex(s) { return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); }
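Here is the helper applied to the "$5.00" search term. The sample text and variable names are assumptions made for the illustration.

```javascript
function escapeRegex(s) {
  // Prefix every regex metacharacter with a backslash.
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "$5.00";
const sampleText = "Total: $5.00";

const raw = new RegExp(userInput);               // "$" is read as an end-of-input anchor
const safe = new RegExp(escapeRegex(userInput)); // every character is literal

console.log(raw.test(sampleText));      // false: nothing can follow the end anchor
console.log(safe.test(sampleText));     // true
console.log(safe.test("Total: $5x00")); // false: the dot is now literal
console.log(safe.source);               // \$5\.00
```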

7. \d matches more than 0-9 in Unicode mode

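In flavors where \d is Unicode-aware (Python 3 str patterns and .NET, for example), it matches decimal digits from any script (Arabic-Indic, Devanagari, and so on), not just ASCII 0-9. JavaScript is an exception: there \d always means [0-9], even with the u flag.

# Python 3 (str patterns are Unicode-aware by default)
re.fullmatch(r"\d", "५")   matches (Devanagari 5)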

If you need ASCII digits specifically, use [0-9] explicitly.
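Behavior differs by engine, so it is worth testing directly in the one you use. In JavaScript, \d is defined as [0-9] with or without the u flag, and the Unicode-aware alternative is the property escape \p{Nd}, which requires the u flag. A minimal check:

```javascript
const devanagariFive = "५"; // U+096B DEVANAGARI DIGIT FIVE

const plainD = /\d/.test(devanagariFive);          // ASCII digits only
const unicodeD = /\d/u.test(devanagariFive);       // still ASCII digits only
const decimalNumber = /\p{Nd}/u.test(devanagariFive); // any Unicode decimal digit
const asciiOnly = /^[0-9]+$/.test("2024");         // explicit ASCII class

console.log(plainD, unicodeD, decimalNumber, asciiOnly); // false false true true
```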

8. Mixing up which slash to escape

In some contexts (JavaScript regex literals, sed, Perl), the forward slash is the delimiter and needs escaping inside the pattern:

JS literal:    /https:\/\/example\.com/
JS RegExp():   new RegExp("https://example\\.com")    // no slash escape needed
Python:        re.compile(r"https://example\.com")    # never escape slash

If you're moving regex between languages, watch the delimiters.
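A related trap when moving a pattern from a JavaScript literal into a string is the backslash: inside an ordinary string literal, "\." collapses to ".", so the escape has to be doubled. The sketch below compares the forms; the URLs and variable names are illustrative.

```javascript
const fromLiteral = /https:\/\/example\.com/;           // slashes escaped for the delimiter
const fromString = new RegExp("https://example\\.com"); // no slash escape, doubled backslash
// Pitfall: a single backslash is dropped by the string literal,
// so the pattern ends up with an unescaped dot.
const singleBackslash = new RegExp("https://example\.com");

const good = "https://example.com/docs";
const bad = "https://exampleXcom/docs";

console.log(fromLiteral.test(good), fromLiteral.test(bad)); // true false
console.log(fromString.test(good), fromString.test(bad));   // true false
console.log(singleBackslash.test(bad));                      // true (dot matches "X")
```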

9. Confusing capture groups with non-capture groups

Parentheses do two things: group sub-patterns together (so quantifiers apply to the group) AND capture what was matched. If you only need grouping, use (?:...) — non-capturing.

(abc)+      matches "abcabc", captures "abc" in group 1
(?:abc)+    matches "abcabc", no capture

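This matters a little for performance (each capture has to be recorded) and a lot for replacement strings and backreferences, because adding or removing a capturing group renumbers every group after it.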
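The renumbering is easiest to see with a pattern that has two groups. In this JavaScript sketch the URL is an arbitrary example; switching the first group to non-capturing moves the host from group 2 to group 1.

```javascript
const url = "https://example.com/path";

const capturing = url.match(/(https?):\/\/([^/]+)/);
const nonCapturing = url.match(/(?:https?):\/\/([^/]+)/);

console.log(capturing[1], capturing[2]); // https example.com
console.log(nonCapturing[1]);            // example.com

// The replacement reference has to follow the numbering.
const hostViaGroup2 = url.replace(/(https?):\/\/([^/]+)/, "$2");
const hostViaGroup1 = url.replace(/(?:https?):\/\/([^/]+)/, "$1");
console.log(hostViaGroup2, hostViaGroup1); // example.com/path example.com/path

// Repeated group: the capture holds the last repetition only.
const repeated = "abcabc".match(/(abc)+/);
console.log(repeated[0], repeated[1]);   // abcabc abc
```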

10. Trying to parse complex formats with regex

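Classic regular expressions describe regular languages, which cannot express arbitrarily deep nesting. Some flavors add recursion, but the resulting patterns are hard to read and maintain. Don't use regex to: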

  • Parse HTML or XML (use a DOM parser)
  • Parse JSON (use a JSON parser)
  • Match balanced parentheses or brackets
  • Validate arbitrarily complex grammars

Regex is excellent for tokenizing simple patterns and for find/replace. It's not a parser. When you find yourself writing a regex with 200 characters of nested groups and lookarounds, stop. There's a real parser for what you're doing.
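For the balanced-brackets case, a few lines of ordinary code with a stack usually do the job. The function below is a minimal sketch, not a full parser: `isBalanced` is a hypothetical name, and it ignores complications such as brackets inside string literals. For JSON, the built-in JSON.parse is the real parser to reach for.

```javascript
// Returns true when every opening bracket is closed in the right order.
function isBalanced(text) {
  const closers = new Map([[")", "("], ["]", "["], ["}", "{"]]);
  const openers = new Set(closers.values());
  const stack = [];
  for (const ch of text) {
    if (openers.has(ch)) {
      stack.push(ch); // remember the opener until its closer shows up
    } else if (closers.has(ch)) {
      // A closer must pair with the most recent unclosed opener.
      if (stack.pop() !== closers.get(ch)) return false;
    }
  }
  return stack.length === 0; // leftover openers mean the text is unbalanced
}

console.log(isBalanced("(a[b]{c})")); // true
console.log(isBalanced("(a[b)]"));    // false
console.log(isBalanced("((a)"));      // false

// JSON: let the built-in parser handle the nesting.
const parsed = JSON.parse('{"items": [1, [2, 3]]}');
console.log(parsed.items[1][0]);      // 2
```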

The takeaway

Most regex bugs cluster around these ten things. Before assuming the regex engine is doing something weird, check whether you're:

  1. Anchoring (or not anchoring) when you should
  2. Escaping metacharacters in your literals
  3. Picking the right quantifier mode (greedy/lazy)
  4. Aware of what flags are or aren't enabled

The explainer on this site is designed to catch most of these — it walks token by token and tells you what each piece does. When in doubt, paste your regex in and see what it actually means.

