The 10 mistakes that ruin most regex
Most "the regex doesn't work" bugs are one of these ten things. The fixes are usually small.
1. Unescaped dots
The dot . matches any character (except newline by default). If you actually want a literal dot — in a filename, IP address, version number — escape it.
Wrong: ^\d+.\d+$ matches "1.2", BUT ALSO "1x2", "1!2"
Right: ^\d+\.\d+$ literal dot only
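To see the difference concretely, here is a minimal Python sketch that runs both patterns over the sample strings above. The variable names `loose` and `strict` are only illustrative.

```python
import re

loose = re.compile(r"^\d+.\d+$")    # unescaped dot: matches any single character
strict = re.compile(r"^\d+\.\d+$")  # escaped dot: matches a literal "." only

for text in ["1.2", "1x2", "1!2", "123"]:
    # Print whether each pattern accepts the string.
    print(text, bool(loose.match(text)), bool(strict.match(text)))
```

The loose pattern even accepts "123", because the dot is happy to consume the middle digit.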
2. Missing anchors in validation
Without ^ and $, your regex matches anywhere in the string. For form validation, this is wrong.
Wrong: /\d{5}/ "abc12345xyz" passes validation
Right: /^\d{5}$/ only "12345" passes
Always anchor regex used for validation. Use the explainer at the top of this site if you're not sure whether a pattern is anchored — it tells you explicitly.
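In Python, one way to get anchoring without typing the anchors is `re.fullmatch`, which only succeeds if the whole string matches. The sketch below assumes a five-digit code; the function names are hypothetical.

```python
import re

def passes_unanchored(s):
    # re.search finds five digits anywhere in the string.
    return re.search(r"\d{5}", s) is not None

def passes_anchored(s):
    # re.fullmatch requires the entire string to be exactly five digits.
    return re.fullmatch(r"\d{5}", s) is not None

print(passes_unanchored("abc12345xyz"))  # True
print(passes_anchored("abc12345xyz"))    # False
print(passes_anchored("12345"))          # True
```

One Python-specific detail: `$` also matches just before a trailing newline, so `^\d{5}$` accepts "12345\n". `re.fullmatch` does not.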
3. Catastrophic backtracking
Patterns with overlapping quantifiers can blow up on input that almost matches but fails at the end:
Bad: (a+)+b slows exponentially on "aaaaaaaaaaaaaaaaaaaaa" (each extra a roughly doubles the work)
Good: a+b same matches, no exponential blowup
Avoid nested quantifiers on overlapping patterns. If you need them, use atomic groups (?>...) in PCRE/Java, or rethink the structure.
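A rough way to observe the blowup is to time a failed search while the input grows. This is a sketch for Python's built-in `re` module, with a hypothetical helper name; absolute timings depend on the machine, and the input sizes are kept small on purpose so the script finishes quickly.

```python
import re
import time

nested = re.compile(r"(a+)+b")  # nested quantifiers over the same characters
flat = re.compile(r"a+b")       # finds the same overall matches

def seconds_to_fail(pattern, n):
    """Time a search over n 'a' characters with no 'b', so the search must fail."""
    text = "a" * n
    start = time.perf_counter()
    result = pattern.search(text)
    elapsed = time.perf_counter() - start
    assert result is None  # without a 'b' nothing can match
    return elapsed

for n in (10, 14, 18):
    print(n, f"nested={seconds_to_fail(nested, n):.4f}s",
          f"flat={seconds_to_fail(flat, n):.6f}s")
```

The nested time should grow sharply with each step while the flat time stays negligible. Pushing `n` much higher is how you reproduce the "hang".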
4. Greedy match grabbing too much
By default, all quantifiers are greedy. They grab as much as possible.
Pattern: <.+>
Input: <b>bold</b> and <i>italic</i>
Match: <b>bold</b> and <i>italic</i> ← the whole thing!
Fix with a lazy quantifier or — better — a negated class:
Lazy: <.+?> each tag individually
Better: <[^>]+> same result, no backtracking
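The three variants can be compared side by side. A small Python sketch using `re.findall` on the input from above:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", html)      # .+ runs to the last ">" in the string
lazy = re.findall(r"<.+?>", html)       # .+? stops at the first ">" it can
negated = re.findall(r"<[^>]+>", html)  # [^>]+ can never run past a ">"

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```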
5. Dot doesn't cross newlines
By default, . matches any character except a newline. For multi-line content, this trips people up.
Pattern: start.*end
Input: "start\nmiddle\nend"
Match: nothing (without the dotall flag)
Fix with the dotall flag:
- JavaScript: /start.*end/s (s flag, since ES2018)
- Python: re.compile(r"start.*end", re.DOTALL)
- PCRE: /s modifier
Or use [\s\S] instead of . — that explicitly matches everything including newlines.
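Here is how both fixes look in Python, as a short sketch using the same input as above:

```python
import re

text = "start\nmiddle\nend"

no_flag = re.search(r"start.*end", text)               # . stops at the first newline
with_flag = re.search(r"start.*end", text, re.DOTALL)  # . now matches newlines too
with_class = re.search(r"start[\s\S]*end", text)       # [\s\S] matches any character

print(no_flag)                    # None
print(repr(with_flag.group()))    # 'start\nmiddle\nend'
print(repr(with_class.group()))   # 'start\nmiddle\nend'
```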
6. Forgetting to escape regex metacharacters in literals
If your "search term" comes from user input or a variable, regex metacharacters in it cause bugs:
// User searches for "$5.00"
const broken = new RegExp(userInput); // BROKEN: "$" and "." act as metacharacters
const fixed = new RegExp(escapeRegex(userInput)); // fix: escape the input first
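Most languages have an escape helper (re.escape in Python, Pattern.quote in Java, preg_quote in PHP). Newer JavaScript runtimes provide RegExp.escape; where it isn't available, a one-liner works: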
function escapeRegex(s) { return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); }
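The same bug and fix in Python, where the standard library's `re.escape` does the escaping. The sample text is made up for illustration.

```python
import re

user_input = "$5.00"
text = "Total: $5.00 (was 5x00)"

# Unescaped: "$" is an end-of-string anchor, so "5" can never follow it.
raw = re.findall(user_input, text)

# Escaped: every metacharacter in the input is treated as a literal.
safe = re.findall(re.escape(user_input), text)

print(raw)   # []
print(safe)  # ['$5.00']
```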
7. \d matches more than 0-9 in Unicode mode
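In some flavors, \d matches decimal digits from any script (Arabic-Indic, Devanagari, and so on), not just ASCII 0-9. Python 3 does this by default for str patterns, as does .NET; Java and PCRE do it only when their Unicode character-class options are enabled. JavaScript is the exception: \d means [0-9] even with the /u flag.
# Python 3
re.fullmatch(r"\d", "५") matches (Devanagari 5)
// JavaScript, with or without the /u flag
/\d/u.test("५") false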
If you need ASCII digits specifically, use [0-9] explicitly.
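A quick check in Python 3, where `str` patterns are Unicode-aware by default. This sketch also shows the `re.ASCII` flag as an alternative to spelling out `[0-9]`.

```python
import re

devanagari_five = "\u096b"  # the character "५", DEVANAGARI DIGIT FIVE

unicode_digit = re.fullmatch(r"\d", devanagari_five)         # Unicode-aware \d
explicit_class = re.fullmatch(r"[0-9]", devanagari_five)     # ASCII digits only
ascii_flag = re.fullmatch(r"\d", devanagari_five, re.ASCII)  # \d restricted to ASCII

print(bool(unicode_digit))   # True
print(bool(explicit_class))  # False
print(bool(ascii_flag))      # False
```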
8. Mixing up which slash to escape
In some contexts (JavaScript regex literals, sed, Perl), the forward slash is the delimiter and needs escaping inside the pattern:
JS literal: /https:\/\/example\.com/
JS RegExp(): new RegExp("https://example\\.com") // no slash escape needed
Python: re.compile(r"https://example\.com") # no delimiter, so never escape the slash
If you're moving regex between languages, watch the delimiters.
9. Confusing capture groups with non-capture groups
Parentheses do two things: group sub-patterns together (so quantifiers apply to the group) AND capture what was matched. If you only need grouping, use (?:...) — non-capturing.
(abc)+ matches "abcabc", captures "abc" in group 1
(?:abc)+ matches "abcabc", no capture
This matters for performance (capture takes work) and for replacement strings (the numbering changes).
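The numbering change is easiest to see in a replacement. In this Python sketch the sample sentence is invented; the title group exists only for the alternation, yet in the capturing version it still takes up group 1.

```python
import re

greeting = "Dear Ms. Lovelace"

# Capturing title group: the surname ends up in group 2.
with_capture = re.sub(r"(Mr|Ms)\. (\w+)", r"\2", greeting)

# Non-capturing title group: the surname is now group 1.
without_capture = re.sub(r"(?:Mr|Ms)\. (\w+)", r"\1", greeting)

print(with_capture)     # Dear Lovelace
print(without_capture)  # Dear Lovelace

# In Python, findall returns the group instead of the whole match when one exists.
print(re.findall(r"(abc)+", "abcabc"))    # ['abc']
print(re.findall(r"(?:abc)+", "abcabc"))  # ['abcabc']
```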
10. Trying to parse complex formats with regex
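Regular expressions describe regular languages, so they can't track arbitrarily deep nesting. Some engines add recursion extensions, but the resulting patterns are hard to read and maintain. Don't use regex to: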
- Parse HTML or XML (use a DOM parser)
- Parse JSON (use a JSON parser)
- Match balanced parentheses or brackets
- Validate arbitrarily complex grammars
Regex is excellent for tokenizing simple patterns and for find/replace. It's not a parser. When you find yourself writing a regex with 200 characters of nested groups and lookarounds, stop. There's a real parser for what you're doing.
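For the balanced-parentheses case in the list above, the non-regex alternative is often only a few lines. A minimal Python sketch, assuming only round brackets matter (the function name is hypothetical):

```python
def is_balanced(text):
    """Return True if every "(" in text has a matching ")" in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ")" appeared before its matching "(".
                return False
    return depth == 0

print(is_balanced("f(a, g(b), (c))"))  # True
print(is_balanced("f(a))("))           # False
print(is_balanced("((a)"))             # False
```

The counter is the piece of state a plain regular expression has no way to keep. Supporting several bracket types would mean swapping the counter for a stack.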
The takeaway
Most regex bugs cluster around these ten things. Before assuming the regex engine is doing something weird, check whether you're:
- Anchoring (or not anchoring) when you should
- Escaping metacharacters in your literals
- Picking the right quantifier mode (greedy/lazy)
- Aware of what flags are or aren't enabled
The explainer on this site is designed to catch most of these — it walks token by token and tells you what each piece does. When in doubt, paste your regex in and see what it actually means.