10 more common regex mistakes (and how to fix them)

Teaching regex to hundreds of developers surfaces the same mistakes again and again. Here are ten of them.

1. Using `.` when you should use a more specific class

The dot matches any character except a newline. For phone numbers, \d or [0-9] is more specific and rejects malformed input instead of silently accepting it.
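As a rough illustration, here is a minimal Python sketch using the standard `re` module. The phone format (`NNN-NNN-NNNN`) and the sample strings are assumptions made up for this example.

```python
import re

# Assumed format for illustration: NNN-NNN-NNNN
loose = re.compile(r"^...-...-....$")        # dot accepts any character
strict = re.compile(r"^\d{3}-\d{3}-\d{4}$")  # only digits are accepted

typo = "555-l23-4567"  # lowercase letter l typed instead of the digit 1

loose_ok = bool(loose.match(typo))             # True: the typo slips through
strict_ok = bool(strict.match(typo))           # False: the typo is rejected
valid_ok = bool(strict.match("555-123-4567"))  # True: real digits still match

print(loose_ok, strict_ok, valid_ok)
```

The loose pattern accepts the typo, while the specific one flags it at validation time.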

2. Forgetting that `.` doesn't match newlines

If your input might span lines, either use the s/dotall flag or use [\s\S] as a "match anything" alternative.
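Both options can be compared in a short Python sketch. The sample text is made up, and `re.DOTALL` is Python's name for the s flag.

```python
import re

text = "<p>first line\nsecond line</p>"

# Default: the dot stops at the newline, so the pattern cannot reach </p>
default = re.search(r"<p>(.*)</p>", text)

# Option 1: the dotall flag lets the dot match newlines too
dotall = re.search(r"<p>(.*)</p>", text, re.DOTALL)

# Option 2: [\s\S] matches any character without needing a flag
portable = re.search(r"<p>([\s\S]*)</p>", text)

print(default)            # None
print(dotall.group(1))    # both lines
print(portable.group(1))  # both lines
```

The `[\s\S]` form is handy when you cannot set flags, for example when the pattern is passed through a config file.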

3. Greedy when you wanted lazy

<.+> on <b>hello</b> matches the whole string, not just <b>. Use <.+?> or <[^>]+>.
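The three patterns can be tried side by side. This is a small sketch in Python's `re` module; other flavors should behave the same way for these patterns.

```python
import re

html = "<b>hello</b>"

greedy = re.search(r"<.+>", html).group()      # '<b>hello</b>'
lazy = re.search(r"<.+?>", html).group()       # '<b>'
negated = re.search(r"<[^>]+>", html).group()  # '<b>'

# The negated class also finds every tag individually
all_tags = re.findall(r"<[^>]+>", html)        # ['<b>', '</b>']

print(greedy, lazy, negated, all_tags)
```

The negated class is usually the better habit: it states what a tag cannot contain, so the engine never has to backtrack past a closing bracket.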

4. Anchors that anchor to the wrong place

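Without the m (multiline) flag, ^ matches only at the start of the string. With m, it also matches at the start of each line. The same goes for $ and the end of the string versus the end of each line. Pick the right one for your data.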
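A quick way to see the difference is to run the same pattern with and without the flag. The sketch below uses Python, where the m flag is spelled `re.MULTILINE`; the log text is invented for the example.

```python
import re

log = "INFO start\nERROR disk full\nINFO done"

# Without the flag, ^ matches only at the very start of the string
single = re.findall(r"^\w+", log)               # ['INFO']

# With the flag, ^ matches at the start of every line
multi = re.findall(r"^\w+", log, re.MULTILINE)  # ['INFO', 'ERROR', 'INFO']

# $ behaves the same way at line ends
ends = re.findall(r"\w+$", log, re.MULTILINE)   # ['start', 'full', 'done']

print(single, multi, ends)
```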

5. Forgetting to escape user input

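If you build a regex from user-supplied strings, escape them first. new RegExp(userInput) can throw on invalid syntax or, worse, silently match something the user never meant. Use your language's escape function, such as re.escape in Python or Regex.Escape in .NET, or an equivalent.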
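Here is a minimal Python sketch of both failure modes and the fix. The helper name `find_literal` and the sample strings are hypothetical, chosen only for this illustration.

```python
import re

def find_literal(needle, haystack):
    """Search for user-supplied text literally (hypothetical helper)."""
    return re.findall(re.escape(needle), haystack)

user_input = "1+1"
text = "1+1=2, but 11 is eleven"

# Unescaped, "+" is a quantifier: the pattern means "one or more 1s, then a 1"
unescaped = re.findall(user_input, text)   # ['11']

# Escaped, the input is matched character for character
escaped = find_literal(user_input, text)   # ['1+1']

# Some inputs are not even valid patterns
try:
    re.compile("(")
    compiled = True
except re.error:
    compiled = False

print(unescaped, escaped, compiled)
```

The unescaped search does not fail loudly. It just returns the wrong match, which is why this mistake tends to survive testing.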

6. Putting `.` inside a character class

Inside [...], the dot is literal: it matches a period, not "any character". [a.b] matches a, a period, or b.
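The following Python sketch contrasts the dot inside and outside a character class. The sample strings are made up.

```python
import re

# Inside a class the dot is just a period
in_class = re.findall(r"[a.b]", "a.bxc")   # ['a', '.', 'b']

# Outside a class the dot is a wildcard
outside = re.findall(r"a.b", "a.b axb")    # ['a.b', 'axb']

# To match a literal period outside a class, escape it
escaped = re.findall(r"a\.b", "a.b axb")   # ['a.b']

print(in_class, outside, escaped)
```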

7. Ranges that don't do what you think

[A-z] looks like "any letter", but ASCII has six characters ([ \ ] ^ _ `) between Z and a, and the range includes all of them. You want [A-Za-z].
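One way to convince yourself is to enumerate the characters between Z and a and test them against both classes. This is a small Python sketch; the variable names are arbitrary.

```python
import re

sloppy = re.compile(r"[A-z]")
strict = re.compile(r"[A-Za-z]")

# The ASCII characters that sit between 'Z' and 'a'
between = [chr(code) for code in range(ord("Z") + 1, ord("a"))]

sloppy_hits = [c for c in between if sloppy.fullmatch(c)]
strict_hits = [c for c in between if strict.fullmatch(c)]

print(between)      # ['[', '\\', ']', '^', '_', '`']
print(sloppy_hits)  # all six punctuation characters match [A-z]
print(strict_hits)  # []
```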



8. Multi-byte / Unicode without flags

In JavaScript without the u flag, . matches only half of an emoji, because it sees a single UTF-16 code unit of the surrogate pair. In Python with bytes patterns (not str), \w doesn't match Greek letters. Set the right flag for your data.
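The Python half of this can be sketched as follows; the JavaScript u flag is not shown here. The sample word is just three Greek letters picked for the example.

```python
import re

word = "αβγ"  # three Greek letters, as a str

# str pattern: \w is Unicode-aware by default in Python 3
as_str = re.findall(r"\w+", word)                      # ['αβγ']

# bytes pattern: \w only covers ASCII letters, digits, and underscore
as_bytes = re.findall(rb"\w+", word.encode("utf-8"))   # []

# The re.ASCII flag gives a str pattern the same ASCII-only behavior
ascii_flag = re.findall(r"\w+", word, re.ASCII)        # []

print(as_str, as_bytes, ascii_flag)
```

If your data arrives as bytes, decoding it to str before matching is usually simpler than working around the ASCII-only classes.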

9. `\d` matching more than 0-9

In some flavors with Unicode mode, \d matches all Unicode decimal digits, including Arabic-Indic, Devanagari, and Bengali numerals. If you specifically want 0-9, use [0-9].
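Python 3 is one such flavor: with str patterns, \d is Unicode-aware by default. The sketch below spells the non-ASCII digits as escapes so the source stays readable; the sample string is invented.

```python
import re

# "42" in ASCII, Arabic-Indic, and Devanagari digits
arabic_indic = "\u0664\u0662"
devanagari = "\u096a\u0968"
mixed = f"42 {arabic_indic} {devanagari}"

unicode_digits = re.findall(r"\d+", mixed)        # all three numbers
ascii_only = re.findall(r"[0-9]+", mixed)         # ['42']
ascii_flag = re.findall(r"\d+", mixed, re.ASCII)  # ['42']

print(len(unicode_digits), ascii_only, ascii_flag)
```

This matters most for validation: a field that passes \d+ may still contain characters that downstream code expecting 0-9 cannot handle.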

10. Capturing group when non-capturing would do

Adding () just for grouping pollutes your matches and renumbers later groups. Use (?:...) when you don't need to extract the group later. Smaller match objects, easier maintenance.
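The renumbering effect is easy to see in a small Python sketch. The URL and the pattern are illustrative assumptions, not a complete URL parser.

```python
import re

url = "https://example.com"

# Grouping the scheme alternation with () makes it group 1,
# which pushes the host to group 2
with_capture = re.match(r"(http|https)://([\w.-]+)", url)

# (?:...) groups without capturing, so the host is group 1
without_capture = re.match(r"(?:http|https)://([\w.-]+)", url)

print(with_capture.groups())     # ('https', 'example.com')
print(without_capture.groups())  # ('example.com',)
```

If the group numbers still get hard to track, named groups such as `(?P<host>...)` in Python are another option.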

See also: the original 10 mistakes guide.

