10 more common regex mistakes (and how to fix them)
After teaching regex to hundreds of developers, I keep seeing the same mistakes come up again and again. Here are ten of them.
1. Using . when you should use a more specific class
. matches any character except a newline. For phone numbers, \d or [0-9] is more specific and rejects letters and punctuation up front instead of quietly letting them through.
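A minimal Python sketch of the difference. It assumes a simplified three-digits, hyphen, four-digits format, which is hypothetical and far short of real phone validation. LOOSE_PHONE, STRICT_PHONE and check are illustrative names.

```python
import re

# Hypothetical simplified format: three digits, a hyphen, four digits.
LOOSE_PHONE = re.compile(r".{3}-.{4}")          # '.' accepts almost any character
STRICT_PHONE = re.compile(r"[0-9]{3}-[0-9]{4}")  # only ASCII digits

def check(pattern, text):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None

print(check(LOOSE_PHONE, "555-0199"))   # True
print(check(LOOSE_PHONE, "abc-defg"))   # True: garbage slips through
print(check(STRICT_PHONE, "555-0199"))  # True
print(check(STRICT_PHONE, "abc-defg"))  # False: rejected immediately
```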
2. Forgetting that . doesn't match newlines
If your input might span lines, either use the s/dotall flag or use [\s\S] as a "match anything" alternative.
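The same idea as a small Python sketch. Here re.DOTALL (alias re.S) is Python's name for the dotall flag, while [\s\S] works without any flag.

```python
import re

text = "start\nend"

# Without DOTALL, '.' stops at the newline, so there is no match.
plain = re.search(r"start.*end", text)
# re.DOTALL lets '.' match '\n' too.
dotall = re.search(r"start.*end", text, re.DOTALL)
# [\s\S] means "whitespace or non-whitespace", i.e. any character, with no flag.
any_char = re.search(r"start[\s\S]*end", text)

print(plain is None)           # True
print(repr(dotall.group()))    # 'start\nend'
print(repr(any_char.group()))  # 'start\nend'
```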
3. Greedy when you wanted lazy
<.+> on <b>hello</b> matches the whole string, not just <b>. Use <.+?> or <[^>]+>.
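A quick Python check of the three patterns against the example string:

```python
import re

html = "<b>hello</b>"

greedy = re.findall(r"<.+>", html)      # .+ runs on to the last '>'
lazy = re.findall(r"<.+?>", html)       # .+? stops at the first '>'
negated = re.findall(r"<[^>]+>", html)  # [^>] can never consume a '>'

print(greedy)   # ['<b>hello</b>']
print(lazy)     # ['<b>', '</b>']
print(negated)  # ['<b>', '</b>']
```

The negated class states the boundary directly, so it cannot run past a closing > no matter what follows.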
4. Anchors that anchor to the wrong place
Without m, ^ means start-of-string. With m, it means start-of-line. Same for $. Pick the right one for your data.
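A Python sketch using re.MULTILINE (Python's m flag) on a made-up three-line string:

```python
import re

log = "alpha\nbeta\ngamma"

# Without re.MULTILINE, ^ only matches at the very start of the string.
first_only = re.findall(r"^\w+", log)
# With re.MULTILINE (alias re.M), ^ also matches right after each newline.
every_line = re.findall(r"^\w+", log, re.MULTILINE)
# $ behaves the same way at the other end of each line.
last_only = re.findall(r"\w+$", log)
every_end = re.findall(r"\w+$", log, re.MULTILINE)

print(first_only)  # ['alpha']
print(every_line)  # ['alpha', 'beta', 'gamma']
print(last_only)   # ['gamma']
print(every_end)   # ['alpha', 'beta', 'gamma']
```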
5. Forgetting to escape user input
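If you build a regex from user-supplied strings, escape it first. new RegExp(userInput) throws a SyntaxError when the input is something like "(", and, worse, silently treats characters like . or * as metacharacters, so the pattern matches things you never intended. Use your language's escape function (re.escape in Python) or an equivalent.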
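A minimal Python sketch using re.escape. The helpers find_literal and is_valid_pattern are hypothetical names for illustration; the exact re.error message varies between Python versions, so the sketch only checks whether compilation fails.

```python
import re

def find_literal(needle, haystack):
    """Search haystack for needle as plain text, not as a pattern."""
    return re.search(re.escape(needle), haystack) is not None

def is_valid_pattern(pattern):
    """Return False if the string doesn't compile as a regex."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

user_input = "a.b"

# Unescaped, '.' acts as a metacharacter, so "a.b" also matches "axb".
print(re.search(user_input, "axb") is not None)  # True (unintended)
print(find_literal(user_input, "axb"))           # False
print(find_literal(user_input, "a.b"))           # True

# Unescaped input can also be an invalid pattern outright.
print(is_valid_pattern("("))      # False
print(find_literal("(", "f(x)"))  # True: escaped, "(" is just a character
```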
6. Putting . inside a character class
Inside [...], . is literal — it matches a period, not "any character". [a.b] matches a, period, or b.
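A short Python check, using only the standard re module:

```python
import re

# Inside [...], '.' is a literal period, so [a.b] is a set of three characters.
found = re.findall(r"[a.b]", "a.bxc")
# [.] matches only a period, never an arbitrary character.
period_only = re.findall(r"[.]", "x.y")

print(found)        # ['a', '.', 'b']
print(period_only)  # ['.']
```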
7. Ranges that don't do what you think
[A-z] looks like "any letter", but ASCII has six characters between Z and a ([ \ ] ^ _ `), and the range includes all of them. You want [A-Za-z].
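A Python sketch that lists the characters between Z and a and shows what [A-z] sweeps in:

```python
import re

# The six ASCII characters strictly between 'Z' (0x5A) and 'a' (0x61).
between = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

sample = "Az" + "".join(between)
wide = re.findall(r"[A-z]", sample)        # matches all eight characters
letters = re.findall(r"[A-Za-z]", sample)  # matches only the letters
print(letters)  # ['A', 'z']
```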
8. Multi-byte / Unicode without flags
In JavaScript without u, . matches half of an emoji. In Python, a bytes pattern (as opposed to a str one) uses ASCII-only \w, so it doesn't match Greek letters. Pick the right flag or string type for your data.
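A Python sketch of the bytes-versus-str half of this mistake (the JavaScript u case is not covered here). The sample text is made up.

```python
import re

text = "αβγ abc"

# str pattern on str data: \w is Unicode-aware by default in Python 3.
unicode_words = re.findall(r"\w+", text)
# bytes pattern on UTF-8 bytes: \w only matches ASCII [a-zA-Z0-9_].
byte_words = re.findall(rb"\w+", text.encode("utf-8"))
# re.ASCII gives the same ASCII-only behavior on str.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['αβγ', 'abc']
print(byte_words)     # [b'abc']
print(ascii_words)    # ['abc']
```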
9. \d matching more than 0-9
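In some flavors, \d matches any Unicode decimal digit, including Arabic-Indic, Devanagari and Bengali numerals. Python 3 does this by default for str patterns; JavaScript's \d stays ASCII-only even with u. If you specifically want 0-9, use [0-9].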
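A Python sketch, with the Arabic-Indic digits written as escapes so the source stays readable; all_digits is a hypothetical helper name.

```python
import re

arabic_indic = "\u0664\u0662"  # Arabic-Indic digits four and two
ascii_digits = "42"

def all_digits(pattern, s, flags=0):
    """Return True if the whole of s matches the digit pattern."""
    return re.fullmatch(pattern, s, flags) is not None

print(all_digits(r"\d+", arabic_indic))            # True: \d is Unicode-aware on str
print(all_digits(r"[0-9]+", arabic_indic))         # False
print(all_digits(r"\d+", arabic_indic, re.ASCII))  # False
print(all_digits(r"[0-9]+", ascii_digits))         # True
```

In Python, re.ASCII also restricts \d to 0-9, if you prefer a flag to an explicit character class.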
10. Capturing group when non-capturing would do
Adding () just for grouping pollutes your matches and renumbers later groups. Use (?:...) when you don't need to extract the group later. Smaller match objects, easier maintenance.
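A Python sketch showing how one extra capturing group shifts the numbering (the input string is made up):

```python
import re

text = "abab-42"

# A capturing group used only for repetition becomes group 1
# and pushes the number we actually want to group 2.
capturing = re.fullmatch(r"(ab)+-(\d+)", text)
# (?:...) groups without capturing, so the number is group 1.
non_capturing = re.fullmatch(r"(?:ab)+-(\d+)", text)

print(capturing.groups())      # ('ab', '42')
print(non_capturing.groups())  # ('42',)
```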
See also: the original 10 mistakes guide.