Regex fundamentals — from zero to confident
A regular expression is a tiny programming language for describing patterns in text. By the end of this guide, you'll be able to read most regexes you encounter and write your own.
What a regex is, and what it isn't
A regex is a string that describes which other strings match a pattern. The regex cat matches any text containing the three letters c-a-t in order. The regex \d+ matches any text containing one or more digits.
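To make the two patterns above concrete, here is a minimal sketch using Python's built-in re module. The choice of Python and the sample strings are assumptions made for illustration; the guide itself is not tied to any one language.

```python
import re

# re.search scans the whole text and succeeds if the pattern occurs anywhere.
has_cat = re.search(r"cat", "concatenate") is not None
no_cat = re.search(r"cat", "dog") is not None

# \d+ matches a run of one or more digits; group() returns the matched text.
first_number = re.search(r"\d+", "order 66 shipped").group()

print(has_cat, no_cat, first_number)
```

The r"..." prefix marks a raw string, which keeps Python from interpreting the backslashes before the regex engine sees them.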
A regex is not a parser. If you find yourself nesting alternations and lookarounds to validate HTML or JSON, you're using the wrong tool. Regex is for matching text shapes, not for understanding nested structure.
The two kinds of characters
Every character in a regex is either a literal, which matches itself, or a metacharacter, which has a special meaning. The fourteen metacharacters are . \ ^ $ * + ? ( ) [ ] { } |. To match one literally, escape it with a backslash: \. matches a literal period.
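The difference between an escaped and an unescaped metacharacter is easiest to see side by side. This is a small sketch in Python (an assumed language; the sample strings are invented for the example):

```python
import re

# Unescaped, the dot is a metacharacter: it matches any single character.
loose = re.search(r"3.14", "3x14") is not None

# Escaped, it matches only a literal period.
strict = re.search(r"3\.14", "3x14") is not None
strict_ok = re.search(r"3\.14", "pi is 3.14") is not None

# re.escape adds the backslashes for you when the text comes from elsewhere.
escaped = re.escape("a.b")

print(loose, strict, strict_ok, escaped)
```

When the text to match comes from user input, running it through re.escape is usually safer than escaping by hand.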
Character classes
Square brackets define a class: any one of the characters inside matches. [abc] matches a, b, or c. [a-z] matches any lowercase letter from a to z. [^abc] matches any character except a, b, or c (negation).
Shorthand classes you'll use constantly:
- \d: any digit (0–9)
- \w: any word character (letter, digit, or underscore)
- \s: any whitespace (space, tab, newline)
- .: any character (except newline, unless the s flag is set)
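The classes above can be tried out with re.findall, which returns every match as a list. This is an illustrative sketch in Python (an assumption, as are the sample strings):

```python
import re

text = "Room 42b, floor_3\tEast"

# Custom classes: a set, a negated set, and a range.
in_set = re.findall(r"[abc]", "cabbage")
not_in_set = re.findall(r"[^abc]", "cabbage")
lower_runs = re.findall(r"[a-z]+", "Hello World")

# Shorthand classes.
digits = re.findall(r"\d", text)
words = re.findall(r"\w+", text)
spaces = re.findall(r"\s", text)

# The dot skips the newline unless the dotall flag is set.
dot = re.findall(r".", "a\nb")

print(in_set, not_in_set, lower_runs, digits, words, spaces, dot)
```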
Quantifiers
Quantifiers say how many times the preceding thing should appear:
- *: zero or more
- +: one or more
- ?: zero or one (optional)
- {3}: exactly three
- {2,5}: between two and five
- {3,}: three or more
By default, quantifiers are greedy: they match as much as possible. Append ? to make them lazy: match as little as possible. See our quantifiers guide for the deep dive.
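A short sketch of the quantifiers in action, including the greedy versus lazy difference. Python and the test strings are assumptions chosen for illustration; re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# ? makes the preceding character optional.
optional = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colouur")]

# * allows zero occurrences, + requires at least one.
star = bool(re.fullmatch(r"ab*", "a"))
plus = bool(re.fullmatch(r"ab+", "a"))

# {2,5} accepts two to five repetitions: test strings of 1 to 6 letters.
counted = [bool(re.fullmatch(r"a{2,5}", "a" * n)) for n in range(1, 7)]

# Greedy .* runs to the last quote; lazy .*? stops at the first one it can.
text = 'say "hi" and "bye"'
greedy = re.search(r'".*"', text).group()
lazy = re.search(r'".*?"', text).group()

print(optional, star, plus, counted, greedy, lazy)
```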
Anchors
Anchors don't match characters — they match positions:
- ^: start of string (or start of line with /m flag)
- $: end of string (or end of line with /m flag)
- \b: word boundary (between a word character and a non-word character, or between a word character and the start or end of the string)
^cat$ matches the string "cat" exactly. Without anchors, cat would also match "category" or "scatter".
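The effect of anchoring can be sketched by filtering a small word list three ways. Python and the word list are assumptions made for the example:

```python
import re

words = ["cat", "category", "scatter", "a cat sat"]

# No anchors: "cat" may appear anywhere in the string.
unanchored = [w for w in words if re.search(r"cat", w)]

# ^ and $ pin the match to the start and end of the string.
exact = [w for w in words if re.search(r"^cat$", w)]

# \b pins the match to word edges, so "cat" must stand alone as a word.
whole_word = [w for w in words if re.search(r"\bcat\b", w)]

print(unanchored, exact, whole_word)
```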
Groups
Parentheses group things together so quantifiers and alternation apply to the whole group: (abc)+ matches one or more "abc" sequences. Parens also capture what they matched so you can refer back to it.
(\d{4})-(\d{2})-(\d{2}) captures the year, month, and day from a date as groups 1, 2, and 3.
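Here is one way the date pattern above might be used to pull out its three groups, along with a backreference as an example of referring back to a capture. Python, the sample sentences, and the variable names are assumptions for illustration.

```python
import re

# Groups are numbered by their opening parenthesis, left to right.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "released on 2024-03-15")
whole = m.group(0)            # group 0 is always the entire match
year, month, day = m.groups()

# A quantifier after a group repeats the whole group.
repeated = re.fullmatch(r"(abc)+", "abcabc") is not None

# \1 refers back to whatever group 1 captured: this finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group(1)

print(whole, year, month, day, repeated, doubled)
```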
Alternation
The pipe character | means "or": cat|dog|fish matches any of those three words. Combine with groups to make alternation local: (cat|dog) lover matches "cat lover" or "dog lover".
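The sketch below shows why the group matters: without parentheses, the pipe splits the entire pattern. Python and the sample strings are assumptions for the example.

```python
import re

# Top-level alternation: any of the three words.
pets = re.findall(r"cat|dog|fish", "my dog chased a cat")

# Grouped: the alternation covers only the animal, " lover" is always required.
grouped = re.search(r"(cat|dog) lover", "a dog lover").group()

# Ungrouped: this reads as "cat" OR "dog lover", so only "cat" is matched here.
ungrouped = re.search(r"cat|dog lover", "a cat lover").group()

print(pets, grouped, ungrouped)
```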
Flags
Flags modify how a regex behaves overall:
- i: case-insensitive
- g: find all matches (not just the first)
- m: multiline (^ and $ match at line boundaries)
- s: dotall (. also matches newlines)
- u: Unicode-aware (JavaScript)
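Flag spelling varies by language. As a rough sketch, assuming Python's re module: i, m, and s correspond to re.IGNORECASE, re.MULTILINE, and re.DOTALL, while the role of g is played by calling re.findall instead of re.search. The sample text is invented for the example.

```python
import re

text = "Cat\ncat\nCAT"

# i: ignore case. findall returns every match, like the g flag.
any_case = re.findall(r"cat", text, flags=re.IGNORECASE)

# m: ^ and $ match at each line boundary instead of only the string ends.
no_multiline = re.findall(r"^cat$", text)
multiline = re.findall(r"^cat$", text, flags=re.MULTILINE)

# Flags combine with the | operator.
both = re.findall(r"^cat$", text, flags=re.IGNORECASE | re.MULTILINE)

# s: without it the dot refuses to cross the newline.
dot_default = re.search(r"Cat.cat", text) is not None
dot_all = re.search(r"Cat.cat", text, flags=re.DOTALL) is not None

print(any_case, no_multiline, multiline, both, dot_default, dot_all)
```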
Your first real pattern
Let's build a US ZIP code matcher: 5 digits, optionally followed by a hyphen and 4 more digits.
Start with the 5 digits: \d{5}
Add the optional extension: \d{5}(-\d{4})?
Anchor it so we match the whole string: ^\d{5}(-\d{4})?$
Test it in the explainer to see a token-by-token breakdown.
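To check the finished pattern in code, a minimal sketch in Python might look like this. The helper name is_zip and the test inputs are hypothetical, chosen only for illustration.

```python
import re

# The pattern built in the three steps above.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")


def is_zip(text: str) -> bool:
    """Return True if text is a 5-digit ZIP, optionally with a -NNNN extension."""
    # fullmatch is used because Python's $ also matches just before a
    # trailing newline, which would let "12345\n" through with search().
    return ZIP_RE.fullmatch(text) is not None


results = {s: is_zip(s) for s in ["12345", "12345-6789", "1234", "12345-678", "123456"]}
print(results)
```

Only the first two inputs pass: the others have too few digits, an incomplete extension, or an extra digit that the anchors reject.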
Where to go next
- Anchors in depth — start, end, word boundary
- Quantifiers — greedy vs lazy explained
- Character classes — shorthand and custom
- Lookarounds — assertions without consuming
- Common mistakes — the gotchas to avoid
Or jump straight to the pattern library for 300+ ready-to-use regex patterns.