Groups and capture groups
Parentheses are how regex captures parts of a match. Master groups and you unlock most of regex's real power.
What groups do
Wrap any part of a regex in parentheses and you've made a capture group. Two things happen: the matched text is remembered (you can reference it by number later), and you can apply quantifiers to the whole group instead of just one character.
The pattern (\d{4})-(\d{2})-(\d{2}) matched against 2024-06-15 captures three pieces: 2024 in group 1, 06 in group 2, 15 in group 3. Group 0 is always the whole match, here 2024-06-15.
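A minimal JavaScript sketch of both effects described above: reading numbered groups from a match, and applying a quantifier to a whole group. The variable names and the (ab)+ pattern are illustrative only.

```javascript
// Match a date and read its numbered capture groups.
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;
const m = "2024-06-15".match(datePattern);

console.log(m[0]); // "2024-06-15" (group 0, the whole match)
console.log(m[1]); // "2024"
console.log(m[2]); // "06"
console.log(m[3]); // "15"

// Grouping also lets a quantifier apply to a whole sequence:
// (ab)+ repeats "ab", whereas ab+ repeats only the "b".
const groupedRepeat = /^(ab)+$/.test("ababab");
const ungroupedRepeat = /^ab+$/.test("ababab");
console.log(groupedRepeat);   // true
console.log(ungroupedRepeat); // false
```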
Non-capturing groups
Sometimes you want grouping without capture — say, to apply a quantifier without polluting the capture list. Use (?:...):
(?:https?:\/\/)?www\.example\.com
The (?:https?:\/\/)? makes the protocol optional without storing it as group 1. The first real capture group, if any, becomes group 1.
Reasons to prefer non-capturing groups: cleaner output, slightly less work for the engine (it has nothing to store), and no shift in the numbering of later capture groups.
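To see the numbering difference, here is a small JavaScript comparison. The trailing (\/\w+)? path group is a hypothetical addition, included only so there is a "real" capture group to watch.

```javascript
// Same URL pattern twice: once with a capturing protocol group, once without.
// The (\/\w+)? path group is hypothetical, added for illustration.
const capturing = /(https?:\/\/)?www\.example\.com(\/\w+)?/;
const nonCapturing = /(?:https?:\/\/)?www\.example\.com(\/\w+)?/;

const url = "https://www.example.com/docs";

const a = url.match(capturing);
console.log(a[1]); // "https://"
console.log(a[2]); // "/docs"

const b = url.match(nonCapturing);
console.log(b[1]);     // "/docs" (the protocol is matched but not stored)
console.log(b.length); // 2 (whole match plus one capture group)
```

With the non-capturing version, the path stays in group 1 whether or not the protocol group is there.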
Named groups
Capture groups numbered 1, 2, 3 are fine for two or three captures. Beyond that, you start losing track of what each one is. Named groups solve this.
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Now the match exposes m.groups.year, m.groups.month, m.groups.day in JavaScript. Python's re module spells named groups (?P<year>...) instead and reads them with m.group('year'). Either way, it is much harder to forget what each field means.
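A short JavaScript sketch of reading named groups; the destructured variable names are illustrative. Note that named groups still receive a number as well.

```javascript
// Named groups are exposed on the match's .groups object.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const m = "2024-06-15".match(datePattern);

const { year, month, day } = m.groups;
console.log(year);  // "2024"
console.log(month); // "06"
console.log(day);   // "15"

// A named group is still numbered by position, so both views agree.
console.log(m[1] === m.groups.year); // true
```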
Back-references inside the pattern
Sometimes you need to match the same thing twice. Back-references let you say "match whatever group 1 matched, again." Use \1, \2, etc. for numbered groups, or \k<name> for named groups (Python's re uses (?P=name) instead).
Find duplicate words:
\b(\w+)\s+\1\b
Match HTML tags where opener and closer are the same:
<(\w+)>.*?<\/\1>
Back-references take patterns beyond what a strictly regular expression can express, and matching them can be expensive. Be careful: they can cause catastrophic backtracking on adversarial input.
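The two patterns above can be tried in JavaScript as follows. The test strings are made up for illustration, and the named variant with \k<name> is an extra sketch of the named form.

```javascript
// Duplicate words: \1 must match the same text that group 1 captured.
const dup = /\b(\w+)\s+\1\b/;
const hasDup = dup.test("it was the the best");
const noDup = dup.test("it was the best");
const repeated = "it was the the best".match(dup)[1];
console.log(hasDup);   // true
console.log(noDup);    // false
console.log(repeated); // "the"

// Matching tags: the closing tag must repeat the opening tag's name.
const tag = /<(\w+)>.*?<\/\1>/;
const balanced = tag.test("<b>bold</b>");
const mismatched = tag.test("<b>bold</i>");
console.log(balanced);   // true
console.log(mismatched); // false

// The same idea with a named group and \k<name>.
const namedTag = /<(?<name>\w+)>.*?<\/\k<name>>/;
const namedBalanced = namedTag.test("<em>hi</em>");
console.log(namedBalanced); // true
```

This is only a demonstration of back-references; a regex like this is not a substitute for a real HTML parser.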
Back-references in replacement strings
Replacement strings (substitutions) use a different syntax from the pattern itself. In JavaScript, use $1, $2 in the replacement (PCRE-based tools such as PHP's preg_replace accept this form too); in Python's re.sub, use \1, \2. For named groups, use $<name> (JS) or \g<name> (Python).
Reformat YYYY-MM-DD → MM/DD/YYYY:
"2024-06-15".replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1")
// "06/15/2024"
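The named-group form of the same replacement might look like this in JavaScript. The sample text and variable names are illustrative; the second call adds the /g flag so that every date in the string is rewritten, not just the first.

```javascript
// Named groups in the replacement string use $<name>.
const iso = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const single = "2024-06-15".replace(iso, "$<month>/$<day>/$<year>");
console.log(single); // "06/15/2024"

// Without /g only the first match is replaced; with /g, all of them are.
const text = "from 2024-06-15 to 2024-07-01";
const all = text.replace(/(\d{4})-(\d{2})-(\d{2})/g, "$2/$3/$1");
console.log(all); // "from 06/15/2024 to 07/01/2024"
```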
Lookarounds aren't really groups
Lookahead (?=...) and lookbehind (?<=...) look like groups but don't capture — they're zero-width assertions that check what's around the current position without consuming characters. See the lookarounds guide for details.
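A brief JavaScript sketch of the "zero-width, no capture" behavior, using made-up input strings. It assumes an engine with lookbehind support, such as a recent Node.js.

```javascript
// Lookahead: match digits only when "px" follows; "px" is not part of the match.
const width = "width: 100px".match(/\d+(?=px)/);
console.log(width[0]);     // "100"
console.log(width.length); // 1 (only the whole match, no capture groups)

// Lookbehind: match digits only when "$" precedes; "$" is not part of the match.
const price = "cost: $42".match(/(?<=\$)\d+/);
console.log(price[0]);     // "42"
console.log(price.length); // 1
```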
Common gotchas
- Adding parens shifts group numbers. If you have (foo)(bar) and change it to ((foo))(bar), the original (foo) is now group 2 and (bar) moves from group 2 to group 3. Named groups avoid this.
- Optional groups can be undefined. If a group doesn't participate in the match, accessing it returns undefined in JS, None in Python. Always check.
- Repeated groups only keep the last match. (\d)+ on "123" leaves group 1 = "3", not "1,2,3". To get every digit, match a single \d with the /g flag and iterate over the matches.
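The three gotchas above can be reproduced with a few lines of JavaScript. Input strings, variable names, and the "(none)" fallback value are illustrative assumptions.

```javascript
// 1. Adding parentheses renumbers everything to their right.
const before = "foobar".match(/(foo)(bar)/);
const after = "foobar".match(/((foo))(bar)/);
console.log(before[2]); // "bar"
console.log(after[2]);  // "foo" (bar has moved to group 3)
console.log(after[3]);  // "bar"

// 2. An optional group that does not participate is undefined.
const m = "www.example.com".match(/(https?:\/\/)?www\.example\.com/);
console.log(m[1]); // undefined
const protocol = m[1] ?? "(none)"; // hypothetical fallback value
console.log(protocol); // "(none)"

// 3. A repeated group keeps only its last iteration.
const r = "123".match(/(\d)+/);
console.log(r[0]); // "123"
console.log(r[1]); // "3"

// To collect every digit, match one \d at a time with the /g flag.
const digits = [..."123".matchAll(/\d/g)].map((x) => x[0]);
console.log(digits); // [ '1', '2', '3' ]
```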
Try it
Paste a regex with groups into the explainer and see them broken out in the tree view. The Match details table shows each group's value for every match.