Download Cheat sheet PDF 12 pages · syntax, editors, patterns, Unicode, performance, debugging
Guide

Groups and capture groups

Parentheses are how regex captures parts of a match. Master groups and you unlock most of regex's real power.

What groups do

Wrap any part of a regex in parentheses and you've made a capture group. Two things happen: the matched text is remembered (you can reference it by number later), and you can apply quantifiers to the whole group instead of just one character.

The pattern (\d{{4}})-(\d{{2}})-(\d{{2}}) matched against 2024-06-15 captures three pieces: 2024 in group 1, 06 in group 2, 15 in group 3. The whole match (group 0) is the entire 2024-06-15.

Non-capturing groups

Sometimes you want grouping without capture — say, to apply a quantifier without polluting the capture list. Use (?:...):

(?:https?:\/\/)?www\.example\.com

The (?:https?:\/\/)? makes the protocol optional without storing it as group 1. The first real capture group, if any, becomes group 1.

Reasons to prefer non-capturing groups: cleaner output, slightly faster matching, doesn't shift the numbering of later capture groups.

Named groups

Capture groups numbered 1, 2, 3 are fine for two or three captures. Beyond that, you start losing track of what each one is. Named groups solve this.

(?<year>\d{{4}})-(?<month>\d{{2}})-(?<day>\d{{2}})

Now the match exposes m.groups.year, m.groups.month, m.groups.day in JavaScript, or m.group('year') in Python. Much harder to forget what each field means.

Back-references inside the pattern

Sometimes you need to match the same thing twice. Back-references let you say "match whatever group 1 matched, again." Use \1, \2, etc. for numbered groups, or \k<name> for named.

Find duplicate words:

\b(\w+)\s+\1\b

Match HTML tags where opener and closer are the same:

<(\w+)>.*?<\/\1>

Back-references make regex Turing-equivalent-ish — be careful, they can cause catastrophic backtracking on adversarial input.

Back-references in replacement strings

Replacements (substitutions) use a different syntax than the pattern. In JavaScript and PCRE, use $1, $2 in the replacement; in Python's re.sub, use \1, \2; in named-group replacements use $<name> (JS) or \g<name> (Python).

Reformat YYYY-MM-DD → MM/DD/YYYY:

"2024-06-15".replace(/(\d{{4}})-(\d{{2}})-(\d{{2}})/, "$2/$3/$1")
// "06/15/2024"

Lookarounds aren't really groups

Lookahead (?=...) and lookbehind (?<=...) look like groups but don't capture — they're zero-width assertions that check what's around the current position without consuming characters. See the lookarounds guide for details.

Common gotchas

  • Adding parens shifts group numbers. If you have (foo)(bar) and add ((foo))(bar), what was group 1 is now group 2. Named groups avoid this.
  • Optional groups can be undefined. If a group doesn't participate in the match, accessing it returns undefined in JS, None in Python. Always check.
  • Repeated groups only keep the last match. (\d)+ on "123" leaves group 1 = "3", not "1,2,3". To get all matches, use the /g flag and iterate.

Try it

Paste a regex with groups into the explainer and see them broken out in the tree view. The Match details table shows each group's value for every match.


← Back to guides