Capture groups — every kind explained
Parentheses do four jobs in regex: grouping for quantifiers, capturing for back-references, naming, and assertions. This guide covers them all with concrete examples.
1. Capturing groups
The simplest form: (pattern). The parens both group the contents for quantifiers and capture the match for back-references.
// "2024-06-15"
const m = "2024-06-15".match(/(\d{4})-(\d{2})-(\d{2})/);
m[0]; // "2024-06-15" — full match
m[1]; // "2024" — group 1
m[2]; // "06"
m[3]; // "15"
Groups are numbered by the position of their opening paren, left to right. Nested groups share the same numbering:
// ((a)(b)) — three groups: "ab", "a", "b"
"ab".match(/((a)(b))/); // ["ab", "ab", "a", "b"]
2. Non-capturing groups
Sometimes you want to group for quantifiers or alternation, but you don't need to capture the match. Use (?:pattern).
// Match one or more repetitions of "ab"
"ababab".match(/(?:ab)+/); // ["ababab"] — only one group: the full match
Why bother? Three reasons:
- Performance. Capturing has overhead. For groups you don't need to reference, non-capturing is faster.
- Readability. When the match has many groups, non-capturing for grouping-only operations keeps the back-reference numbers stable.
- Numbering. Adding a new non-capturing group doesn't renumber the other groups.
3. Named capture groups
Capture by name instead of number. Far more readable for complex patterns.
// JavaScript / .NET / Java
(?<year>\d{4})-(?<month>\d{2})
// Python (with the P)
(?P<year>\d{4})-(?P<month>\d{2})
Access by name:
// JavaScript
const m = "2024-06".match(/(?<year>\d{4})-(?<month>\d{2})/);
m.groups.year; // "2024"
m.groups.month; // "06"
# Python
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-06")
m.group("year") # "2024"
m.groupdict() # {"year": "2024", "month": "06"}
Named groups are still numbered too — group 1 in the example above is "2024", same as if it were unnamed.
4. Back-references inside the pattern
Refer to a previously captured group from within the same pattern:
// Match repeated words: "hello hello", "the the"
\b(\w+)\s+\1\b
// Match an HTML tag and its closing tag
<(\w+)>.*?</\1>
Back-references match the literal text of the earlier capture, not the pattern. So (\w+)\s\1 matches "ab ab" but not "ab cd" even though both groups would individually match \w+.
5. Named back-references
Same idea but by name:
// JavaScript: \k<name>
(?<word>\w+)\s+\k<word>
# Python: (?P=name)
(?P<word>\w+)\s+(?P=word)
6. Atomic groups (advanced)
An atomic group (?>pattern) commits to whatever the inner pattern matched — no backtracking inside. This prevents catastrophic backtracking in cases where regular groups would retry.
// Vulnerable to ReDoS
(\w+)+$
// Safe equivalent
(?>\w+)+$ // PCRE / Java / Ruby / .NET
\w++$ // possessive quantifier — same idea
JavaScript supports atomic groups in ES2025 engines and later. Not available in Python's standard re module (use the third-party regex module).
Lookaround groups (assertions)
Lookarounds use parens but don't actually capture — they assert that a position is followed/preceded by a pattern.
// Positive lookahead — followed by
\d+(?=USD)
// Negative lookahead — NOT followed by
\d+(?!USD)
// Positive lookbehind — preceded by
(?<=\$)\d+
// Negative lookbehind — NOT preceded by
(?<!\$)\d+
See our lookarounds guide for the deep dive.
Practical examples
Reformat a name
// "Last, First" → "First Last"
"Smith, Alice".replace(/(\w+), (\w+)/, "$2 $1")
// "Alice Smith"
Extract key-value pairs
// "key1=value1&key2=value2"
const params = {};
"name=alice&age=30".replace(/(\w+)=(\w+)/g, (_, k, v) => {
params[k] = v;
});
// { name: "alice", age: "30" }
Find duplicate adjacent words
"the the cat sat on on the mat".match(/\b(\w+)\s+\1\b/g);
// ["the the", "on on"]
Performance note
Each capturing group has some overhead in backtracking engines. For hot loops on large inputs, prefer non-capturing groups when you don't need the values.