Regex backreferences — match the same text twice
Reach back to a captured group and require it to appear again. The trick for finding repeated content.
The short answer
A backreference is a regex construct that matches the same text a previous capture group matched. Syntax:
\1, \2, \3 numbered backreference (most flavors)
\k<name> named backreference (JavaScript, PCRE)
(?P=name) named backreference (Python)
The classic example: doubled words
Find words that appear twice in a row:
Pattern: \b(\w+)\s+\1\b
Input: "the the cat sat sat there"
Matches: "the the" and "sat sat"
Step by step:
\b(\w+)   match and capture a word
\s+       one or more whitespace characters
\1        match the SAME word again (whatever group 1 captured)
\b        at a word boundary
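The walkthrough above can be tried directly in Python. This is a minimal sketch; the helper name find_doubled_words is made up for illustration and is not part of any library.

```python
import re

# Capture a word, then require the same text again after whitespace.
DOUBLED_WORD = r"\b(\w+)\s+\1\b"

def find_doubled_words(text, flags=0):
    """Return every full 'word word' match found in text."""
    return [m.group(0) for m in re.finditer(DOUBLED_WORD, text, flags)]

print(find_doubled_words("the the cat sat sat there"))
# ['the the', 'sat sat']

# The backreference compares text exactly, so case matters by default.
print(find_doubled_words("The the end"))                 # []
print(find_doubled_words("The the end", re.IGNORECASE))  # ['The the']
```

The leading \b matters: without it, the "is" inside "this" could pair with a following "is".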
What backreferences are good for
1. Detecting duplicates
Repeated characters:
(.)\1 any character that's immediately repeated
Repeated words (as above):
\b(\w+)\s+\1\b
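As a quick illustration of the repeated-character pattern, here is a small Python sketch. The function name repeated_chars is hypothetical.

```python
import re

# (.) captures any single character; \1 requires that same character next.
REPEATED_CHAR = re.compile(r"(.)\1")

def repeated_chars(text):
    """Return the character of each immediately repeated pair."""
    return [m.group(1) for m in REPEATED_CHAR.finditer(text)]

print(repeated_chars("bookkeeper"))
# ['o', 'k', 'e']
```

Matches do not overlap, so a run of four identical characters is reported as two pairs.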
2. Matching paired delimiters
Make sure quotes match:
(["']).*?\1 matches "..." or '...', but not a mismatched pair such as "...'
Without the backreference, a pattern like ["'].*?["'] would accept "abc' as a string. The backreference forces the closing quote to be the same character as the opening one.
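To see the difference, the sketch below runs the pattern with and without the backreference on the same input. The sample text and the helper quoted_strings are invented for this example.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close.
QUOTED = re.compile(r"""(["']).*?\1""")
# The same idea without the backreference, for contrast.
NAIVE = re.compile(r"""["'].*?["']""")

sample = """say "hi" and 'bye' but not "oops' here"""

def quoted_strings(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]

print(quoted_strings(QUOTED, sample))  # only "hi" and 'bye'
print(quoted_strings(NAIVE, sample))   # also accepts the mismatched "oops'
```

Neither version handles escaped quotes inside the string; that needs a more careful pattern.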
3. HTML/XML tag matching (within limits)
<(\w+)>.*?</\1> tag balance check, single-level
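Backreferences let you require the closing tag to match the opening one. This isn't a full HTML parser: the pattern above ignores attributes and stops at the first closing tag, so nested tags of the same name are matched incorrectly. It's still useful for simple cases.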
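A short Python sketch shows both the useful case and the nesting limit. The sample markup and the helper name simple_elements are assumptions made for illustration.

```python
import re

# The tag name is captured once and must reappear in the closing tag.
SIMPLE_TAG = re.compile(r"<(\w+)>.*?</\1>")

def simple_elements(markup):
    """Return each attribute-free element whose close tag matches its open tag."""
    return [m.group(0) for m in SIMPLE_TAG.finditer(markup)]

# <i> is closed by </b>, so it is skipped.
print(simple_elements("<b>bold</b> <i>broken</b> <em>ok</em>"))
# ['<b>bold</b>', '<em>ok</em>']

# Nested tags of the same name: the match ends at the first </div>.
print(simple_elements("<div><div>x</div></div>"))
# ['<div><div>x</div>']
```

The second result is why this pattern suits only flat, single-level markup.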
Named backreferences
Numbered groups (\1, \2) are fragile — if you reorder the pattern, the numbers shift. Named backreferences are self-documenting and stable:
JavaScript / PCRE
(?<word>\w+)\s+\k<word>
Python
(?P<word>\w+)\s+(?P=word)
Same effect, but clear at a glance. Prefer named over numbered in production code.
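The stability claim can be checked with a small Python sketch. The second pattern adds a hypothetical "label" group in front, which would shift a numbered reference from \1 to \2, while the named reference stays unchanged.

```python
import re

# Named group plus named backreference (Python syntax).
DOUBLED = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

# A new group is added before "word"; (?P=word) needs no renumbering.
LABELLED = re.compile(r"(?P<label>\w+):\s+(?P<word>\w+)\s+(?P=word)\b")

m1 = DOUBLED.search("it was was fine")
print(m1.group("word"))  # was

m2 = LABELLED.search("note: was was fine")
print(m2.group("label"), m2.group("word"))  # note was
```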
Backreferences in replacements
Backreferences also work in the replacement string of a find-and-replace. Syntax varies:
JavaScript: $1, $2, $<name>
Python: \1, \g<1>, \g<name> (in the replacement argument of re.sub)
sed: \1
Perl: $1
Example: swap first and last name from "Last, First":
JS: "Smith, John".replace(/(\w+), (\w+)/, "$2 $1")
// "John Smith"
Py: re.sub(r"(\w+), (\w+)", r"\2 \1", "Smith, John")
# "John Smith"
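The same swap can be written with named groups in Python. This is a sketch; the group names last and first and the function first_last are chosen for this example only.

```python
import re

NAME = re.compile(r"(?P<last>\w+), (?P<first>\w+)")

def first_last(text):
    """Rewrite 'Last, First' as 'First Last' using named references."""
    return NAME.sub(r"\g<first> \g<last>", text)

print(first_last("Smith, John"))  # John Smith

# \g<1> is also the unambiguous way to follow a group with a literal digit:
# r"\10" would be read as a reference to group 10.
print(re.sub(r"(\d)", r"\g<1>0", "7"))  # 70
```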
Performance considerations
Backreferences make a regex non-regular — it can no longer be represented as a finite automaton. This means:
- Engines using NFA-with-backtracking (JS, Python re, PCRE, Perl) support them but with worst-case exponential time.
- Engines built on finite automata, which guarantee linear-time matching (Go regexp, Rust regex, RE2), DON'T support backreferences at all.
If your engine doesn't support backreferences and you need duplicate-detection, do it in two passes: first find all matches, then check for repetition in code.
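The two-pass approach might look like the sketch below. It is written in Python for consistency with the other examples, but the pattern itself uses no backreference, so the same idea carries over to engines that lack them. The function name doubled_words_two_pass is hypothetical.

```python
import re

# Pass 1 pattern: plain words, no backreference needed.
WORD = re.compile(r"\w+")

def doubled_words_two_pass(text):
    """Return each word that is immediately repeated, separated only by whitespace."""
    matches = list(WORD.finditer(text))
    doubled = []
    # Pass 2: compare each word with the one that follows it.
    for prev, cur in zip(matches, matches[1:]):
        gap = text[prev.end():cur.start()]
        if prev.group() == cur.group() and gap.isspace():
            doubled.append(cur.group())
    return doubled

print(doubled_words_two_pass("the the cat sat sat there"))
# ['the', 'sat']
```

One difference from the regex version: this checks every adjacent pair, so "a a a" reports two repeats, whereas \b(\w+)\s+\1\b finds one non-overlapping match.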
The takeaway
Backreferences let you reuse previously matched content within the same regex. They're the right tool for duplicate detection, paired-delimiter matching, and template substitution.
Use named backreferences (\k<name> or (?P=name)) when you can — they're clearer and don't break when you reorganize the pattern.