Find duplicate consecutive words with regex
Repeated words in writing are a common typo: "the the cat" or "to to do". Catch them with a single regex back-reference.
The basic pattern
const DUPE_RE = /\b(\w+)\s+\1\b/gi;
const text = "I went to to the store and saw the the cat";
const matches = [...text.matchAll(DUPE_RE)];
matches.forEach(m => console.log("Found:", m[0]));
// Found: to to
// Found: the the
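The key piece is \1, a back-reference to whatever group 1 captured. The pattern captures a word, then requires the same word to appear again after one or more whitespace characters. Because \s+ also matches newlines, a duplicate split across a line break is caught too.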
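To make the role of the capture group concrete, here is a minimal sketch that reports each duplicated word along with where it starts. The helper name findDuplicates is hypothetical and not part of any library; it relies only on the standard String.prototype.matchAll.

```javascript
// Hypothetical helper: list each adjacent duplicate with its position.
function findDuplicates(text) {
  const dupeRe = /\b(\w+)\s+\1\b/gi;
  // matchAll yields one match object per hit: m[1] is the captured word
  // (group 1) and m.index is where the whole match starts in the input.
  return [...text.matchAll(dupeRe)].map(m => ({ word: m[1], index: m.index }));
}

const found = findDuplicates("I went to to the store and saw the the cat");
console.log(found);
// [ { word: 'to', index: 7 }, { word: 'the', index: 31 } ]
```

Returning the index rather than just the text makes it easier to point a reader or an editor plugin at the exact spot.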
Why \b matters
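Without word boundaries, (\w+)\s+\1 matches "mato mato" inside "tomato matoo": group 1 captures "mato" from the end of "tomato", and the back-reference finds it again at the start of "matoo". The \b anchors on both sides force each occurrence to be a whole word.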
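A quick way to see the difference is to run both variants on the same input. This is only an illustrative sketch; the variable names are made up for the example.

```javascript
const sample = "tomato matoo";

// No word boundaries: the engine is free to match inside words.
const loose = /(\w+)\s+\1/i;
// With word boundaries: both occurrences must be whole words.
const strict = /\b(\w+)\s+\1\b/i;

const looseMatch = sample.match(loose);
console.log(looseMatch[0]);       // "mato mato"
console.log(strict.test(sample)); // false
```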
Case-insensitive duplicates
"The the cat" should be caught even though the cases differ. The i flag makes the whole pattern case-insensitive, back-references included, so \1 matches "the" after group 1 captured "The".
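The effect of the flag is easy to check by testing the same pattern with and without it. A small sketch, with variable names chosen just for this example:

```javascript
const mixed = "The the cat";

const caseSensitive = /\b(\w+)\s+\1\b/;
const caseInsensitive = /\b(\w+)\s+\1\b/i;

// Without i, the back-reference must repeat "The" exactly, so "the" fails.
console.log(caseSensitive.test(mixed));   // false
// With i, the back-reference also ignores case.
console.log(caseInsensitive.test(mixed)); // true
```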
Highlighting duplicates in text
const flagged = text.replace(
/\b(\w+)(\s+)(\1)\b/gi,
"<mark>$1$2$3</mark>"
);
Three capture groups: word, whitespace, repeated word. The replacement re-emits all three, so the original casing and whitespace are preserved, and wraps them in <mark> for HTML rendering.
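If the groups are not needed individually, JavaScript's $& replacement token (the whole matched substring) gives the same result with a single capture group. The sketch below compares the two forms on a made-up input string.

```javascript
const input = "saw the the cat";

// Three groups, re-emitted one by one.
const withGroups = input.replace(/\b(\w+)(\s+)(\1)\b/gi, "<mark>$1$2$3</mark>");
// One group; $& stands for the entire match.
const withWholeMatch = input.replace(/\b(\w+)\s+\1\b/gi, "<mark>$&</mark>");

console.log(withWholeMatch);                // "saw <mark>the the</mark> cat"
console.log(withGroups === withWholeMatch); // true
```

The three-group form is still the better fit when the two words need different markup, for example marking only the second occurrence.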
Python version
import re
DUPE_RE = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)
text = "I went to to the store"
for m in DUPE_RE.finditer(text):
    print(f"Duplicate '{m.group(1)}' at position {m.start()}")
Caveat: doesn't catch all duplicates
This regex catches consecutive duplicates only. "The cat sat on the cat" repeats both "the" and "cat", but neither pair is adjacent, so nothing matches. For that, you need either a more complex pattern or a word-frequency count in code.
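One possible shape for the word-frequency approach is sketched below. The helper name repeatedWords is hypothetical, and the sketch assumes that a "word" is any run of \w characters and that case should be ignored.

```javascript
// Hypothetical helper: count every word, ignoring case, and return
// the ones that occur more than once anywhere in the text.
function repeatedWords(text) {
  const counts = new Map();
  // match() returns null when there are no words, hence the ?? [] fallback.
  for (const word of text.toLowerCase().match(/\w+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  // A Map iterates in insertion order, so results follow first appearance.
  return [...counts]
    .filter(([, n]) => n > 1)
    .map(([word, n]) => ({ word, count: n }));
}

const repeats = repeatedWords("The cat sat on the cat");
console.log(repeats);
// [ { word: 'the', count: 2 }, { word: 'cat', count: 2 } ]
```

Unlike the regex, this flags every repeated word, most of which are legitimate in normal prose, so it is more useful as a report than as a typo detector.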