Find duplicate consecutive words with regex

Repeated words in writing are a common typo: "the the cat" or "to to do". Catch them with a single regex back-reference.

The basic pattern

// \b(\w+) captures a whole word, \s+ skips whitespace, \1 requires the same word again
const DUPE_RE = /\b(\w+)\s+\1\b/gi;

const text = "I went to to the store and saw the the cat";
const matches = [...text.matchAll(DUPE_RE)];
matches.forEach(m => console.log("Found:", m[0]));
// Found: to to
// Found: the the

The key piece is \1, a back-reference to whatever group 1 captured. The pattern captures a word, skips the whitespace after it, then requires the same word to appear again.
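
To see what group 1 actually holds, it can help to print the captured word and its position instead of the whole match. This is a minimal sketch; the names sampleText and found are illustrative, not part of the pattern itself.

```javascript
const sampleText = "I went to to the store and saw the the cat";
const found = [];

// m[1] is the text captured by group 1; m.index is where the match starts.
for (const m of sampleText.matchAll(/\b(\w+)\s+\1\b/gi)) {
  found.push({ word: m[1], index: m.index });
  console.log(`Duplicate '${m[1]}' at index ${m.index}`);
}
// Duplicate 'to' at index 7
// Duplicate 'the' at index 31
```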

Why \b matters

Without word boundaries, (\w+)\s+\1 on "tomato matoo" would match "mato mato": the engine is free to start mid-word, capture "mato" from the end of "tomato", and find it again at the start of "matoo". Word boundaries force whole-word matches.
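
A quick check makes the difference concrete. The sketch below runs both variants on the same input; LOOSE_RE and STRICT_RE are just illustrative names.

```javascript
// Same pattern with and without word boundaries.
const LOOSE_RE = /(\w+)\s+\1/;
const STRICT_RE = /\b(\w+)\s+\1\b/;

const boundarySample = "tomato matoo";

// Without \b the match can start mid-word: "mato" + space + "mato".
const looseMatch = boundarySample.match(LOOSE_RE);
console.log(looseMatch ? looseMatch[0] : null); // "mato mato"

// With \b the capture has to span a whole word, so nothing matches.
const strictResult = STRICT_RE.test(boundarySample);
console.log(strictResult); // false
```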

Case-insensitive duplicates

"The the cat" should be caught even if the cases differ. The i flag does case-insensitive matching, including for back-references.

Highlighting duplicates in text

const flagged = text.replace(
  /\b(\w+)(\s+)(\1)\b/gi,
  "<mark>$1$2$3</mark>"
);

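Three capture groups: the word, the whitespace, and the repeated word. Capturing the repeat separately as $3 preserves its original casing ("The the" stays "The the"), which reusing $1 would not. The replacement wraps the whole match in <mark> for HTML rendering.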
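
One case the pair-based pattern leaves open is a run of three or more, such as "the the the": it marks the first pair and leaves the third word outside the <mark>. A possible variant, offered as a sketch rather than a drop-in replacement, repeats the whitespace-plus-back-reference part and uses $& (the whole match) in the replacement. The helper name highlightDuplicates is hypothetical.

```javascript
// Hypothetical helper: wraps each run of repeated words in <mark>.
function highlightDuplicates(input) {
  // (?:\s+\1)+ allows two or more copies; $& inserts the whole match as-is.
  return input.replace(/\b(\w+)(?:\s+\1)+\b/gi, "<mark>$&</mark>");
}

console.log(highlightDuplicates("saw the the cat"));
// "saw <mark>the the</mark> cat"
console.log(highlightDuplicates("saw the the the cat"));
// "saw <mark>the the the</mark> cat"
```

Note that this only wraps the text; if the input can contain HTML, it would still need escaping before rendering.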

Python version

import re

DUPE_RE = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

text = "I went to to the store"
for m in DUPE_RE.finditer(text):
    print(f"Duplicate '{m.group(1)}' at position {m.start()}")
# Duplicate 'to' at position 7

Caveat: doesn't catch all duplicates

This regex catches consecutive duplicates only. "The cat sat on the cat" repeats both "the" and "cat", but the copies aren't adjacent, so nothing matches. For that, you need either a more complex pattern or a word-frequency count in code.
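
The word-frequency route could look roughly like this. It is a minimal sketch: findRepeatedWords is a hypothetical helper, and it treats any run of \w characters as a word.

```javascript
// Hypothetical helper: returns words that occur more than once, adjacent or not.
function findRepeatedWords(input) {
  const counts = new Map();
  // Lowercasing first mirrors the case-insensitive matching used above.
  for (const [word] of input.toLowerCase().matchAll(/\w+/g)) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  // A Map keeps insertion order, so results come out in order of first appearance.
  return [...counts].filter(([, n]) => n > 1).map(([word]) => word);
}

console.log(findRepeatedWords("The cat sat on the cat"));
// [ 'the', 'cat' ]
```

Unlike the regex, this flags every repeated word in the text, so on real prose you would likely want to filter out common words or limit the count to a sentence or a short window.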

