What does \b mean in regex?
\b matches a position, not a character — it's the difference between matching cat and matching cat inside catalog.
The short answer
\b is a word boundary. It matches the position between a word character (\w: letter, digit, or underscore) and a non-word character — or the start/end of the string if it's next to a word character.
It doesn't consume any character. It just asserts that the cursor is at a word boundary at that moment.
The classic use case
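You want to find the word "cat", not the "cat" buried inside "catalog", "cats", or "concat". (A hyphen is a non-word character, so the "cat" in "cat-like" still counts as a whole word.)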
Pattern: \bcat\b
Input: "The cat sat on the catalog"
Matches: "cat" (just the standalone one)
The first \b asserts "I'm at the start of a word" before the c. The second \b asserts "I'm at the end of a word" after the t. Inside "catalog", the position after t is between two word characters (t and a) — not a boundary — so the match fails.
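As a rough illustration, the same search can be run with Python's standard re module. The variable names are ours, chosen only for this sketch:

```python
import re

text = "The cat sat on the catalog"

# Without boundaries, the "cat" inside "catalog" matches too.
loose = [m.span() for m in re.finditer(r"cat", text)]

# With \b on both sides, only the standalone word matches.
strict = [m.span() for m in re.finditer(r"\bcat\b", text)]

print(loose)   # [(4, 7), (19, 22)]
print(strict)  # [(4, 7)]
```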
Where boundaries actually are
Look at the string hello, world!. The word boundaries are at the positions marked with |:
|hello|, |world|!
There are four boundaries: before h (the start of the string, next to a word character), between o and the comma, between the space and w, and between d and !. The position between the comma and the space is not a boundary, because both are non-word characters. Each transition between a word char and a non-word char is a boundary.
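One way to check the boundary positions yourself is to ask the engine where \b matches. This is a small sketch using Python's re module; because \b is zero-width, every match is an empty string at a given index:

```python
import re

text = "hello, world!"

# \b matches empty positions, so each match has zero width.
positions = [m.start() for m in re.finditer(r"\b", text)]
print(positions)  # [0, 5, 7, 12]

# Insert a visible marker at every boundary; no character is removed.
marked = re.sub(r"\b", "|", text)
print(marked)  # |hello|, |world|!
```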
The Unicode gotcha
In many flavors and modes (JavaScript, PCRE without UCP, Python with re.ASCII), \w only matches ASCII [A-Za-z0-9_]. There, \b doesn't understand accented characters:
Pattern: \bcafé\b
Input: "I love café food"
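Match: FAILS (in ASCII mode)
The é is NOT a word character in ASCII mode. The first \b succeeds (space, then c) and the literal café matches, but the final \b sits between é and the space. Both are non-word characters, so there is no boundary there and the match fails. Meanwhile the engine does see a boundary between f and é, in the middle of the word, which is exactly where you don't want one.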
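How you fix this depends on the flavor:
- JavaScript: the u flag does not change \w or \b, which stay ASCII-only. Use Unicode property lookarounds instead → /(?<![\p{L}\p{N}_])café(?![\p{L}\p{N}_])/u
- Python 3: Unicode is the default for str patterns, so it works automatically
- PCRE: start the pattern with the (*UCP) directive (in PHP, the /u modifier also turns this on)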
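The difference is easy to reproduce in Python, where str patterns are Unicode-aware by default and the re.ASCII flag switches \w and \b back to ASCII-only behavior. A minimal sketch, with the é written as an explicit escape so the example does not depend on how the source file is normalized:

```python
import re

text = "I love caf\u00e9 food"           # precomposed é (U+00E9)
pattern = r"\bcaf" + "\u00e9" + r"\b"    # \bcafé\b

# ASCII mode: é is not a word character, so the trailing \b fails.
ascii_match = re.search(pattern, text, flags=re.ASCII)

# Default (Unicode) mode: é counts as a word character.
unicode_match = re.search(pattern, text)

print(ascii_match)             # None
print(unicode_match.group())   # café
```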
The opposite: \B
\B matches a non-boundary — a position that is NOT between a word char and a non-word char. Useful for finding substrings inside words:
Pattern: \Bcat\B
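Input: "Concatenate the catalog for a cat"
Match: "cat" inside "Concatenate" only
In "catalog" the c starts the word, so the leading \B fails there. The standalone "cat" has a boundary on both sides and is excluded too.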
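A quick Python check shows which occurrences of cat really have word characters on both sides. Note that a cat at the very start of a word does not qualify for \Bcat\B, since there is a boundary right before the c:

```python
import re

# \B on both sides needs a word character before "c" and after "t".
inside = re.findall(r"\Bcat\B", "concatenate")
at_word_start = re.findall(r"\Bcat\B", "The catalog is a cat")

# Dropping the leading \B only requires a word character after "t".
prefix = re.findall(r"cat\B", "The catalog is a cat")

print(inside)         # ['cat']
print(at_word_start)  # []
print(prefix)         # ['cat']  (the one in "catalog")
```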
Common mistakes
Using \b around non-word characters
\b only matches where exactly one side is a word character, so placing it next to punctuation can surprise you. Take \b!:
Pattern: \b!
Input: "hello!"
Match: YES (boundary between word "o" and non-word "!")
The boundary is between o and !, and \b matches that position. The ! then consumes the exclamation mark. But the same pattern finds nothing in "hello !" or in a bare "!": with a space (or nothing) before the !, neither side is a word character, so there is no boundary. \b is about transitions between word and non-word characters, not about a specific side.
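To see how much \b! depends on what precedes the exclamation mark, here is a small Python sketch comparing three inputs:

```python
import re

attached = re.search(r"\b!", "hello!")   # boundary between "o" and "!"
spaced = re.search(r"\b!", "hello !")    # space and "!" are both non-word
bare = re.search(r"\b!", "!")            # no word character anywhere

print(attached.span())  # (5, 6)
print(spaced)           # None
print(bare)             # None
```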
Expecting \b to work without word characters nearby
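\b can only match next to a word character. In a string such as "?!" or " - " there is no boundary anywhere, not even at the start or end. Likewise, \b\b matches a single boundary position twice; it doesn't mean "two boundaries." Boundaries don't consume characters, so two in a row match the same position.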
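Both points can be checked with a few lines of Python: doubling \b changes nothing, and a string without word characters has no boundaries at all.

```python
import re

text = "hello, world!"

# \b\b asserts the same position twice, so the results are identical.
single = [m.start() for m in re.finditer(r"\b", text)]
double = [m.start() for m in re.finditer(r"\b\b", text)]
print(single == double)  # True

# No word characters means no boundaries, even at the string edges.
no_words = re.search(r"\b", "?! -")
print(no_words)  # None
```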
Practical patterns
Match a whole word only
\bword\b
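When the word comes from user input, it is safer to escape it before wrapping it in boundaries. The helper below is a hypothetical sketch (the name whole_word is ours), and it assumes the word starts and ends with a word character; otherwise the boundaries will not sit where you expect:

```python
import re

def whole_word(word: str, flags: int = 0) -> re.Pattern:
    """Compile a pattern that matches `word` only as a whole word.

    Assumes `word` begins and ends with a word character.
    """
    # re.escape keeps characters like "." or "+" in `word` literal.
    return re.compile(rf"\b{re.escape(word)}\b", flags)

pattern = whole_word("word")
found = pattern.findall("a word, a password, and wordy words")
print(found)  # ['word']
```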
Find numbers but not inside identifiers
Pattern: \b\d+\b
Input: "count is 42 but x32 is also there"
Matches: ["42"] (not "32" inside "x32")
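The same comparison in Python, with and without the boundaries:

```python
import re

text = "count is 42 but x32 is also there"

standalone = re.findall(r"\b\d+\b", text)  # digits forming a whole token
any_digits = re.findall(r"\d+", text)      # digits anywhere

print(standalone)  # ['42']
print(any_digits)  # ['42', '32']
```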
Match the start of a word
\b[A-Z] (matches an uppercase letter only when it is the first letter of a word)
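A short Python sketch of the difference the leading \b makes; the sample sentence is our own:

```python
import re

text = "The Quick brown Fox and an iPhone"

# Uppercase letters at the start of a word only.
initials = re.findall(r"\b[A-Z]", text)

# Without \b, the "P" in the middle of "iPhone" is picked up as well.
all_caps = re.findall(r"[A-Z]", text)

print(initials)  # ['T', 'Q', 'F']
print(all_caps)  # ['T', 'Q', 'F', 'P']
```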
The takeaway
\b is a position, not a character. It's the difference between matching a word and matching a word-shaped substring. Most "regex matched too much" bugs in word-search patterns are fixed by adding \b on both sides.
For Unicode text, make sure \w is Unicode-aware in your flavor, or use Unicode property lookarounds where it cannot be (as in JavaScript). Otherwise \b won't see accented letters as word characters and your matches will be wrong.