Concepts May 10, 2026

What does \b mean in regex?

\b matches a position, not a character — it's the difference between matching cat and matching cat inside catalog.

The short answer

\b is a word boundary. It matches the position between a word character (\w: letter, digit, or underscore) and a non-word character — or the start/end of the string if it's next to a word character.

It doesn't consume any character. It just asserts that the cursor is at a word boundary at that moment.
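To make the definition concrete, here is a minimal Python sketch that finds boundary positions by hand and checks them against the regex engine. It assumes ASCII-mode `\w` (letters, digits, underscore), which Python selects with `re.ASCII`. The helper names `is_word_char` and `boundary_positions` are hypothetical and used only for illustration.

```python
import re

def is_word_char(ch: str) -> bool:
    # ASCII-mode \w: a letter, a digit, or an underscore.
    return ch.isascii() and (ch.isalnum() or ch == "_")

def boundary_positions(s: str) -> list[int]:
    """Return every index 0..len(s) where a word boundary sits (ASCII mode)."""
    positions = []
    for i in range(len(s) + 1):
        # The string edges count as "not a word character".
        left = i > 0 and is_word_char(s[i - 1])
        right = i < len(s) and is_word_char(s[i])
        # A boundary is exactly where one neighbor is a word char and the other is not.
        if left != right:
            positions.append(i)
    return positions

text = "it's 42!"
manual = boundary_positions(text)
builtin = [m.start() for m in re.finditer(r"\b", text, re.ASCII)]
print(manual)             # [0, 2, 3, 4, 5, 7]
print(manual == builtin)  # True
```

Note the apostrophe: it is a non-word character, so "it's" contains two extra boundaries around it.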

The classic use case

You want to find the word "cat", not the "cat" inside "catalog" or "concat". (A hyphen is a non-word character, so \bcat\b still matches the "cat" in "cat-like".)

Pattern: \bcat\b
Input:   "The cat sat on the catalog"
Matches: "cat" (just the standalone one)

The first \b asserts "I'm at the start of a word" before the c. The second \b asserts "I'm at the end of a word" after the t. Inside "catalog", the position after t is between two word characters (t and a) — not a boundary — so the match fails.
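A quick way to see the difference is to run the pattern with and without the boundaries. This is a small Python sketch using the standard `re` module; the spans are (start, end) character offsets.

```python
import re

text = "The cat sat on the catalog"

# Without \b, "cat" also matches the first three letters of "catalog".
loose = [m.span() for m in re.finditer(r"cat", text)]
# With \b on both sides, only the standalone word survives.
strict = [m.span() for m in re.finditer(r"\bcat\b", text)]

print(loose)   # [(4, 7), (19, 22)]
print(strict)  # [(4, 7)]
```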

Where boundaries actually are

Take the string "hello, world!" and mark every word boundary with |:

|hello|, |world|!

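There are four boundaries: before the h (start of the string, next to a word character), between o and the comma, between the space and w, and between d and !. The comma and the space are both non-word characters, so there is no boundary between them. Every transition between a word character and a non-word character (or the edge of the string) is a boundary.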
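Because \b is zero-width, a regex engine reports each boundary as an empty match. A short Python sketch can list those positions and redraw the diagram; this assumes Python 3.7 or later, where `re.finditer` handles consecutive empty matches this way.

```python
import re

text = "hello, world!"
# Every \b match is empty; .start() is the boundary index.
boundaries = [m.start() for m in re.finditer(r"\b", text)]
print(boundaries)  # [0, 5, 7, 12]

# Put "|" in front of each character that sits right after a boundary,
# plus one at the very end if the string ends on a boundary.
marked = "".join(
    ("|" if i in boundaries else "") + ch for i, ch in enumerate(text)
) + ("|" if len(text) in boundaries else "")
print(marked)  # |hello|, |world|!
```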

The Unicode gotcha

In many flavors, including JavaScript and PCRE by default, \w only matches ASCII [A-Za-z0-9_]. In those flavors \b doesn't understand accented characters:

Pattern: \bcafé\b
Input:   "I love café food"
Match:   FAILS  (in ASCII mode)

In ASCII mode, é is NOT a word character. The leading \b is fine (a space, then c), but the trailing \b sits between é and the following space. Both are non-word characters, so there is no boundary there and the whole match fails.

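Fix this by making \w Unicode-aware, or by working around it:

JavaScript: the u flag does NOT change \w or \b; they stay ASCII-only. Approximate the boundary with lookarounds and Unicode property escapes instead → /(?<![\p{L}\p{N}_])café(?![\p{L}\p{N}_])/u
Python 3: str patterns are Unicode-aware by default, so it works automatically (re.ASCII switches back to ASCII mode)
PCRE: start the pattern with (*UCP) or set the PCRE2_UCP option; PHP's /u modifier turns this on as well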
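Python makes the gotcha easy to reproduce, because its `re.ASCII` flag switches \w (and therefore \b) to the ASCII-only behavior described above. This is a small sketch; Python's `re` module has no \p{L}-style escapes, so the JavaScript lookaround workaround isn't reproduced here.

```python
import re

word = "caf\u00e9"  # "café" spelled with a precomposed é (U+00E9)
text = f"I love {word} food"
pattern = rf"\b{re.escape(word)}\b"

# Default str pattern: Unicode-aware, so é counts as \w and the trailing \b holds.
unicode_hit = re.search(pattern, text)
# re.ASCII: é is a non-word char, the trailing \b sits between é and a space
# (two non-word chars), and the search fails.
ascii_hit = re.search(pattern, text, re.ASCII)

print(unicode_hit.group() if unicode_hit else None)  # café
print(ascii_hit)                                      # None
```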

The opposite: \B

\B matches a non-boundary: any position where \b would fail, meaning both neighbors are word characters or both are non-word characters (the edge of the string counts as non-word). Useful for finding substrings inside words:

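Pattern: \Bcat\B
Input:   "The scatter plot has a cat"
Match:   "cat" inside "scatter"  (the standalone "cat" is excluded)

A "cat" at the start of a word, as in "catalog", is excluded too: the position before its c is a boundary.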
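A quick Python check shows which occurrences \B accepts. The two inputs are illustrative; the point is that "cat" must have word characters on both sides, so a word that merely starts with "cat" does not qualify.

```python
import re

results = {}
for text in ["The catalog is a cat", "The scatter plot has a cat"]:
    m = re.search(r"\Bcat\B", text)
    results[text] = m.span() if m else None
    print(repr(text), "->", results[text])
# 'The catalog is a cat' -> None
# 'The scatter plot has a cat' -> (5, 8)
```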

Common mistakes

Using \b around non-word characters

\b next to a non-word character still needs a word character on the other side of that position. \b! works when the ! directly follows a letter:

Pattern: \b!
Input:   "hello!"
Match:   YES (boundary between word "o" and non-word "!")

The boundary is between o and !, and \b matches that position; the ! then consumes the exclamation mark. But in "hello !" the character before ! is a space, also non-word, so there is no boundary and \b! fails. The same trap catches \bC\+\+\b on "I write C++ daily": the trailing \b sits between + and a space. \b is about transitions, not about a specific side.
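The Python sketch below reproduces these cases. It also shows one common workaround that is not from the original post: negative lookarounds `(?<!\w)` and `(?!\w)` check the neighbors directly, so they act like \b for ordinary words but also work for terms that start or end with punctuation. The helper name `whole_term` is hypothetical.

```python
import re

# \b before "!" needs a word character on the other side of that position.
after_letter = re.search(r"\b!", "hello!")   # o|! is a boundary
after_space = re.search(r"\b!", "hello !")   # " " and "!" are both non-word
print(after_letter is not None, after_space is not None)  # True False

# The trailing \b sits between "+" and a space, so this fails.
naive = re.search(r"\bC\+\+\b", "I write C++ daily")
print(naive)  # None

def whole_term(term: str):
    """Hypothetical helper: match `term` only when no word char touches it."""
    return re.compile(rf"(?<!\w){re.escape(term)}(?!\w)")

print(whole_term("C++").search("I write C++ daily").group())  # C++
print(whole_term("C++").search("I write C++x daily"))         # None
```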

Expecting \b to work without word characters nearby

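\b needs a word character on one side. In a string with no word characters at all, such as "--- !!!" or the empty string, it never matches. Stacking \b\b doesn't help either: boundaries don't consume characters, so both assertions test the same position and \b\b matches exactly where a single \b does. It doesn't mean "two boundaries."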
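Both points are easy to confirm in Python. A minimal sketch:

```python
import re

# No word characters anywhere, so no position qualifies as a boundary.
no_words = re.findall(r"\b", "--- !!!")
empty = re.search(r"\b", "")
print(no_words, empty)  # [] None

# \b\b tests the same position twice; it finds exactly what \b finds.
single = [m.start() for m in re.finditer(r"\b", "hi there")]
double = [m.start() for m in re.finditer(r"\b\b", "hi there")]
print(single, single == double)  # [0, 2, 3, 8] True
```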

Practical patterns

Match a whole word only

\bword\b
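When the word comes from user input, it should be escaped before it goes between the boundaries. A minimal sketch with a hypothetical helper `find_whole_word`; it assumes the word starts and ends with a word character, since that is what \b needs.

```python
import re

def find_whole_word(text: str, word: str) -> list[tuple[int, int]]:
    """Hypothetical helper: spans where `word` appears as a whole word.

    re.escape keeps characters like "." literal. Relies on word boundaries,
    so `word` should start and end with a word character.
    """
    pattern = rf"\b{re.escape(word)}\b"
    return [m.span() for m in re.finditer(pattern, text)]

# "cat-like" counts: the hyphen is a non-word character.
print(find_whole_word("cat catalog cat-like", "cat"))  # [(0, 3), (12, 15)]
```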

Find numbers but not inside identifiers

Pattern: \b\d+\b
Input:   "count is 42 but x32 is also there"
Matches: ["42"]  (not "32" inside "x32")
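In Python, the same pattern plus a conversion to integers looks like this; the second line shows what leaks through without the boundaries.

```python
import re

text = "count is 42 but x32 is also there"
numbers = [int(n) for n in re.findall(r"\b\d+\b", text)]
print(numbers)  # [42]

# Without \b, the digits inside the identifier x32 are matched too.
loose_numbers = re.findall(r"\d+", text)
print(loose_numbers)  # ['42', '32']
```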

Match the start of a word

\b[A-Z]   (an uppercase letter at the start of a word)
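A short Python sketch shows this on a sample sentence. It also shows a side effect worth knowing: with case-insensitive matching, [A-Z] accepts lowercase letters too, so the pattern matches the start of every word (including the s after the apostrophe).

```python
import re

text = "Alice met Bob at McDonald's"
# The D in McDonald's follows a word character, so it is not at a boundary.
starts = re.findall(r"\b[A-Z]", text)
print(starts)  # ['A', 'B', 'M']

# re.IGNORECASE makes [A-Z] match any letter, so every word start matches.
any_case = re.findall(r"\b[A-Z]", text, re.IGNORECASE)
print(any_case)  # ['A', 'm', 'B', 'a', 'M', 's']
```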

The takeaway

\b is a position, not a character. It's the difference between matching a word and matching a word-shaped substring. Many "regex matched too much" bugs in word-search patterns are fixed by adding \b on both sides.

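For Unicode text, make sure \w is Unicode-aware in your flavor (Python 3 by default, PCRE with (*UCP)); otherwise \b won't see accented letters as word characters and your matches will be wrong. In JavaScript even the u flag doesn't change \b, so use lookarounds with Unicode property escapes instead.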