Anchors and boundaries — pinning regex to specific positions

Anchors don't match characters — they match positions. Master them and you cut whole classes of regex bugs.

What is an anchor?

Most regex constructs match characters. The pattern \d matches a digit character. [a-z] matches a single lowercase letter. They consume input as the regex engine moves through the string.

An anchor is different. An anchor matches a position — a zero-width spot between characters. The engine looks at where it currently is, decides whether that spot satisfies the anchor, and either continues or fails. The anchor doesn't consume anything.

This is why ^ at the start of a pattern doesn't "skip the first character." It just says: "we must currently be at the start of the input."
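
To make the zero-width idea concrete, here is a minimal sketch using Python's re module (the choice of Python and the variable names are just for illustration). It shows that a bare anchor produces an empty match, and that an anchored pattern still begins at index 0.

```python
import re

# A bare anchor succeeds without consuming anything: the match is empty.
bare = re.search(r"^", "abc")
print(bare.span())      # (0, 0)

# An anchored pattern still starts at index 0; the "a" is not skipped.
anchored = re.search(r"^a", "abc")
print(anchored.span())  # (0, 1)
```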

The two everyday anchors: ^ and $

Every regex flavor has these:

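^ matches the position at the start of the input.
$ matches the position at the end of the input. In Python, PCRE, and several other flavors it also matches just before a single trailing newline; in JavaScript it does not.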

Use them when you want to validate that a string is entirely a match, not just that it contains a match. Compare:

\d{3}        →  matches "abc123def" (the "123" inside)
^\d{3}$      →  matches "123" only — fails on "abc123def"
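
The same comparison as a short Python sketch. The helper name check is made up for this example; it just returns the matched text or None.

```python
import re

contains_three_digits = re.compile(r"\d{3}")
is_three_digits = re.compile(r"^\d{3}$")

def check(pattern, text):
    """Return the matched text, or None when the pattern fails."""
    m = pattern.search(text)
    return m.group() if m else None

print(check(contains_three_digits, "abc123def"))  # 123
print(check(is_three_digits, "abc123def"))        # None
print(check(is_three_digits, "123"))              # 123
```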

The /m (multiline) flag changes ^ and $

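Without the multiline flag, ^ matches only the start of the entire input and $ only the end. With multiline mode on, ^ also matches at the start of each line (immediately after a newline), and $ also matches at the end of each line (immediately before a newline).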

// Without /m
/^foo/      matches: "foo"
           doesn't match: "bar\nfoo"

// With /m
/^foo/m     matches: "foo"
           matches: "bar\nfoo"   (foo is at the start of a line)

This is a common source of "my regex works in testing but not in production" bugs: test strings are often a single line, while real data contains newlines. If your data has newlines, decide whether ^ means start of input or start of line, then set the flag accordingly.
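
In Python the multiline flag is spelled re.MULTILINE. The sketch below repeats the "bar\nfoo" example and adds a $ case; the sample strings are only illustrative.

```python
import re

text = "bar\nfoo"

# ^ without the flag: only the very start of the input qualifies.
no_flag = re.search(r"^foo", text)
# ^ with the flag: the position right after the newline qualifies too.
with_flag = re.search(r"^foo", text, re.MULTILINE)
print(no_flag)           # None
print(with_flag.span())  # (4, 7)

# $ changes in the same way: end of input vs. end of every line.
lines = "one\ntwo"
last_only = re.findall(r"\w+$", lines)
every_line = re.findall(r"\w+$", lines, re.MULTILINE)
print(last_only)   # ['two']
print(every_line)  # ['one', 'two']
```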

The absolute anchors: \A, \Z, \z

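Many flavors, including PCRE, Perl, Java, .NET, and Python, have anchors that mean "start/end of input regardless of multiline mode":

\A: start of input (always, even with multiline on)
\z: absolute end of input (PCRE, Perl, Java, .NET; in Python only from version 3.14)
\Z: in PCRE and most other flavors, end of input or just before a final trailing newline; in Python, absolute end of input only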

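JavaScript does not have these. To express "start of input only" in a pattern that also needs ^ to match line starts, keep the multiline flag and use a negative lookbehind such as (?<![\s\S]) for the absolute start, and a negative lookahead (?![\s\S]) for the absolute end. This requires an engine that supports lookbehind. Otherwise, drop the multiline flag and match line starts explicitly.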
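
A small Python sketch of how the absolute anchors ignore multiline mode. Note that in Python's re module specifically, \Z matches only at the very end of the string, while $ also accepts a position just before a final newline.

```python
import re

text = "foo\nfoo"

# With re.MULTILINE, ^ matches at every line start but \A only at index 0.
caret_starts = [m.start() for m in re.finditer(r"^foo", text, re.MULTILINE)]
abs_starts = [m.start() for m in re.finditer(r"\Afoo", text, re.MULTILINE)]
print(caret_starts)  # [0, 4]
print(abs_starts)    # [0]

# Python: $ tolerates one trailing newline, \Z does not.
dollar = re.search(r"foo$", "foo\n")
abs_end = re.search(r"foo\Z", "foo\n")
print(dollar is not None)  # True
print(abs_end)             # None
```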

Word boundaries: \b and \B

\b matches a position with a word character (\w: letter, digit, or underscore) on one side and a non-word character, or the start or end of the input, on the other. \B is the opposite: it matches every position where \b does not.

This is how you write "match the whole word foo, not foo as a prefix":

/foo/       matches "foo", but ALSO matches "foobar" and "afoo"
/\bfoo\b/  matches "foo" as a standalone word
            doesn't match "foobar"
            doesn't match "afoo"
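
Here is the same check as a Python sketch, with one extra line for \B. The sample strings are arbitrary.

```python
import re

whole_word = re.compile(r"\bfoo\b")

samples = ["foo", "a foo b", "foobar", "afoo"]
results = {s: bool(whole_word.search(s)) for s in samples}
print(results)
# {'foo': True, 'a foo b': True, 'foobar': False, 'afoo': False}

# \B is the inverse: "foo" only when it does NOT start at a word boundary.
inner_starts = [m.start() for m in re.finditer(r"\Bfoo", "foo afoo")]
print(inner_starts)  # [5]
```

The first sample matches even though nothing surrounds "foo": the start and end of the input count as boundaries when the adjacent character is a word character.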

What counts as a "word character"?

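In many flavors (JavaScript and PCRE among them), \w means [A-Za-z0-9_] by default, so \b treats only ASCII letters, digits, and underscore as the "word" side. This causes problems with Unicode:

/\bcafé\b/   in JavaScript: \b doesn't recognize é
                                 as a word character.
                                 Fails on "café" followed by a space
                                 or the end of input, yet matches the
                                 "café" inside "cafés".

The fix depends on the flavor. In Python 3, str patterns are Unicode-aware by default, so \w and \b already handle é unless you pass re.ASCII; re.UNICODE is redundant there. In JavaScript, the u flag does not change \w or \b. Replace \b with lookarounds built on Unicode property escapes instead, for example (?<![\p{L}\p{N}_])café(?![\p{L}\p{N}_]) with the u flag. In PCRE, enable the PCRE2_UCP option, or start the pattern with (*UCP), to make \w and \b Unicode-aware.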
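
Python 3 is a convenient place to see both behaviours side by side: its str patterns are Unicode-aware by default, and the re.ASCII flag switches \w and \b to the ASCII-only definition. A rough sketch, with the é written as an escape so the example does not depend on how the source file is encoded:

```python
import re

CAFE = "caf\u00e9"                 # "café" with a precomposed é
pattern = r"\b" + CAFE + r"\b"

# Default (Unicode-aware) mode: é is a word character.
uni_alone = re.search(pattern, CAFE + " au lait")
uni_plural = re.search(pattern, CAFE + "s")
print(uni_alone is not None)   # True
print(uni_plural)              # None

# ASCII-only mode: é is NOT a word character, so the results flip.
ascii_alone = re.search(pattern, CAFE + " au lait", re.ASCII)
ascii_plural = re.search(pattern, CAFE + "s", re.ASCII)
print(ascii_alone)               # None
print(ascii_plural is not None)  # True
```

The ASCII-only results are the surprising ones: the standalone word fails, while the word embedded in its plural matches.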

Practical patterns

"This string is exactly an integer"

^-?\d+$

Anchors force a complete match. The minus sign is optional, then one or more digits, then end-of-string. Without anchors, -?\d+ would match the "5" inside "abc5def".
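
A possible Python version of this check. The function name is_integer is made up for the example. One Python-specific caveat is included: $ accepts a single trailing newline, so re.fullmatch (or \Z) is the stricter option.

```python
import re

INTEGER = re.compile(r"^-?\d+$")

def is_integer(text):
    """True when the whole string is an optional minus sign plus digits."""
    return INTEGER.search(text) is not None

print(is_integer("42"))       # True
print(is_integer("-7"))       # True
print(is_integer("abc5def"))  # False
print(is_integer(""))         # False

# Python's $ also matches just before a final newline...
print(is_integer("5\n"))                           # True
# ...whereas fullmatch insists on the entire string.
strict = re.fullmatch(r"-?\d+", "5\n")
print(strict)                                      # None
```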

"Find every standalone word 'log'"

\blog\b

Word boundaries prevent matching "login", "logout", "blog", "catalog".
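
A quick Python sketch over a made-up sentence. It also shows a side effect of the \w definition: underscore is a word character, so "log_file" is skipped, while "log-file" is not.

```python
import re

text = "log login blog catalog logout; see the log."
starts = [m.start() for m in re.finditer(r"\blog\b", text)]
print(starts)  # [0, 39]

# Underscore counts as a word character; hyphen does not.
in_snake = re.search(r"\blog\b", "log_file")
in_kebab = re.search(r"\blog\b", "log-file")
print(in_snake)              # None
print(in_kebab is not None)  # True
```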

"Match only if not preceded by a digit"

Boundaries alone can't do this. You need a lookbehind:

(?<!\d)foo

Lookbehinds, like anchors, are zero-width — they assert a condition about position without consuming input.
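
The lookbehind in Python, as a brief sketch. The span in the last line shows that the character being checked is not part of the match.

```python
import re

not_after_digit = re.compile(r"(?<!\d)foo")

at_start = not_after_digit.search("foo")
after_letter = not_after_digit.search("xfoo")
after_digit = not_after_digit.search("1foo")

print(at_start is not None)   # True
print(after_digit)            # None
# The "x" is inspected but not consumed: the match covers only "foo".
print(after_letter.span())    # (1, 4)
```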

Common mistakes

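Forgetting ^ and $ in form validation. An "email regex" without anchors will match an email address buried inside a longer string of garbage. Always anchor validation regexes.
Using ^ with the multiline flag when you meant absolute start. Use \A in flavors that support it.
Assuming \b works with Unicode. Check how your flavor defines \w: Python 3 str patterns are Unicode-aware by default, while JavaScript's \b stays ASCII-only even with the u flag, so use lookarounds with \p{L} there.
Putting \b next to a non-word character without checking which side the word is on. \b! matches a "!" that directly follows a word character, as in "wow!". \b\. likewise matches only a dot that directly follows a word character, as in "end."; it can never match a dot at the start of the input or after a space, because there's no word transition there.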
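
The last item is easy to check by hand. A small Python sketch with arbitrary sample strings:

```python
import re

# \b! needs a word character immediately before the "!".
bang_after_word = re.search(r"\b!", "wow!")
bang_after_space = re.search(r"\b!", "wow !")
print(bang_after_word is not None)  # True
print(bang_after_space)             # None

# \b\. needs a word character immediately before the dot.
dot_at_start = re.search(r"\b\.", ".config")
dot_after_word = re.search(r"\b\.", "end.")
print(dot_at_start)                 # None
print(dot_after_word is not None)   # True
```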

The takeaway

Anchors and boundaries are the glue between your pattern and the surrounding context. They make the difference between "this string contains a date" and "this string IS a date." Most validation bugs come from using too few anchors; most search bugs come from using too many.

When you're writing a regex, ask: "Where in the input does this need to match?" The answer drives which anchors belong in the pattern.