
Advanced character classes

Character classes can do much more than [a-z]. Depending on the flavor, they support POSIX names, Unicode categories, negation, intersection, and more.

The basics

Square brackets [...] match any one character listed inside. Ranges ([a-z]), individual characters ([abc]), and negation ([^abc]) all work. Inside the class, most metacharacters are literal — see the escaping guide.
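As a quick illustration of the three basic forms, here is a minimal sketch using Python's built-in re module. The sample strings and variable names are made up for the example.

```python
import re

# [aeiou] matches one character from the list.
vowels = re.findall(r"[aeiou]", "character")

# [^0-9] is negated: any single character that is NOT in the range 0-9.
non_digits = re.findall(r"[^0-9]", "a1b2")

# Inside a class, '.', '+' and '*' are literal characters, not metacharacters.
operators = re.findall(r"[.+*]", "1+1=2.0*3")

print(vowels, non_digits, operators)
```

Note that the last pattern needs no backslashes: the dot, plus, and star are treated as ordinary characters inside the brackets.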

Shorthand classes

These work in nearly every modern flavor (ASCII definitions shown):

\d   [0-9]                  digit
\w   [A-Za-z0-9_]            word character
\s   [ \t\n\r\f\v]          whitespace
\D \W \S — negated versions

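Caveat: what these match depends on the flavor and its flags. In Python 3 (str patterns) and .NET, \d matches any Unicode decimal digit by default. In JavaScript it is always ASCII-only, and in Java it is ASCII-only unless UNICODE_CHARACTER_CLASS is set. Check your language's docs.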
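The difference is easy to see in Python, where str patterns are Unicode-aware unless the re.ASCII flag is given. This is a small sketch; the sample text is an assumption chosen for the demonstration.

```python
import re

# ASCII "42" followed by the same number written in Devanagari digits.
text = "42 \u096a\u0968"

# Default for str patterns: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d+", text, re.ASCII)

print(unicode_digits, ascii_digits)
```

If the input must be ASCII digits only (for example before calling a parser that rejects other scripts), writing [0-9] explicitly avoids depending on flags at all.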

POSIX character classes

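Engines with POSIX bracket expressions (grep, sed, and awk, plus PCRE and Ruby) accept named classes inside character class brackets. Java spells them \p{Alpha}, \p{Digit}, and so on; .NET and Python's built-in re module do not support them.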

[[:alpha:]]    letters
[[:digit:]]    digits
[[:alnum:]]    letters + digits
[[:space:]]    whitespace
[[:upper:]]    uppercase
[[:lower:]]    lowercase
[[:punct:]]    punctuation
[[:xdigit:]]   hex digits
[[:print:]]    printable chars

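JavaScript doesn't support POSIX names, so use the shorthand or explicit ranges instead. In a JS pattern without the u or v flag, [[:alpha:]] is parsed as the class [[:alph] (one of [, :, a, l, p, h) followed by a literal ], so it fails silently instead of raising an error.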
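For engines without POSIX names, the explicit ranges can be substituted mechanically. The sketch below does this in Python, whose built-in re module also lacks POSIX classes. POSIX_ASCII and expand_posix are hypothetical names invented for this example, and the table covers ASCII only.

```python
import re

# Hypothetical lookup table: ASCII-only equivalents of the POSIX names above.
POSIX_ASCII = {
    "alpha": "A-Za-z",
    "digit": "0-9",
    "alnum": "A-Za-z0-9",
    "space": r" \t\n\r\f\v",
    "upper": "A-Z",
    "lower": "a-z",
    "punct": r"!-/:-@\[-`{-~",
    "xdigit": "0-9A-Fa-f",
    "print": r"\x20-\x7E",
}


def expand_posix(pattern: str) -> str:
    # Replace each [:name:] with its explicit range; unknown names raise KeyError.
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)


identifier = expand_posix("[[:alpha:]_][[:alnum:]_]*")
print(identifier)  # the expanded pattern, usable with re.fullmatch
```

The outer brackets stay in the pattern, so [[:alpha:]_] becomes an ordinary class containing a range plus the underscore.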

Unicode property classes

Modern engines support Unicode categories with \p{...}:

\p{Letter}             any Unicode letter
\p{Lowercase}          any lowercase letter
\p{Number}             any numeric character (digits, fractions, Roman numerals, in any script)
\p{Punctuation}        any punctuation
\p{Script=Greek}       Greek script
\p{Script=Devanagari}  Devanagari (used for Hindi, Marathi, etc.)
\p{Emoji}              any emoji character (this property also includes the digits 0-9, # and *)

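In JavaScript, you need the u (or v) flag: /\p{Letter}+/u. In Python, use the third-party regex module, since the built-in re does not support \p. PCRE and Java support Unicode properties natively, though the accepted spellings vary; short forms such as \p{L} are the most portable.

If you work with Indian-language text, \p{Script=Devanagari} lets you match Hindi, Marathi, or Sanskrit correctly without listing every code point.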
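To see why a script property is often a better fit than \p{Letter} for Devanagari, it helps to look at the general category of each code point. The sketch below uses Python's standard unicodedata module rather than a regex engine; is_letter is a hypothetical helper that only approximates what \p{Letter} tests.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Rough stand-in for \p{Letter}: the general category starts with "L".
    return unicodedata.category(ch).startswith("L")


# The word "Hindi" written in Devanagari: consonants plus vowel signs and a virama.
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"

categories = [unicodedata.category(ch) for ch in hindi]
letters = [ch for ch in hindi if is_letter(ch)]

print(categories)
print(len(letters), "of", len(hindi), "code points are letters")
```

Only the consonants have a letter category; the vowel signs and the virama are marks. A pattern built on \p{Letter}+ alone would therefore split the word, while the script property covers the marks as well.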

Class intersection and subtraction

Some flavors let you combine classes:

[a-z&&[^aeiou]]     Java, Ruby: intersection, consonants only
[a-z-[aeiou]]       .NET: subtraction (single hyphen)
[[a-z]--[aeiou]]    JavaScript with the v flag: subtraction
[[a-z]&&[^aeiou]]   JavaScript with the v flag: intersection

Useful when you want "letters except vowels" without enumerating every consonant. Note: not portable across flavors.
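Where no set operators are available, a negative lookahead in front of the class gives the same result. This is a different technique from the syntaxes listed above, shown here as a minimal Python sketch because the built-in re module has no intersection or subtraction.

```python
import re

# (?![aeiou]) rejects the position if the next character is a vowel,
# then [a-z] consumes one lowercase letter: "a-z minus vowels".
consonant = re.compile(r"(?![aeiou])[a-z]")

found = consonant.findall("regex")
print(found)
```

The lookahead form works in any flavor that supports lookaheads, at the cost of being slightly harder to read than a dedicated set operator.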

Negation with multiple categories

To negate a Unicode property, capitalize the P:

\P{Letter}     any non-letter
\P{Number}     any non-numeric character

To combine multiple negations, wrap them in a class:

[^\d\s]   not a digit and not whitespace
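A common slip is to write the negated shorthands inside a class instead, which means "non-digit OR non-whitespace" and therefore matches almost everything. A short Python sketch of the difference, with a made-up sample string:

```python
import re

sample = "a1 b2"

# Negated class: characters that are neither digits nor whitespace.
neither = re.findall(r"[^\d\s]", sample)

# Class of negated shorthands: non-digit OR non-whitespace.
# Every character is at least one of those, so everything matches.
either = re.findall(r"[\D\S]", sample)

print(neither, either)
```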

The dot is almost a character class

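The dot . matches any character except newline by default (JavaScript also excludes \r, \u2028, and \u2029). With the dotall/single-line flag (s in JavaScript and PCRE, re.DOTALL in Python, (?s) inline where supported), it matches newlines too.

For "any character including newlines" without the flag, use [\s\S], which works in any flavor that supports the \s shorthand.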
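The three options can be compared side by side. This is a small Python sketch; the sample text is invented for the example.

```python
import re

text = "first\nsecond"

# Default: the dot stops at the newline, so there is no match.
default_dot = re.search(r"first.second", text)

# Dotall flag, either as an argument or inline.
with_flag = re.search(r"first.second", text, re.DOTALL)
with_inline = re.search(r"(?s)first.second", text)

# Flag-free alternative: whitespace OR non-whitespace covers every character.
portable = re.search(r"first[\s\S]second", text)

print(default_dot, with_flag is not None, with_inline is not None, portable is not None)
```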

Practical recipes

Match Latin letters only (no accents): [A-Za-z]+

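Match letters including accents: \p{Letter}+ with the u flag in JavaScript (use [\p{Letter}\p{Mark}]+ if the text may contain combining accents)

Match a name with hyphens and apostrophes: [\p{Letter}'\-]+

Match any visible character: \S+ (any non-whitespace), or [[:graph:]]+ in flavors with POSIX brackets (\p{Graph} in Java)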

Match only ASCII printable: [\x20-\x7E]+
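The two ASCII-only recipes run unchanged in Python's built-in re module. The following is a minimal sketch; is_ascii_printable is a hypothetical helper name and the sample strings are made up.

```python
import re

# [A-Za-z]+ stops at the accented letter, so "café" is cut short.
latin_words = re.findall(r"[A-Za-z]+", "caf\u00e9 au lait")


def is_ascii_printable(text: str) -> bool:
    # True only if every character is in the range space (0x20) to tilde (0x7E).
    return re.fullmatch(r"[\x20-\x7E]+", text) is not None


print(latin_words)
print(is_ascii_printable("hello, world!"), is_ascii_printable("tab\there"))
```

The recipes that rely on \p{...} need an engine with Unicode property support, so they are left out of this sketch.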
