Advanced character classes
Character classes go well beyond [a-z]. Depending on the flavor, they offer POSIX names, Unicode categories, negation, intersection, and more.
The basics
Square brackets [...] match any one character listed inside. Ranges ([a-z]), individual characters ([abc]), and negation ([^abc]) all work. Inside the class, most metacharacters are literal — see the escaping guide.
Shorthand classes
These work in every modern flavor:
\d [0-9] digit
\w [A-Za-z0-9_] word character
\s [ \t\n\r\f\v] whitespace
\D \W \S — negated versions
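Caveat: in some flavors \d matches every Unicode decimal digit, not just ASCII 0-9. Python 3 (for str patterns) and .NET do this by default; Java does it only with the UNICODE_CHARACTER_CLASS flag; JavaScript's \d is always ASCII. Similar differences affect \w and \s, so check your language's docs.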
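To make the caveat concrete, here is a small sketch using Python's built-in re module, where \d is Unicode-aware for str patterns unless re.ASCII is set. The sample string is an illustrative assumption, not taken from any real data.

```python
import re

# Sample text with ASCII, Arabic-Indic, and Devanagari digits,
# written as escapes so the code points are unambiguous.
text = "42 \u0664\u0662 \u096a\u0968"

# Default str patterns in Python 3 are Unicode-aware: all three runs match.
unicode_digits = re.findall(r"\d+", text)

# re.ASCII restricts \d, \w, and \s to their ASCII definitions.
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)

print(len(unicode_digits))  # 3
print(ascii_digits)         # ['42']
```

If input validation must accept only 0-9, an explicit [0-9] is the most portable choice.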
POSIX character classes
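Engines that follow POSIX bracket syntax (PCRE, Perl, Ruby, and POSIX tools such as grep, sed, and awk) accept named classes inside character class brackets. Java spells them as properties instead (\p{Alpha}, \p{Digit}), and neither .NET nor Python's built-in re module supports them: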
[[:alpha:]] letters
[[:digit:]] digits
[[:alnum:]] letters + digits
[[:space:]] whitespace
[[:upper:]] uppercase
[[:lower:]] lowercase
[[:punct:]] punctuation
[[:xdigit:]] hex digits
[[:print:]] printable chars
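JavaScript doesn't support POSIX names; use the shorthand or explicit ranges instead. Without the u flag, [[:alpha:]] in JS is not an error: it parses as the class [[:alph] (any one of the characters [, :, a, l, p, h) followed by a literal ]. With the u flag it is a syntax error.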
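For engines without POSIX names, one option is to expand the names into explicit ASCII ranges before compiling. The sketch below does this in Python; POSIX_ASCII and expand_posix are hypothetical names invented for this example, and the table covers only the ASCII meaning of a few classes.

```python
import re

# ASCII-only equivalents, written as the contents of a bracket expression.
POSIX_ASCII = {
    "alpha": "A-Za-z",
    "digit": "0-9",
    "alnum": "A-Za-z0-9",
    "upper": "A-Z",
    "lower": "a-z",
    "xdigit": "0-9A-Fa-f",
    "space": r" \t\n\r\f\v",
    "print": r"\x20-\x7E",
}

def expand_posix(pattern: str) -> str:
    """Replace each [:name:] with an explicit ASCII range (hypothetical helper).

    Raises KeyError for names missing from POSIX_ASCII.
    """
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)

expanded = expand_posix(r"#[[:xdigit:]]{6}")
hex_color = re.compile(expanded)

print(expanded)                              # #[0-9A-Fa-f]{6}
print(bool(hex_color.fullmatch("#1a2B3c")))  # True
```

The helper is deliberately naive: it rewrites every [:name:] it sees, including ones outside a bracket expression, so it suits hand-written patterns rather than untrusted input.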
Unicode property classes
Modern engines support Unicode properties with \p{...}:
\p{Letter} any Unicode letter
\p{Lowercase} any lowercase letter
\p{Number} any numeric character (digits, Roman numerals, fractions); use \p{Nd} for decimal digits only
\p{Punctuation} any punctuation
\p{Script=Greek} Greek script
\p{Script=Devanagari} Devanagari (used for Hindi, Marathi, etc.)
\p{Emoji} any character with the Emoji property (this includes the ASCII digits, # and *)
In JavaScript, you need the u flag: /\p{Letter}+/u. In Python, use the third-party regex module (the built-in re has no \p support). PCRE and Java support Unicode properties natively, though the accepted property names differ by engine (Java, for example, writes scripts as \p{IsGreek}).
For Indian-language text, \p{Script=Devanagari} matches Hindi, Marathi, and Sanskrit correctly without listing every code point.
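Python's built-in re cannot run \p{...}, so the following sketch only approximates two of the properties above with the standard library: the general category via unicodedata, and Devanagari via its code block range U+0900 to U+097F. The block is close to, but not identical with, the Devanagari script property, and letters_only and DEVANAGARI are names made up for this example.

```python
import re
import unicodedata

def letters_only(text: str) -> str:
    """Keep characters whose general category starts with 'L' (roughly \\p{Letter})."""
    return "".join(ch for ch in text if unicodedata.category(ch).startswith("L"))

# Devanagari block as an explicit range: an approximation of the script property.
DEVANAGARI = re.compile(r"[\u0900-\u097F]+")

# "Hello", a Devanagari word written as escapes, then a year.
sample = "Hello \u0928\u092e\u0938\u094d\u0924\u0947 2024"

print(letters_only("caf\u00e9 #1"))       # the four letters of "café"
print(len(DEVANAGARI.findall(sample)))    # 1
```

Where the third-party regex module is available, the real \p{...} syntax is the better choice, since it tracks the Unicode database rather than a hand-written range.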
Class intersection and subtraction
Some flavors let you combine classes:
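[a-z&&[^aeiou]] Java/Ruby: intersection (consonants only)
[a-z-[aeiou]] .NET: subtraction
[[a-z]--[aeiou]] JavaScript with the v flag, Python regex module in V1 mode: subtraction
[[a-z]&&[^aeiou]] JavaScript with the v flag (UTS #18 style): intersection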
Useful when you want "letters except vowels" without enumerating every consonant. Note: not portable across flavors.
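Where no set-operation syntax exists (Python's built-in re, for example), a negative lookahead in front of the class gives the same "letters except vowels" effect. This is a swapped-in workaround rather than true class subtraction; CONSONANT is just an illustrative name.

```python
import re

# The lookahead rejects vowels, then [a-z] consumes one character.
CONSONANT = r"(?![aeiou])[a-z]"

# Remove every consonant, leaving only the vowels.
print(re.sub(CONSONANT, "", "hello"))  # eo

# Find runs of consecutive consonants.
runs = re.findall(f"(?:{CONSONANT})+", "regular expressions")
print(runs)  # ['r', 'g', 'l', 'r', 'xpr', 'ss', 'ns']
```

The lookahead form works in most backtracking engines, which makes it a reasonable fallback when a pattern has to run in more than one flavor.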
Negation with multiple categories
To negate a Unicode property, capitalize the P:
\P{Letter} any non-letter
\P{Number} any non-numeric character
To combine multiple negations, wrap them in a class:
[^\d\s] not a digit and not whitespace
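A short Python sketch of the combined negation, along with a common pitfall: putting the negated shorthands \D and \S inside one class forms a union, which matches almost everything. The sample text is made up for illustration.

```python
import re

text = "room 101, floor 2"

# Runs of characters that are neither digits nor whitespace.
tokens = re.findall(r"[^\d\s]+", text)
print(tokens)  # ['room', ',', 'floor']

# Pitfall: [\D\S] means "non-digit OR non-whitespace", so a digit still matches
# (it is non-whitespace), while [^\d\s] correctly rejects it.
union_matches_digit = re.fullmatch(r"[\D\S]", "5") is not None
negated_matches_digit = re.fullmatch(r"[^\d\s]", "5") is not None
print(union_matches_digit, negated_matches_digit)  # True False
```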
The dot is almost a character class
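The dot . matches any character except a newline by default (some flavors, such as JavaScript, also exclude other line terminators like \r). With the dotall/single-line flag (the s flag in JS and PCRE, re.DOTALL in Python, or (?s) inline where supported), it matches newlines too.
For "any character including newlines" without the flag, use [\s\S], which works in any flavor that allows shorthand classes inside brackets.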
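The three ways of crossing a newline can be compared side by side in Python. This is a minimal sketch with a two-line sample string chosen for illustration.

```python
import re

text = "a\nb"

# Default: the dot stops at the newline, so there is no match.
default = re.search(r"a.b", text)

# Dotall via flag, via inline modifier, and via the [\s\S] workaround.
with_flag = re.search(r"a.b", text, flags=re.DOTALL)
inline = re.search(r"(?s)a.b", text)
portable = re.search(r"a[\s\S]b", text)

print(default)                 # None
print(with_flag is not None)   # True
print(inline is not None)      # True
print(portable is not None)    # True
```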
Practical recipes
Match Latin letters only (no accents): [A-Za-z]+
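Match letters including accents: [\p{Letter}]+ with /u flag (add \p{Mark} if the text may contain combining accents)
Match a name with hyphens and apostrophes: [\p{Letter}'\-]+
Match any visible character: \S+ (any non-whitespace) or [\p{Graph}]+ where the engine defines Graph (e.g. Java; not JavaScript)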
Match only ASCII printable: [\x20-\x7E]+
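Two of these recipes can be sketched with Python's built-in re. Since re has no \p{Letter}, the name pattern uses the [^\W\d_] idiom as a stand-in; it is an approximation (it also admits a few numeric characters that \d does not cover). The function names are hypothetical.

```python
import re

ASCII_PRINTABLE = re.compile(r"[\x20-\x7E]+")

# [^\W\d_] = word characters minus digits and underscore, i.e. roughly letters.
NAME = re.compile(r"(?:[^\W\d_]|['-])+")

def is_ascii_printable(s: str) -> bool:
    """True if s is non-empty and contains only U+0020 through U+007E."""
    return ASCII_PRINTABLE.fullmatch(s) is not None

def is_name(s: str) -> bool:
    """True if s consists only of letters, hyphens, and apostrophes."""
    return NAME.fullmatch(s) is not None

print(is_ascii_printable("Hello, world!"))  # True
print(is_ascii_printable("tab\there"))      # False
print(is_name("O'Neil-Smith"))              # True
print(is_name("R2D2"))                      # False
```

Using fullmatch rather than search keeps the check anchored to the whole string, which is usually what validation code wants.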