Regex in Java

Java's regex API is in java.util.regex. It's powerful and feature-rich, but the double-escaping in string literals trips up newcomers.

The double-escape problem

The Java compiler processes backslash escapes in string literals before the regex engine ever sees the pattern, so the regex \d has to be written in source as "\\d".

// Match a 4-digit year
Pattern p = Pattern.compile("\\d{4}");

// vs. the regex the engine actually receives
\d{4}

Every \ in the regex becomes \\ in source. Java has no raw string literals: text blocks ("""...""", standard since Java 15) make longer patterns easier to lay out, but escape sequences are still processed inside them, so backslashes must still be doubled (unlike Python's raw strings).
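To see the two layers of escaping side by side, here is a small sketch. EscapeDemo and its members are hypothetical names, and the text block requires Java 15 or later.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // In source, "\\d{4}" is the five-character string \d{4} at runtime:
    // the compiler turns each \\ escape into a single backslash.
    static final String YEAR_REGEX = "\\d{4}";

    // A text block (Java 15+) still processes escapes, so the backslash
    // is doubled here too. Incidental indentation is stripped, and there is
    // no trailing newline because the closing delimiter is on the same line.
    static final String YEAR_REGEX_TB = """
            \\d{4}""";

    static boolean isYear(String s) {
        return Pattern.compile(YEAR_REGEX).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(YEAR_REGEX);                       // \d{4}
        System.out.println(YEAR_REGEX.length());              // 5
        System.out.println(YEAR_REGEX.equals(YEAR_REGEX_TB)); // true
        System.out.println(isYear("2024"));                   // true
    }
}
```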

Pattern and Matcher

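Java separates pattern compilation from matching state. A Pattern is an immutable compiled regex that is safe to share across threads; a Matcher applies it to one input, holds the current match state, and is not thread-safe: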

Pattern p = Pattern.compile("(\\d{4})-(\\d{2})");
Matcher m = p.matcher("2024-06");

if (m.matches()) {                       // entire input
    System.out.println(m.group(1));      // "2024"
    System.out.println(m.group(2));      // "06"
}
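Because a Pattern can be shared, one common idiom (sketched here, not the only way) is to compile once into a static final field and create a fresh Matcher for each input. YearMonthParser and parse are illustrative names, not part of the API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearMonthParser {
    // Compiled once; a Pattern is immutable and can be shared between threads.
    private static final Pattern YEAR_MONTH = Pattern.compile("(\\d{4})-(\\d{2})");

    // Hypothetical helper: returns {year, month}, or null if the input
    // is not exactly YYYY-MM.
    static int[] parse(String input) {
        // A Matcher carries per-match state and is not thread-safe,
        // so each call gets its own.
        Matcher m = YEAR_MONTH.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] ym = parse("2024-06");
        System.out.println(ym[0] + " " + ym[1]); // 2024 6
        System.out.println(parse("2024-6"));     // null
    }
}
```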

Methods on Matcher:

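  • matches(): does the regex match the entire input?
  • find(): find the next match anywhere in the input; call it repeatedly to step through all matches.
  • lookingAt(): does the regex match at the start of the input, without having to reach the end?
  • group(), group(n), group(name): the whole match, the nth capturing group, or a named group (null if that group did not take part in the match).
  • start(), end(): the match's start index and exclusive end index in the input.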
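A quick sketch contrasting the three matching methods on the same input. MatcherMethodsDemo and its helper methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the whole input must be digits.
    static boolean wholeInput(String s) {
        return DIGITS.matcher(s).matches();
    }

    // lookingAt(): only a prefix of the input must be digits.
    static boolean prefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // find() in a loop visits every match; start() and end() give its
    // position, with end() exclusive.
    static List<String> findAll(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "42 apples, 7 pears";
        System.out.println(wholeInput(input)); // false
        System.out.println(prefix(input));     // true
        System.out.println(findAll(input));    // [42@0-2, 7@11-12]
    }
}
```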

Named groups

Java uses (?<name>...), without the P of Python's (?P<name>...):

Pattern p = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");
Matcher m = p.matcher("2024-06");
if (m.matches()) {
    String year = m.group("year");     // "2024"
}

Group names must start with an ASCII letter and may contain only ASCII letters and digits. Unlike Java identifiers, they cannot contain underscores.
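Named groups also work in backreferences (\k<name>) and in replacement strings (${name}). The sketch below shows both, plus what happens with an invalid name. NamedGroupsDemo and its methods are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NamedGroupsDemo {
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

    // Reorders "YYYY-MM" to "MM/YYYY"; ${name} inserts a named group.
    static String toMonthYear(String s) {
        return DATE.matcher(s).replaceAll("${month}/${year}");
    }

    // \k<w> refers back to whatever the group named w matched,
    // so this accepts a word repeated after a space.
    static boolean isDoubledWord(String s) {
        return Pattern.matches("(?<w>[a-z]+) \\k<w>", s);
    }

    // True if the name is accepted as a capturing-group name.
    static boolean isValidGroupName(String name) {
        try {
            Pattern.compile("(?<" + name + ">x)");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(toMonthYear("2024-06"));      // 06/2024
        System.out.println(isDoubledWord("the the"));    // true
        System.out.println(isValidGroupName("year2"));   // true
        System.out.println(isValidGroupName("my_year")); // false: underscore
        System.out.println(isValidGroupName("2year"));   // false: starts with a digit
    }
}
```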

Flags

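Pass flags as a bitmask to Pattern.compile(String, int):

Pattern.CASE_INSENSITIVE         // ignore case (US-ASCII letters only by default)
Pattern.MULTILINE                // ^ and $ match at line boundaries
Pattern.DOTALL                   // . also matches line terminators
Pattern.UNICODE_CASE             // with CASE_INSENSITIVE: Unicode-aware case folding
Pattern.UNICODE_CHARACTER_CLASS  // \w, \d, etc. follow Unicode
Pattern.COMMENTS                 // verbose mode: whitespace ignored, # starts a comment
Pattern.LITERAL                  // treat the whole pattern as literal text
Pattern.UNIX_LINES               // only \n counts as a line terminator for ., ^ and $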

Combine with |: Pattern.compile(pat, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE).
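A sketch of a few flags in action, assuming a made-up log format where error lines start with "error: ". FlagsDemo and its methods are hypothetical; the accented letters are written as \u escapes so the source file stays ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    // CASE_INSENSITIVE | MULTILINE: ^ and $ anchor at every line, and
    // "error", "Error" and "ERROR" are all accepted.
    private static final Pattern ERROR_LINE =
            Pattern.compile("^error: (.*)$", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // COMMENTS: whitespace in the pattern is ignored and # starts a comment
    // that runs to the end of the line, so this is just (\d{4})-(\d{2}).
    private static final Pattern YEAR_MONTH = Pattern.compile(
            "(\\d{4})  # year\n"
            + "-        # separator\n"
            + "(\\d{2})  # month",
            Pattern.COMMENTS);

    // Returns the message part of every line that starts with "error: ".
    static List<String> errorMessages(String log) {
        List<String> messages = new ArrayList<>();
        Matcher m = ERROR_LINE.matcher(log);
        while (m.find()) {
            messages.add(m.group(1));
        }
        return messages;
    }

    static boolean isYearMonth(String s) {
        return YEAR_MONTH.matcher(s).matches();
    }

    // CASE_INSENSITIVE alone folds only US-ASCII letters, so lowercase
    // e-acute (U+00E9) does not match uppercase E-acute (U+00C9)...
    static boolean eAcuteAsciiOnlyFold() {
        return Pattern.compile("\u00e9", Pattern.CASE_INSENSITIVE).matcher("\u00c9").matches();
    }

    // ...until UNICODE_CASE is added.
    static boolean eAcuteUnicodeFold() {
        return Pattern.compile("\u00e9", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher("\u00c9").matches();
    }

    public static void main(String[] args) {
        String log = "Error: disk full\ninfo: ok\nERROR: timeout";
        System.out.println(errorMessages(log));     // [disk full, timeout]
        System.out.println(isYearMonth("2024-06")); // true
        System.out.println(eAcuteAsciiOnlyFold());  // false
        System.out.println(eAcuteUnicodeFold());    // true
    }
}
```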

Substitutions

String result = "first-last".replaceAll("(\\w+)-(\\w+)", "$2 $1");
// "last first"

// Or drive a Matcher yourself to compute each replacement
Matcher m = Pattern.compile("\\d+").matcher("hello 3");
StringBuilder sb = new StringBuilder();   // StringBuilder overload needs Java 9+; use StringBuffer before that
while (m.find()) {
    m.appendReplacement(sb, String.valueOf(Integer.parseInt(m.group()) * 2));
}
m.appendTail(sb);
String doubled = sb.toString();           // "hello 6"

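In the replacement string, $1, $2 (or ${name} for a named group) insert captured text, so a literal $ or \ must be escaped with a backslash. For replacement text that comes from data, Matcher.quoteReplacement() adds those escapes for you.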
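When the replacement comes from data, or has to be computed per match, two Matcher features help: the static quoteReplacement() and, since Java 9, replaceAll(Function<MatchResult, String>). A sketch; ReplacementDemo, fillPrice and doubleNumbers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{price\\}");
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // The price text comes from data and may contain $ or \, so
    // quoteReplacement escapes it before it is used as a replacement.
    static String fillPrice(String template, String price) {
        return PLACEHOLDER.matcher(template).replaceAll(Matcher.quoteReplacement(price));
    }

    // Java 9+: the function computes each replacement from the match.
    // Its result is still interpreted as a replacement string, so it is
    // quoted as well (harmless here, since it is only digits).
    static String doubleNumbers(String s) {
        return NUMBER.matcher(s).replaceAll(
                mr -> Matcher.quoteReplacement(String.valueOf(Long.parseLong(mr.group()) * 2)));
    }

    public static void main(String[] args) {
        System.out.println(fillPrice("Total: {price}", "$5.00")); // Total: $5.00
        System.out.println(doubleNumbers("hello 3 and 21"));      // hello 6 and 42
    }
}
```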

Quoting user input

To match a string literally, use Pattern.quote(userInput):

String safe = Pattern.quote(userInput);
Pattern p = Pattern.compile("prefix-" + safe);

Pattern.quote wraps the input in \Q...\E, which tells the engine to treat everything in between as literal text. It also copes with input that itself contains \E.
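A minimal sketch of quoting in practice, built around the "prefix-" example. QuoteDemo and matchesPrefixed are hypothetical names.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: is the candidate exactly "prefix-" followed by
    // the user-supplied text, with that text taken literally?
    static boolean matchesPrefixed(String userInput, String candidate) {
        Pattern p = Pattern.compile("prefix-" + Pattern.quote(userInput));
        return p.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("a.b"));                 // \Qa.b\E
        System.out.println(matchesPrefixed("a.b", "prefix-a.b")); // true
        System.out.println(matchesPrefixed("a.b", "prefix-axb")); // false: the dot is literal
        // User text that itself contains \E still matches literally.
        System.out.println(matchesPrefixed("x\\Ey", "prefix-x\\Ey")); // true
    }
}
```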

Atomic groups and possessive quantifiers

Java supports both natively:

(?>a+)b      atomic group
a++b         possessive +
a*+b         possessive *
a?+b         possessive ?

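Neither construct backtracks into text it has already matched. Rewriting the vulnerable part of a pattern this way can eliminate catastrophic backtracking (ReDoS), but it is not automatic protection for any pattern, and it can change the result: a+a matches "aaa", while a++a matches nothing, because a++ never gives back an a.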
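The difference from ordinary greedy quantifiers shows up on a tiny input. A sketch; PossessiveDemo and fullMatch are illustrative names, and fullMatch is only a thin wrapper over Pattern.matches.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // True if regex matches the entire input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy a+ first takes "aaa", then gives one a back for the final a.
        System.out.println(fullMatch("a+a", "aaa"));     // true
        // Possessive a++ keeps all three and never gives one back.
        System.out.println(fullMatch("a++a", "aaa"));    // false
        // The atomic group (?>a+) behaves like a++ here.
        System.out.println(fullMatch("(?>a+)a", "aaa")); // false
        // When the next token can never match what was consumed (b vs a),
        // greedy and possessive agree; only the backtracking work differs.
        System.out.println(fullMatch("a+b", "aaab"));    // true
        System.out.println(fullMatch("a++b", "aaab"));   // true
    }
}
```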

Performance: java.util.regex isn't RE2

Java uses a backtracking NFA, so the same ReDoS pitfalls apply as in PCRE, Python's re, and JavaScript. For input-validation regexes on user data, prefer atomic groups or possessive quantifiers, or consider RE2/J, Google's Java port of RE2, which guarantees linear-time matching but does not support backreferences or lookaround.
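As an example of the possessive-quantifier advice, here is one way to harden a classic vulnerable validator, ^(\w+\s?)*$ (words separated by single whitespace characters). This is a sketch assuming that is exactly the intended language; SafeValidation and isValid are hypothetical names, and String.repeat needs Java 11+.

```java
import java.util.regex.Pattern;

class SafeValidation {
    // Vulnerable form (not used here): nested quantifiers let \w+ split a
    // run of word characters in exponentially many ways when the match fails.
    //     ^(\w+\s?)*$
    //
    // Possessive rewrite: each run of word characters, the optional single
    // whitespace, and the repetition itself are taken whole and never given
    // back, so a failing input is rejected without trying other splits.
    private static final Pattern WORDS = Pattern.compile("(?:\\w++\\s?+)*+");

    // matches() anchors at both ends, so no ^ or $ is needed.
    static boolean isValid(String input) {
        return WORDS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("hello big world")); // true
        System.out.println(isValid("hello  world"));    // false: two spaces
        String attack = "a".repeat(40) + "!";
        System.out.println(isValid(attack));            // false, rejected without a backtracking blow-up
    }
}
```

The vulnerable version is deliberately not run against the attack string: on a backtracking engine that kind of input can take a very long time to fail.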