Regex in Java
Java's regex API is in java.util.regex. It's powerful and feature-rich, but the double-escaping in string literals trips up newcomers.
The double-escape problem
Java string literals interpret backslashes first. So a regex \d has to be written in source as "\\d".
// Match a 4-digit year
Pattern p = Pattern.compile("\\d{4}");
// vs the actual regex
\d{4}
Every \ in the regex becomes \\ in source, and Java has no raw-string literal like Python's r"...". Text blocks ("""...""", standard since Java 15) make long patterns easier to read because they can span lines and contain unescaped double quotes, but a backslash is still an escape character inside a text block, so it still needs doubling.
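As a rough illustration of the escaping rules above, the sketch below compiles the same year regex from an ordinary string literal and from a text block, and shows what it takes to match one literal backslash. The class name EscapeDemo and its helper are made up for this example, and the text block assumes Java 15 or later.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class EscapeDemo {
    // The regex \d{4}: each regex backslash is doubled in Java source.
    static final Pattern YEAR = Pattern.compile("\\d{4}");

    // Matching one literal backslash needs the regex \\, which is "\\\\" in source.
    static final Pattern BACKSLASH = Pattern.compile("\\\\");

    // Same year regex in a text block (Java 15+); the backslash is still doubled.
    static final Pattern YEAR_BLOCK = Pattern.compile("""
            \\d{4}""");

    static boolean isYear(String s) {
        return YEAR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isYear("2024"));                        // true
        System.out.println(BACKSLASH.matcher("C:\\temp").find());  // true
        System.out.println(YEAR_BLOCK.pattern());                  // \d{4}
    }
}
```

Printing pattern() is a quick way to see the regex the engine actually receives after the compiler has processed the string escapes.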
Pattern and Matcher
Java separates pattern compilation from matching state:
Pattern p = Pattern.compile("(\\d{4})-(\\d{2})");
Matcher m = p.matcher("2024-06");
if (m.matches()) { // entire input
    System.out.println(m.group(1)); // "2024"
    System.out.println(m.group(2)); // "06"
}
Methods on Matcher:
matches(): does the regex match the entire input?
find(): find the next match anywhere; call repeatedly.
lookingAt(): match at the start of input, not necessarily the end.
group(), group(n), group(name): extract the match.
start(), end(): positions in the input.
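To make the difference between these methods concrete, here is a minimal sketch that runs one pattern through matches(), lookingAt(), and a find() loop. The class MatcherMethodsDemo and its helper methods are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatcherMethodsDemo {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // True only when the whole input is digits.
    static boolean matchesWhole(String s) {
        return DIGITS.matcher(s).matches();
    }

    // True when the input starts with digits, whatever follows.
    static boolean matchesPrefix(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // Collects every match as "text@start-end" by calling find() repeatedly.
    static List<String> findAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("123abc"));   // false
        System.out.println(matchesPrefix("123abc"));  // true
        System.out.println(findAll("a12 b345"));      // [12@1-3, 345@5-8]
    }
}
```

Note that end() is exclusive, so input.substring(m.start(), m.end()) equals m.group().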
Named groups
Java uses (?<name>...) without Python's P:
Pattern p = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");
Matcher m = p.matcher("2024-06");
if (m.matches()) {
    String year = m.group("year"); // "2024"
}
Group names are more restrictive than Java identifiers: a name must start with an ASCII letter and may contain only ASCII letters and digits. Underscores are not allowed.
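Named groups can also be referenced from a replacement string as ${name}, and they remain reachable by number. The following sketch shows both; NamedGroupDemo and its methods are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class NamedGroupDemo {
    static final Pattern YEAR_MONTH =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})");

    // Rewrites every "yyyy-MM" occurrence as "MM/yyyy" using ${name} references.
    static String toMonthYear(String input) {
        return YEAR_MONTH.matcher(input).replaceAll("${month}/${year}");
    }

    // Returns the month group, or null when the input is not a whole match.
    static String month(String input) {
        Matcher m = YEAR_MONTH.matcher(input);
        return m.matches() ? m.group("month") : null;
    }

    public static void main(String[] args) {
        System.out.println(toMonthYear("released 2024-06")); // released 06/2024
        System.out.println(month("2024-06"));                // 06

        // A named group still has a number: "month" is also group 2.
        Matcher m = YEAR_MONTH.matcher("2024-06");
        if (m.matches()) {
            System.out.println(m.group(2).equals(m.group("month"))); // true
        }
    }
}
```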
Flags
Pass as a bitmask to compile:
Pattern.CASE_INSENSITIVE
Pattern.MULTILINE
Pattern.DOTALL
Pattern.UNICODE_CASE // with CASE_INSENSITIVE, fold case for all of Unicode, not just ASCII
Pattern.UNICODE_CHARACTER_CLASS // \w \d etc. follow Unicode
Pattern.COMMENTS // verbose mode
Pattern.LITERAL // treat pattern as literal text
Pattern.UNIX_LINES // only \n is a line terminator
Combine with |: Pattern.compile(pat, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE).
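A short sketch of how flags change matching, assuming a made-up log string: it combines two flags with |, uses the equivalent inline form (?im), and shows the effect of DOTALL on the dot. FlagsDemo and count are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FlagsDemo {
    // ^ matches at every line start (MULTILINE), ignoring ASCII case.
    static final Pattern ERROR_LINE =
            Pattern.compile("^error", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // The same two flags written inline at the start of the pattern.
    static final Pattern ERROR_LINE_INLINE = Pattern.compile("(?im)^error");

    // Counts non-overlapping matches of p in input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String log = "ok\nError: disk\nERROR: net";
        System.out.println(count(ERROR_LINE, log));         // 2
        System.out.println(count(ERROR_LINE_INLINE, log));  // 2

        // Without DOTALL the dot does not match a line terminator.
        System.out.println(Pattern.compile("a.b").matcher("a\nb").matches());                 // false
        System.out.println(Pattern.compile("a.b", Pattern.DOTALL).matcher("a\nb").matches()); // true
    }
}
```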
Substitutions
String result = "first-last".replaceAll("(\\w+)-(\\w+)", "$2 $1");
// "last first"
// Or with Matcher for capture-aware logic
Matcher m = Pattern.compile("\\d+").matcher("hello 3");
StringBuilder sb = new StringBuilder(); // StringBuilder overloads need Java 9+; use StringBuffer before that
while (m.find()) {
    m.appendReplacement(sb, String.valueOf(Integer.parseInt(m.group()) * 2));
}
m.appendTail(sb);
// sb.toString() is "hello 6"
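The replacement string refers to groups as $1, $2, or ${name} for named groups. A literal $ or \ in the replacement must be escaped with a backslash; Matcher.quoteReplacement(text) does this for arbitrary text.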
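The sketch below illustrates two points about replacements: escaping arbitrary replacement text with Matcher.quoteReplacement, and computing each replacement with the function-based Matcher.replaceAll overload, which assumes Java 9 or later. ReplacementDemo, fillPlaceholder, doubleNumbers, and the {amount} placeholder are hypothetical names invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ReplacementDemo {
    // Inserts value literally, even if it contains $ or \.
    static String fillPlaceholder(String template, String value) {
        return template.replaceAll("\\{amount\\}", Matcher.quoteReplacement(value));
    }

    // Doubles every integer in the input (Java 9+ functional replaceAll).
    static String doubleNumbers(String input) {
        return Pattern.compile("\\d+")
                .matcher(input)
                .replaceAll(mr -> String.valueOf(Integer.parseInt(mr.group()) * 2));
    }

    public static void main(String[] args) {
        System.out.println(fillPlaceholder("Total: {amount}", "$5.00")); // Total: $5.00
        System.out.println(doubleNumbers("hello 3 and 10"));             // hello 6 and 20
    }
}
```

Passing "$5.00" as the replacement without quoting would be read as a reference to group 5, which this pattern does not have, so the call fails with an IndexOutOfBoundsException.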
Quoting user input
To match a string literally, use Pattern.quote(userInput):
String safe = Pattern.quote(userInput);
Pattern p = Pattern.compile("prefix-" + safe);
Pattern.quote wraps the input in \Q...\E markers, which tell the engine to treat everything between them as literal text.
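As a small sketch of why quoting matters, the example below builds the same prefixed pattern with and without Pattern.quote for an input that contains regex metacharacters. QuoteDemo and prefixed are hypothetical names, and "a.b*" stands in for untrusted user input.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuoteDemo {
    // Builds a pattern that matches "prefix-" followed by userInput, taken literally.
    static Pattern prefixed(String userInput) {
        return Pattern.compile("prefix-" + Pattern.quote(userInput));
    }

    public static void main(String[] args) {
        String userInput = "a.b*"; // contains the metacharacters . and *

        Pattern safe = prefixed(userInput);
        System.out.println(safe.matcher("prefix-a.b*").matches());  // true
        System.out.println(safe.matcher("prefix-axbb").matches());  // false

        // Unquoted, the input is interpreted as a regex and matches other text.
        Pattern unsafe = Pattern.compile("prefix-" + userInput);
        System.out.println(unsafe.matcher("prefix-axbb").matches()); // true

        System.out.println(Pattern.quote(userInput)); // \Qa.b*\E
    }
}
```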
Atomic groups and possessive quantifiers
Java supports both natively:
(?>a+)b atomic group
a++b possessive +
a*+b possessive *
a?+b possessive ?
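Neither construct gives back characters once it has matched them, which removes the backtracking that makes some patterns vulnerable to ReDoS. They also change what a pattern can match, so test the rewritten pattern against your expected inputs.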
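The behavioral difference is easiest to see on a pattern where the greedy version only succeeds by backtracking. This minimal sketch compares a greedy, a possessive, and an atomic variant on the same input; PossessiveDemo and whole are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PossessiveDemo {
    // True when regex matches the entire input.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String input = "aaab";

        // Greedy a+ first takes "aaa", then gives one "a" back so "ab" can match.
        System.out.println(whole("a+ab", input));      // true

        // Possessive a++ keeps all three a's, so the following "a" can never match.
        System.out.println(whole("a++ab", input));     // false

        // The atomic group behaves the same way as the possessive quantifier here.
        System.out.println(whole("(?>a+)ab", input));  // false

        // When no give-back is needed, the possessive form matches as usual.
        System.out.println(whole("a++b", input));      // true
    }
}
```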
Performance: java.util.regex isn't RE2
Java uses a backtracking NFA, so the same ReDoS pitfalls apply as in PCRE, Python's re, and JavaScript. For input-validation regexes on user data, prefer atomic groups or possessive quantifiers, or consider RE2/J, Google's Java port of RE2, which matches in linear time but does not support backreferences or lookaround.