Match quoted strings (with escape handling)
Matching a quoted string sounds easy until you remember escaped quotes inside. Here's how to do it right.
The simple version
/"[^"]*"/
Works for "hello" and "world". Fails for "he said \"hi\"": the match ends at the first inner ", so you get "he said \" instead of the whole string.
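To see the failure concretely, here is a minimal sketch using Python's re module. The names NAIVE_RE and first_match are made up for this illustration.

```python
import re

# The simple pattern: a quote, any run of non-quote characters, a quote.
NAIVE_RE = re.compile(r'"[^"]*"')

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

plain = '"hello"'
escaped = r'"he said \"hi\""'   # contains two escaped inner quotes

print(first_match(NAIVE_RE, plain))    # the whole string: "hello"
print(first_match(NAIVE_RE, escaped))  # truncated at the first inner quote: "he said \"
```

The second result ends right after the backslash, because [^"]* has no idea that the backslash changes the meaning of the quote that follows it.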
The escape-aware version
/"(?:[^"\\]|\\.)*"/
Breakdown:
"— opening quote(?:[^"\\]|\\.)*— repeat: any char that isn't a quote or backslash, OR a backslash followed by anything"— closing quote
The \\. alternative handles \" (escaped quote), \\ (escaped backslash), and any other backslash-escape.
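The same breakdown can be written into the pattern itself. The sketch below assumes Python's re.VERBOSE flag, which ignores whitespace and allows comments inside the pattern; ESCAPE_AWARE_RE is just an illustrative name.

```python
import re

ESCAPE_AWARE_RE = re.compile(r'''
    "               # opening quote
    (?:
        [^"\\]      # any character except a quote or a backslash
      | \\.         # or a backslash plus the character it escapes
    )*
    "               # closing quote
''', re.VERBOSE)

samples = [
    r'say "he said \"hi\"" now',   # escaped quotes inside the string
    r'"a\\" tail "b"',             # escaped backslash right before the closing quote
]

for text in samples:
    print(ESCAPE_AWARE_RE.findall(text))
```

The second sample is the interesting one: the string ends with an escaped backslash, so the quote after it really is the closing quote. Consuming backslashes in pairs via \\. is what gets this right.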
Both single and double quotes
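/(["'])(?:[^\\]|\\.)*?\1/
The first group (["']) captures whichever quote opened the string, and \1 requires the same quote to close it. Because [^\\] also matches quote characters, the lazy *? is what makes the match stop at the first unescaped quote of the opening kind. The other kind of quote can appear inside without escaping, as in "it's".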
Python
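import re

# Group 1 captures the opening quote; \1 requires the same quote to close.
STRING_RE = re.compile(r'''(["'])(?:[^\\]|\\.)*?\1''')

source_code = r'''print("he said \"hi\"", 'it\'s')'''  # sample input
for m in STRING_RE.finditer(source_code):
    print(m.group(0))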
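If you also need to know which quote was used, or want the contents without the delimiters, one possible variation is to name the groups. This is a sketch, not part of the pattern above: the group names quote and body and the helper quoted_strings are invented for this example.

```python
import re

# Same structure as the pattern above, with named groups added.
QUOTED_RE = re.compile(
    r'''(?P<quote>["'])(?P<body>(?:[^\\]|\\.)*?)(?P=quote)'''
)

def quoted_strings(text):
    """Return (quote_char, raw_body) pairs for each quoted string found."""
    return [(m.group("quote"), m.group("body")) for m in QUOTED_RE.finditer(text)]

sample = r'''say("it's fine", 'a \'quoted\' word')'''

for quote, body in quoted_strings(sample):
    print(quote, body)
```

Note that the body is returned raw: escape sequences such as \' are still present and would need a separate unescaping step that matches your language's rules.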
Caveat: multi-line strings
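These patterns are written with single-line strings in mind, but they do not enforce that: the negated classes [^"\\] and [^\\] match a newline like any other character, so a match can span lines. The s/dotall flag only affects the . in \\., so without it a string containing a backslash immediately followed by a newline fails to match. To accept only single-line strings, exclude newlines explicitly, for example with [^"\\\n]. For Python triple-quoted strings or template literals with embedded newlines, you need a different pattern or, more practically, a proper tokenizer.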
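Newline behavior is easy to get wrong, so it is worth checking directly. The sketch below uses Python's re module; SINGLE_LINE_RE is a hypothetical variant that adds \n to the negated class, and the variable names are illustrative.

```python
import re

ESCAPE_AWARE_RE = re.compile(r'"(?:[^"\\]|\\.)*"')
# Hypothetical single-line variant: the class also excludes newlines.
SINGLE_LINE_RE = re.compile(r'"(?:[^"\\\n]|\\.)*"')
# Same as the first pattern, but with DOTALL so that . matches a newline.
DOTALL_RE = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)

two_lines = '"first line\nsecond line"'   # raw newline inside the quotes
continued = '"first line\\\nsecond line"' # backslash followed by a newline

print(ESCAPE_AWARE_RE.search(two_lines) is not None)  # True: the class matches the newline
print(SINGLE_LINE_RE.search(two_lines) is not None)   # False: newline excluded
print(ESCAPE_AWARE_RE.search(continued) is not None)  # False: . does not match a newline
print(DOTALL_RE.search(continued) is not None)        # True: DOTALL lets \\. consume it
```

In other words, a negated character class matches newlines regardless of flags, while the dot in \\. only does so with DOTALL.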
For full source-code parsing, regex isn't enough — use a language-aware lexer (Tree-sitter, your language's built-in tokenizer, etc.).
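As an example of the built-in tokenizer route, here is a minimal sketch for Python source using the standard library's tokenize module. The helper name string_literals and the sample source are made up for this illustration; other languages will have their own equivalents.

```python
import io
import tokenize

def string_literals(source):
    """Return the text of every string-literal token in Python source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]

source = "\n".join([
    r'x = "he said \"hi\""  # "not a string"',  # the quoted text in the comment is ignored
    "y = '''two",                                # triple-quoted string spanning two lines
    "lines'''",
    "",
])

for literal in string_literals(source):
    print(repr(literal))
```

The tokenizer handles what the regex cannot: it skips quotes inside comments and treats the triple-quoted string as a single token. On recent Python versions f-strings are reported as separate token types rather than STRING, so this sketch would not list them.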