Match quoted strings (with escape handling)

Matching a quoted string sounds easy until you remember escaped quotes inside. Here's how to do it right.

The simple version

/"[^"]*"/

Works for "hello" and "world". Fails for "he said \"hi\"": the regex treats the first inner " as the closing quote, so it matches only "he said \" and leaves the rest behind.
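To see the failure concretely, here is a minimal Python sketch. The sample text and the name SIMPLE_RE are made up for illustration.

```python
import re

SIMPLE_RE = re.compile(r'"[^"]*"')

# Hypothetical sample input containing escaped quotes.
text = r'"he said \"hi\""'

match = SIMPLE_RE.search(text)
# The match ends at the first inner quote, even though it is escaped.
print(match.group(0))  # prints: "he said \"
```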

The escape-aware version

/"(?:[^"\\]|\\.)*"/

Breakdown:

  • " — opening quote
  • (?:[^"\\]|\\.)* — repeat: any char that isn't a quote or backslash, OR a backslash followed by any single character (any character except a newline, unless s/dotall is on)
  • " — closing quote

The \\. alternative handles \" (escaped quote), \\ (escaped backslash), and any other backslash-escape.
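A quick way to check the breakdown above is to run the pattern against a few inputs in Python. This is only a sketch: the samples and the names ESCAPED_RE and samples are illustrative assumptions.

```python
import re

ESCAPED_RE = re.compile(r'"(?:[^"\\]|\\.)*"')

# Hypothetical inputs: plain, escaped quotes, escaped backslashes.
samples = [
    r'"hello"',
    r'"he said \"hi\""',
    r'"C:\\path\\"',
]

for s in samples:
    # fullmatch succeeds only if the whole input is one quoted string.
    print(s, ESCAPED_RE.fullmatch(s) is not None)

# An unterminated string: the final quote is escaped, so nothing closes it.
print(ESCAPED_RE.fullmatch(r'"abc\"') is not None)
```

The first three inputs should match in full, while the last one should not, because a backslash can only be consumed together with the character that follows it.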

Both single and double quotes

/(["'])(?:[^\\]|\\.)*?\1/

The first group (["']) captures whichever quote opened the string, and \1 requires the same quote to close it. Because [^\\] also matches quote characters, the lazy *? is what makes the match stop at the first unescaped closing quote of the same kind.
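The effect of the lazy quantifier is easiest to see by comparing it with a greedy copy of the same pattern. The sketch below assumes a made-up input line, and GREEDY_RE exists only for contrast.

```python
import re

LAZY_RE = re.compile(r'''(["'])(?:[^\\]|\\.)*?\1''')
# Same pattern with a greedy * instead of *? (for comparison only).
GREEDY_RE = re.compile(r'''(["'])(?:[^\\]|\\.)*\1''')

# Hypothetical input: three strings, one of which contains the other quote type.
text = '''"a" + 'b "c"' + "d"'''

lazy_matches = [m.group(0) for m in LAZY_RE.finditer(text)]
greedy_matches = [m.group(0) for m in GREEDY_RE.finditer(text)]

print(len(lazy_matches))    # prints: 3
print(len(greedy_matches))  # prints: 1
```

The lazy pattern finds each string separately, including the single-quoted one that contains double quotes. The greedy copy runs from the first quote to the last matching quote on the line and swallows everything in between.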

Python

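import re

# Group 1 captures the opening quote; \1 requires the same quote to close.
STRING_RE = re.compile(r'''(["'])(?:[^\\]|\\.)*?\1''')

# Sample input for illustration; substitute the text you want to scan.
source_code = r'''print("he said \"hi\"", 'ok')'''

for m in STRING_RE.finditer(source_code):
    print(m.group(0))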

Caveat: multi-line strings

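These patterns are not actually limited to one line: a negated class such as [^"\\] matches a newline, so a quoted string that spans lines is still matched. The s/dotall flag only affects the \\. branch, that is, whether a backslash followed by a newline counts as an escape. If you want single-line strings only, add \n to the negated class. For Python triple-quoted strings or template literals, whose delimiters and embedded expressions follow different rules, you need a different pattern or, more practically, a proper tokenizer.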
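One way to check how a pattern treats newlines, and to pin it to a single line if that is what you want, is sketched below. ONE_LINE_RE is an illustrative variant rather than a pattern from this post; it simply adds \n to the negated class.

```python
import re

ESCAPED_RE = re.compile(r'"(?:[^"\\]|\\.)*"')
# Illustrative variant: newlines are excluded from the negated class.
ONE_LINE_RE = re.compile(r'"(?:[^"\\\n]|\\.)*"')

# Hypothetical input: one quoted string with a real newline inside it.
text = '"first line\nsecond line"'

# A negated class matches newlines, so the plain pattern spans both lines.
print(ESCAPED_RE.search(text) is not None)   # prints: True
# With \n excluded, no quoted string fits on a single line here.
print(ONE_LINE_RE.search(text) is not None)  # prints: False
```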

For full source-code parsing, regex isn't enough — use a language-aware lexer (Tree-sitter, your language's built-in tokenizer, etc.).
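For Python source specifically, the standard library's tokenize module is one such built-in tokenizer. The sketch below is a minimal example, assuming the input is valid Python; the helper name string_literals and the sample text are hypothetical.

```python
import io
import tokenize

def string_literals(source):
    """Return the source text of every plain string literal token in `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.STRING]

# Hypothetical sample: an escaped-quote string, a comment that contains
# quotes, and a triple-quoted string spanning two lines.
sample = (
    'x = "he said \\"hi\\""  # "not a string"\n'
    "y = '''multi\nline'''\n"
)

for literal in string_literals(sample):
    print(repr(literal))
```

The tokenizer skips the quoted text inside the comment and returns the triple-quoted string as a single token, two cases that are awkward to handle with a single regex.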

