
Match content between two delimiters with regex

Extract text between BEGIN and END markers, between brackets, or between any other pair of delimiters. A quick regex recipe.

The basic pattern

/BEGIN([\s\S]*?)END/

The lazy *? stops at the first END after a BEGIN. The capture group contains what was in between.

Use [\s\S] instead of . if the content might span multiple lines. By default . does not match line breaks, and [\s\S] matches any character without needing the s (dotAll) flag.
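To make the multi-line point concrete, here is a small sketch that runs both variants against the same input. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample input: the content between the markers spans two lines.
const sample = "header\nBEGIN\nline one\nline two\nEND\nfooter";

// Without the s flag, . does not match newlines, so this pattern finds nothing.
const dotMatch = sample.match(/BEGIN(.*?)END/);

// [\s\S] matches any character, including newlines.
const classMatch = sample.match(/BEGIN([\s\S]*?)END/);

console.log(dotMatch);                           // null
console.log(JSON.stringify(classMatch?.[1]));    // "\nline one\nline two\n"
```

The capture includes the line breaks right after BEGIN and right before END, so call .trim() on it if you only want the inner lines.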

Between specific symbols

// Between brackets [content]
/\[([^\]]*)\]/

// Between curly braces {content}
/\{([^}]*)\}/

// Between parentheses (content) — careful with nesting
/\(([^)]*)\)/

// Between two underscores _content_ (Markdown italic)
/_([^_]+)_/

Using a negated class [^X] works when the inner content can't contain X. For nested brackets, you need a parser, not regex.
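If you need the same recipe for several single-character delimiters, one option is to build the pattern from the two characters. The helper names below (escapeForRegex, betweenChars) are hypothetical, and this is only a sketch that assumes each delimiter is exactly one character.

```javascript
// Escape characters that have special meaning in a regex.
function escapeForRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build open + (anything except the closing character) + close.
// Assumes open and close are single characters.
function betweenChars(open, close) {
  const o = escapeForRegex(open);
  const c = escapeForRegex(close);
  return new RegExp(`${o}([^${c}]*)${c}`, "g");
}

const input = "call(a, b) then <tag> and {x}";
const inParens = [...input.matchAll(betweenChars("(", ")"))].map(m => m[1]);
const inAngles = [...input.matchAll(betweenChars("<", ">"))].map(m => m[1]);
console.log(inParens);  // [ 'a, b' ]
console.log(inAngles);  // [ 'tag' ]

// Nesting breaks the negated class: the match ends at the first closing paren.
const nested = [..."f(g(x))".matchAll(betweenChars("(", ")"))].map(m => m[1]);
console.log(nested);    // [ 'g(x' ]
```

The last call shows the limitation: the capture stops at the first closing parenthesis, so nested input yields an unbalanced fragment.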

Capture multiple occurrences

const text = "First [foo] then [bar] and [baz].";
const matches = [...text.matchAll(/\[([^\]]+)\]/g)];
matches.forEach(m => console.log(m[1]));
// "foo"
// "bar"
// "baz"

Greedy vs lazy

Watch the difference:

// Greedy — grabs as much as possible
"[a][b][c]".match(/\[(.+)\]/)?.[1];   // "a][b][c"

// Lazy — grabs as little as possible
"[a][b][c]".match(/\[(.+?)\]/)?.[1];  // "a"

For finding one occurrence at a time, lazy is usually what you want. To grab everything between the first and last delimiter, use the greedy form: drop the ? after the quantifier (.+ instead of .+?).

Python

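import re

# re.DOTALL lets . match newlines, so the content may span multiple lines
BETWEEN_RE = re.compile(r"BEGIN(.*?)END", re.DOTALL)

text = "BEGIN first END and BEGIN second END"  # sample input

for m in BETWEEN_RE.finditer(text):
    print(m.group(1))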

For real nested content, use a parser

If you need to match balanced parens or nested HTML tags correctly, plain regex won't do it. The exceptions are engine-specific extensions such as recursion in PCRE or balancing groups in .NET. Otherwise, use a real parser or a stack-based scan in code.
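A stack-based scan can be quite short. The function below is a minimal sketch, not a full parser: the name balancedGroups is made up, it assumes single-character delimiters, it silently ignores unmatched closers and unclosed openers, and it knows nothing about quoting or escaping.

```javascript
// Collect the content of every balanced open...close group.
// Groups are emitted in the order they close, so inner groups come first.
function balancedGroups(text, open = "(", close = ")") {
  const stack = [];   // indexes of opening delimiters that are still unclosed
  const groups = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === open) {
      stack.push(i);
    } else if (text[i] === close && stack.length > 0) {
      const start = stack.pop();
      groups.push(text.slice(start + 1, i));
    }
  }
  return groups;
}

const groups = balancedGroups("f(g(x), h(y))");
console.log(groups);  // [ 'x', 'y', 'g(x), h(y)' ]
```

Unlike the negated-class pattern, the outermost group here comes back intact as g(x), h(y), because each closing delimiter is paired with the most recent unclosed opener.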
