How-to May 4, 2026

Regex to match anything between two strings

The pattern is simple — but there are three ways to write it, and one of them is much better.

The short answer

To match text between START and END:

START(.*?)END        lazy quantifier, simple but can backtrack
START(.+?)END        same, requires at least one character
START([^E]*)END      negated class: fast, but only correct if the content never contains "E"

The captured content is in group 1. Use the lazy version for clarity. Reserve the negated class for single-character delimiters, where it is both correct and faster.

The three approaches

1. Lazy quantifier — readable

Pattern: BEGIN(.*?)END
Input:   "BEGINfirstENDmiddleBEGINsecondEND"
With g flag: matches "BEGINfirstEND" and "BEGINsecondEND"
Capture group 1: "first" and "second"

This is the most common form. .*? matches as little as possible — it stops at the first END it can find.
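
As a minimal sketch, the same pattern and input can be run with Python's re module. The variable names here are illustrative only.

```python
import re

text = "BEGINfirstENDmiddleBEGINsecondEND"

# findall returns group 1 for every non-overlapping match,
# which plays the role of the g flag in JavaScript.
inner = re.findall(r"BEGIN(.*?)END", text)
print(inner)  # ['first', 'second']

# finditer exposes the full match as well as the captured group.
full = [m.group(0) for m in re.finditer(r"BEGIN(.*?)END", text)]
print(full)  # ['BEGINfirstEND', 'BEGINsecondEND']
```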

2. Negated character class — fast

If your delimiters are single characters (like quotes or brackets), use a negated class:

"([^"]*)"            content between double quotes
\[([^\]]*)\]         content between square brackets
\(([^)]*)\)          content between parens (single-level only)

This is usually faster than lazy .*? because the engine consumes the whole run of non-delimiter characters in one step, instead of checking for the closing delimiter after every character.
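
A short Python sketch of the three negated-class patterns above, run against a made-up sample string:

```python
import re

# Sample input invented for illustration.
text = 'say "hello" to [alice] and (bob), then "bye"'

# Each class excludes only the closing delimiter, so a match
# can never run past the end of its own pair.
quoted = re.findall(r'"([^"]*)"', text)
bracketed = re.findall(r"\[([^\]]*)\]", text)
parenthesized = re.findall(r"\(([^)]*)\)", text)

print(quoted)         # ['hello', 'bye']
print(bracketed)      # ['alice']
print(parenthesized)  # ['bob']
```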

3. Lookarounds — clean output

If you want just the inner text without including the delimiters in the match itself:

Pattern: (?<=START).*?(?=END)
Result:  matches just the middle text, no capture group needed

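The lookbehind and lookahead don't consume the delimiters. The match itself is just the inside. Check engine support first: JavaScript needs ES2018+ for lookbehind, and Python's re module only accepts fixed-width lookbehinds (a literal like START is fine).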
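
A minimal Python sketch of the lookaround form, using BEGIN and END as the delimiters. With no capture group in the pattern, findall returns the whole match, which here is just the inner text.

```python
import re

text = "BEGINfirstENDmiddleBEGINsecondEND"

# The lookbehind is a fixed-width literal, which Python's re module requires.
matches = re.findall(r"(?<=BEGIN).*?(?=END)", text)
print(matches)  # ['first', 'second']
```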

When the delimiter is multi-character

For multi-character delimiters like <!-- and -->:

Lazy:        <!--(.*?)-->
Negated:     <!--((?:[^-]|-(?!-))*)-->     (much harder to read)

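The negated-class approach gets complex with multi-char delimiters, and the version above is also stricter: it rejects any content containing --, which the lazy version accepts. Stick with the lazy quantifier unless you specifically need ReDoS resistance.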
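
The two comment patterns can be compared side by side in Python. This is a sketch on invented sample strings. It shows that both agree on ordinary comments, but that the negated form refuses content containing a double dash.

```python
import re

LAZY = re.compile(r"<!--(.*?)-->")
NEGATED = re.compile(r"<!--((?:[^-]|-(?!-))*)-->")

# Ordinary comments: both patterns capture the same content.
html = "<p>one</p><!-- first --><p>two</p><!-- second - note -->"
lazy_result = LAZY.findall(html)
negated_result = NEGATED.findall(html)
print(lazy_result)     # [' first ', ' second - note ']
print(negated_result)  # [' first ', ' second - note ']

# A double dash inside the comment: only the lazy pattern matches.
tricky = "<!-- a -- b -->"
lazy_tricky = LAZY.findall(tricky)
negated_tricky = NEGATED.findall(tricky)
print(lazy_tricky)     # [' a -- b ']
print(negated_tricky)  # []
```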

Multiline content

By default, . doesn't match newlines. If your content spans lines, enable the dotall flag:

  • JavaScript: /START(.*?)END/s (ES2018+)
  • Python: re.compile(r"START(.*?)END", re.DOTALL)
  • PCRE: /s modifier

Or use [\s\S] instead of . — that works without any flag:

START([\s\S]*?)END     works across newlines without flags
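
A small Python sketch of the difference, using a made-up three-line input:

```python
import re

text = "START\nline one\nline two\nEND"

# Without a flag, . stops at the first newline, so nothing matches.
no_flag = re.findall(r"START(.*?)END", text)
print(no_flag)  # []

# re.DOTALL lets . match newlines as well.
dotall = re.findall(r"START(.*?)END", text, re.DOTALL)
print(dotall)  # ['\nline one\nline two\n']

# [\s\S] matches any character, so no flag is needed.
any_char = re.findall(r"START([\s\S]*?)END", text)
print(any_char == dotall)  # True
```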

JavaScript example

const text = "[apple][banana][cherry]";
const matches = [...text.matchAll(/\[([^\]]+)\]/g)].map(m => m[1]);
// ["apple", "banana", "cherry"]

Python example

import re
text = "[apple][banana][cherry]"
matches = re.findall(r"\[([^\]]+)\]", text)
# ['apple', 'banana', 'cherry']

Common mistakes

Forgetting to make it lazy

Pattern: BEGIN(.*)END
Input:   "BEGINfirstENDmiddleBEGINsecondEND"
Match:   the WHOLE thing — captures "firstENDmiddleBEGINsecond"

Greedy .* grabs everything to the last END. Use lazy .*? instead.
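
The difference is easy to confirm in Python. This sketch runs both forms on the same input:

```python
import re

text = "BEGINfirstENDmiddleBEGINsecondEND"

# Greedy: .* runs to the end of the string, then backs up to the last END.
greedy = re.findall(r"BEGIN(.*)END", text)
print(greedy)  # ['firstENDmiddleBEGINsecond']

# Lazy: .*? stops at the first END after each BEGIN.
lazy = re.findall(r"BEGIN(.*?)END", text)
print(lazy)  # ['first', 'second']
```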

Not handling nested delimiters

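Standard regular expressions can't match balanced nesting. A few engines offer recursion extensions, such as (?R) in PCRE, but they are not portable. \(.*?\) on input (a(b)c) matches (a(b), not (a(b)c). For nested structures, you need a parser, not a regex.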
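
For simple cases, the "parser" can be as small as a depth counter. The following Python sketch is one possible approach, not a general-purpose parser. The function name balanced_groups is hypothetical, and it assumes parentheses are the only delimiters and that unmatched closing parens can be ignored.

```python
import re

def balanced_groups(text: str) -> list:
    """Return the contents of each top-level (...) group, keeping nested parens."""
    groups = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i + 1  # content begins after the opening paren
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i])
    return groups

# The lazy regex stops at the first closing paren.
regex_result = re.findall(r"\(.*?\)", "(a(b)c)")
print(regex_result)  # ['(a(b)']

# The depth counter keeps going until the parens balance.
parsed = balanced_groups("(a(b)c) and (d)")
print(parsed)  # ['a(b)c', 'd']
```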

The takeaway

Use the lazy quantifier for readability. Use the negated character class for single-character delimiters when performance matters. Use lookarounds when you want the middle text without delimiters in the match.

For nested or recursive structures, regex isn't the right tool — switch to a parser.

