Download Cheat sheet PDF 12 pages · syntax, editors, patterns, Unicode, performance, debugging
Patterns May 12, 2026

How to match a URL with regex

A pattern that handles real-world URLs without becoming a 200-character monster.

The short answer

^https?://[^\s/$.?#].[^\s]*$

This is a simplified version of the well-known Diego Perini URL regex, suitable for general use. Open it in the explainer →

What it matches

  • https://example.com
  • http://sub.domain.com/path?query=1
  • https://example.com:8080/page#anchor
  • https://user:pass@example.com/path

What it rejects

  • example.com — no scheme
  • ftp://example.com — non-HTTP scheme
  • http:// — no host
  • http:// example.com — has whitespace

How it works

^
Start of string.
https?
"http" optionally followed by "s".
://
The scheme separator.
[^\s/$.?#]
The first character of the host — anything that's not whitespace, slash, dollar, dot, question, or hash. This rules out URLs like http:///path.
.
One more character.
[^\s]*
Zero or more non-whitespace characters — the rest of the URL.
$
End of string.

If you want all schemes

Drop the https? and accept any scheme:

^[a-zA-Z][a-zA-Z0-9+.-]*://[^\s/$.?#].[^\s]*$

If you want to extract URLs from text

Use without anchors and with the global flag:

const urls = text.match(/https?:\/\/[^\s]+/g);

Or in Python:

urls = re.findall(r"https?://[^\s]+", text)

Gotchas

Trailing punctuation

"Visit https://example.com." — the [^\s]+ grabs the trailing period because it's not whitespace. To exclude common trailing punctuation:

https?://[^\s]+?[^\s.,;:!?)]

Internationalized domain names

The pattern doesn't handle Unicode domain names like https://例え.jp. Most browsers convert these to https://xn--r8jz45g.jp (punycode) when sending, but in raw text they'll fail to match. For Unicode URLs:

^https?://[^\s/$.?#].[^\s]*$/u

(Add the u flag in JS, or use Unicode mode in Python.)

When to use a URL parser instead

For anything that touches security — URL allowlisting, SSRF prevention, redirect validation — do not rely on regex. Use a real URL parser:

  • JavaScript: new URL(str)
  • Python: urllib.parse.urlparse(str)
  • Java: java.net.URI

Parsers handle edge cases (percent-encoding, port numbers, IPv6 hosts, userinfo) that regex would need 200+ characters to match correctly — and even then would still miss some.

The takeaway

The pattern above is fine for "does this look like an HTTP URL." For programmatic URL handling, parse with a library. For URL extraction from prose, regex is the right tool — just be ready to trim trailing punctuation.


Related reading


Try this pattern in the explainer

Paste any regex into the live explainer and see what each token means, with example matches in real time.

Open the regex explainer →