Patterns May 12, 2026

How to match a URL with regex

A pattern that handles real-world URLs without becoming a 200-character monster.

The short answer

^https?://[^\s/$.?#].[^\s]*$

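This is a far simpler pattern than the well-known Diego Perini URL regex (it is essentially the short @stephenhay pattern from Mathias Bynens's URL regex comparison, without the ftp scheme), and it is suitable for general use.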

What it matches

https://example.com
http://sub.domain.com/path?query=1
https://example.com:8080/page#anchor
https://user:pass@example.com/path

What it rejects

example.com — no scheme
ftp://example.com — non-HTTP scheme
http:// — no host
http:// example.com — has whitespace
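
The two lists above can be checked mechanically. Here is a minimal Python sketch; the helper name looks_like_http_url and the two list names are made up for illustration, and the pattern is the one from "The short answer", unchanged.

```python
import re

# The pattern from "The short answer", unchanged.
URL_RE = re.compile(r"^https?://[^\s/$.?#].[^\s]*$")


def looks_like_http_url(candidate: str) -> bool:
    """Return True if the whole string matches the pattern."""
    # fullmatch() is used instead of match(): in Python, "$" also matches
    # just before a trailing newline, which match() alone would let through.
    return URL_RE.fullmatch(candidate) is not None


SHOULD_MATCH = [
    "https://example.com",
    "http://sub.domain.com/path?query=1",
    "https://example.com:8080/page#anchor",
    "https://user:pass@example.com/path",
]

SHOULD_REJECT = [
    "example.com",          # no scheme
    "ftp://example.com",    # non-HTTP scheme
    "http://",              # no host
    "http:// example.com",  # whitespace right after the separator
]

if __name__ == "__main__":
    for url in SHOULD_MATCH + SHOULD_REJECT:
        print(f"{url!r:45} {looks_like_http_url(url)}")
```

Using fullmatch() is a design choice for Python specifically; with match() and the same pattern, a string ending in a newline would still be accepted.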

How it works

^: Start of string.
https?: "http" optionally followed by "s".
://: The scheme separator.
[^\s/$.?#]: The first character of the host — anything that's not whitespace, slash, dollar, dot, question, or hash. This rules out URLs like http:///path.
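.: Any one character except a line break. Together with the previous class, this means at least two characters must follow ://, so http://a is rejected. The dot does not exclude spaces, so this second character is effectively unchecked.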
[^\s]*: Zero or more non-whitespace characters — the rest of the URL.
$: End of string.
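
The same breakdown can be written as a commented pattern. This is a sketch using Python's re.VERBOSE mode, where whitespace and # comments outside character classes are ignored; the name URL_RE_VERBOSE is an illustrative choice, not part of the original pattern.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# Inside a character class, "#" and whitespace stay literal, so the
# class [^\s/$.?#] is not affected by verbose mode.
URL_RE_VERBOSE = re.compile(
    r"""
    ^               # start of string
    https?          # http, optionally followed by s
    ://             # the scheme separator
    [^\s/$.?#]      # first host character: not whitespace, / $ . ? or hash
    .               # one more character of any kind (except a line break)
    [^\s]*          # zero or more non-whitespace characters
    $               # end of string
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    for candidate in ["https://example.com", "http://a", "http://ab",
                      "http:///path", "http://a b"]:
        matched = URL_RE_VERBOSE.fullmatch(candidate) is not None
        print(f"{candidate!r:25} {matched}")
```

Two consequences of the lone dot are worth knowing: a one-character host such as http://a is rejected because two characters are required after the separator, and http://a b is accepted because the dot is free to match the space.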

If you want all schemes

Replace https? with the generic scheme syntax (a letter followed by any mix of letters, digits, +, . and -):

^[a-zA-Z][a-zA-Z0-9+.-]*://[^\s/$.?#].[^\s]*$
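
A quick Python check of the any-scheme variant, as a sketch; the name ANY_SCHEME_RE and the sample strings are illustrative assumptions.

```python
import re

# Any-scheme variant: a letter, then letters, digits, "+", "." or "-",
# followed by "://" and the same host/rest-of-URL tail as before.
ANY_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://[^\s/$.?#].[^\s]*$")

SAMPLES = [
    "ftp://example.com",         # plain non-HTTP scheme
    "git+ssh://host/repo.git",   # scheme containing "+"
    "mailto:user@example.com",   # has a scheme but no "://"
    "1http://example.com",       # scheme may not start with a digit
]

if __name__ == "__main__":
    for sample in SAMPLES:
        matched = ANY_SCHEME_RE.fullmatch(sample) is not None
        print(f"{sample!r:30} {matched}")
```

Note that the pattern still requires ://, so schemes written without it, such as mailto:, are not matched.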

If you want to extract URLs from text

Drop the anchors and add the global flag:

// match() returns null when nothing is found, so fall back to an empty array.
const urls = text.match(/https?:\/\/[^\s]+/g) ?? [];

Or in Python:

import re

urls = re.findall(r"https?://[^\s]+", text)

Gotchas

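Trailing punctuation

In "Visit https://example.com.", the [^\s]+ grabs the trailing period because it's not whitespace. To exclude common trailing punctuation, require the last character to be something else:

https?://[^\s]*[^\s.,;:!?)]

Keep the first quantifier greedy. A lazy [^\s]+? in that position would stop after two characters (https://ex), because nothing after the final class forces the match to extend.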
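
As a rough Python sketch of extraction with trailing punctuation trimmed (the function name extract_urls and the sample sentence are made up for illustration). It uses a greedy [^\s]* before the final class, so each match runs to the last character that is not in the punctuation set.

```python
import re

# Greedy body, then one final character that is neither whitespace nor
# common trailing punctuation. Backtracking peels punctuation off the end.
EXTRACT_RE = re.compile(r"https?://[^\s]*[^\s.,;:!?)]")


def extract_urls(text: str) -> list[str]:
    """Return every HTTP(S) URL found in text, without trailing punctuation."""
    return EXTRACT_RE.findall(text)


if __name__ == "__main__":
    sample = (
        "Visit https://example.com. Also see "
        "(https://example.org/a?b=1), or http://test.dev/x!"
    )
    for url in extract_urls(sample):
        print(url)
```

A URL that legitimately ends in one of the excluded characters, such as a closing parenthesis, loses that character too, so treat the trimming as a heuristic.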

Internationalized domain names

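Because the pattern is built from negated character classes, it already accepts Unicode domain names like https://例え.jp as they appear in raw text. Most browsers convert these to https://xn--r8jz45g.jp (punycode) when sending, so both forms can show up in your data. In JavaScript, add the u flag so that . and the character classes operate on whole code points rather than UTF-16 code units:

/^https?:\/\/[^\s\/$.?#].[^\s]*$/u

(Python 3 str patterns are Unicode-aware by default, so no extra flag is needed there.)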
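
If you would rather compare or store hosts in their ASCII form, one option is to convert the host to punycode before matching. The sketch below uses Python's built-in idna codec (which implements the older IDNA 2003 rules) and urllib.parse; the function name to_ascii_url is hypothetical, and IPv6 literals are out of scope here.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return url with its host converted to punycode (ASCII) form."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return url  # nothing to convert

    # The stdlib "idna" codec turns each Unicode label into an xn-- label
    # and leaves plain ASCII labels as they are.
    ascii_host = parts.hostname.encode("idna").decode("ascii")

    # Rebuild the netloc, keeping any userinfo and port that were present.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + ascii_host
    if parts.port is not None:
        netloc += f":{parts.port}"

    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print(to_ascii_url("https://例え.jp/path?q=1"))
    print(to_ascii_url("https://example.com/path"))
```

Normalizing first means a Unicode host and its punycode equivalent end up as the same string, which simplifies deduplication.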

When to use a URL parser instead

For anything that touches security — URL allowlisting, SSRF prevention, redirect validation — do not rely on regex. Use a real URL parser:

JavaScript: new URL(str)
Python: urllib.parse.urlparse(str)
Java: java.net.URI

Parsers handle edge cases (percent-encoding, port numbers, IPv6 hosts, userinfo) that regex would need 200+ characters to match correctly — and even then would still miss some.
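
To illustrate the difference, here is a minimal host allowlist sketch built on urllib.parse.urlsplit. The names is_allowed_url and ALLOWED_HOSTS are hypothetical, and this is not a complete SSRF defense (it does not resolve DNS or follow redirects, for example).

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com"}  # hypothetical allowlist for this sketch


def is_allowed_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True if url is http(s) and its parsed host is allowlisted."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased, without userinfo or port
    except ValueError:
        # urlsplit raises ValueError on some malformed input,
        # such as an unclosed IPv6 bracket.
        return False
    return parts.scheme in ("http", "https") and host in allowed_hosts


if __name__ == "__main__":
    for candidate in [
        "https://example.com/path",
        "https://example.com@evil.test/",  # userinfo trick: real host is evil.test
        "ftp://example.com/",
        "http://[bad",
    ]:
        print(f"{candidate!r:40} {is_allowed_url(candidate)}")
```

The second sample is the kind of input a pattern or prefix check gets wrong: it begins with https://example.com, but the parser reports evil.test as the host because everything before the @ is userinfo.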

The takeaway

The pattern above is fine for "does this look like an HTTP URL." For programmatic URL handling, parse with a library. For URL extraction from prose, regex is the right tool — just be ready to trim trailing punctuation.