How to match a URL with regex
A pattern that handles real-world URLs without becoming a 200-character monster.
The short answer
^https?://[^\s/$.?#].[^\s]*$
This is a simplified version of the well-known Diego Perini URL regex, suitable for general use.
What it matches
- https://example.com
- http://sub.domain.com/path?query=1
- https://example.com:8080/page#anchor
- https://user:pass@example.com/path
What it rejects
- example.com (no scheme)
- ftp://example.com (non-HTTP scheme)
- http:// (no host)
- http:// example.com (has whitespace)
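The two lists above can be checked mechanically. The following is a minimal sketch in Python; the helper name looks_like_http_url and the variable names are illustrative assumptions, not part of the original pattern.

```python
import re

# The pattern from "The short answer", compiled once.
URL_RE = re.compile(r"^https?://[^\s/$.?#].[^\s]*$")

should_match = [
    "https://example.com",
    "http://sub.domain.com/path?query=1",
    "https://example.com:8080/page#anchor",
    "https://user:pass@example.com/path",
]
should_reject = [
    "example.com",          # no scheme
    "ftp://example.com",    # non-HTTP scheme
    "http://",              # no host
    "http:// example.com",  # whitespace right after the scheme separator
]


def looks_like_http_url(s: str) -> bool:
    """Return True if s passes the simplified pattern (hypothetical helper)."""
    return URL_RE.match(s) is not None


for s in should_match:
    assert looks_like_http_url(s), s
for s in should_reject:
    assert not looks_like_http_url(s), s
```

Keeping the examples in lists like this makes it easy to add your own edge cases before relying on the pattern.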
How it works
^ - Start of string.
https? - "http" optionally followed by "s".
:// - The scheme separator.
[^\s/$.?#] - The first character of the host: anything that's not whitespace, slash, dollar, dot, question mark, or hash. This rules out URLs like http:///path.
. - One more character (any character except a newline), so at least two characters must follow the scheme separator.
[^\s]* - Zero or more non-whitespace characters: the rest of the URL.
$ - End of string.
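The same breakdown can live next to the pattern itself. As a sketch, Python's re.VERBOSE flag allows whitespace and comments inside the pattern; the name URL_RE_VERBOSE is an assumption made for this example. Note that the # inside the character class stays literal, because verbose mode only treats # as a comment marker outside a class.

```python
import re

URL_RE_VERBOSE = re.compile(
    r"""
    ^               # start of string
    https?          # "http", optionally followed by "s"
    ://             # the scheme separator
    [^\s/$.?#]      # first host character: not whitespace, /, $, ., ?, or #
    .               # one more character (anything except a newline)
    [^\s]*          # rest of the URL: zero or more non-whitespace characters
    $               # end of string
    """,
    re.VERBOSE,
)

# Behaves like the one-line pattern.
assert URL_RE_VERBOSE.match("https://example.com")
assert URL_RE_VERBOSE.match("http:///path") is None
```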
If you want all schemes
Replace https? with a generic scheme pattern (a letter followed by letters, digits, +, ., or -) to accept any scheme:
^[a-zA-Z][a-zA-Z0-9+.-]*://[^\s/$.?#].[^\s]*$
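A quick way to see what the any-scheme variant accepts is to run it against a few inputs. This is a sketch; ANY_SCHEME_RE and the sample strings are assumptions chosen for illustration.

```python
import re

ANY_SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://[^\s/$.?#].[^\s]*$")

# Schemes may contain letters, digits, "+", "." and "-", but must start with a letter.
assert ANY_SCHEME_RE.match("ftp://example.com")
assert ANY_SCHEME_RE.match("git+ssh://host/repo.git")

# A leading digit is not a valid scheme start, and "mailto:" has no "://".
assert ANY_SCHEME_RE.match("1http://example.com") is None
assert ANY_SCHEME_RE.match("mailto:user@example.com") is None
```

Because the pattern still requires ://, schemes such as mailto: or tel: stay out of scope.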
If you want to extract URLs from text
Drop the anchors and, in JavaScript, add the global flag:
const urls = text.match(/https?:\/\/[^\s]+/g);
Or in Python:
urls = re.findall(r"https?://[^\s]+", text)
Gotchas
Trailing punctuation
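"Visit https://example.com." Here [^\s]+ grabs the trailing period because it's not whitespace. To exclude common trailing punctuation, require the last character to be something other than punctuation. The first quantifier must stay greedy; a lazy +? would stop two characters after the scheme:
https?://[^\s]*[^\s.,;:!?)]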
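An alternative to handling trailing punctuation inside the regex is to match loosely and trim afterwards. The sketch below assumes a hypothetical helper named extract_urls and a punctuation set of my choosing; adjust both to your text.

```python
import re

# Characters that commonly follow a URL in prose (an assumption; adjust as needed).
TRAILING = ".,;:!?)"


def extract_urls(text: str) -> list[str]:
    """Find http(s) URLs in prose, then strip common trailing punctuation."""
    return [m.rstrip(TRAILING) for m in re.findall(r"https?://[^\s]+", text)]


print(extract_urls("Visit https://example.com. Or (see http://a.example/b?x=1)!"))
# prints ['https://example.com', 'http://a.example/b?x=1']
```

One trade-off: URLs that legitimately end in a closing parenthesis or other listed character will lose it, so remove such characters from the set if your data contains them.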
Internationalized domain names
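Because the pattern relies on negated character classes, it already accepts Unicode hosts such as https://例え.jp in raw text. Most browsers convert these to https://xn--r8jz45g.jp (punycode) when sending, and that form matches too. In JavaScript, add the u flag so the pattern operates on code points rather than UTF-16 code units:
/^https?:\/\/[^\s/$.?#].[^\s]*$/u
(Python 3 str patterns are Unicode-aware by default, so no extra flag is needed.)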
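To compare a Unicode host against its punycode form, you can convert it yourself. This sketch uses Python's built-in idna codec, which implements the older IDNA 2003 rules; the variable names are illustrative.

```python
# Convert a Unicode hostname to its ASCII (punycode) form with the stdlib codec.
host = "例え.jp"
ascii_host = host.encode("idna").decode("ascii")

print(ascii_host)
# prints xn--r8jz45g.jp
```

For newer IDNA 2008 behavior you would need a third-party package, which is outside what this sketch covers.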
When to use a URL parser instead
For anything that touches security — URL allowlisting, SSRF prevention, redirect validation — do not rely on regex. Use a real URL parser:
- JavaScript: new URL(str)
- Python: urllib.parse.urlparse(str)
- Java: java.net.URI
Parsers handle edge cases (percent-encoding, port numbers, IPv6 hosts, userinfo) that regex would need 200+ characters to match correctly — and even then would still miss some.
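As a sketch of the parser-based approach, the following checks a URL against a host allowlist using urllib.parse.urlparse. The allowlist contents and the helper name is_allowed are hypothetical, and a production check would need more than this (for example, handling redirects and DNS resolution).

```python
from urllib.parse import urlparse

# Hypothetical allowlist, for illustration only.
ALLOWED_HOSTS = {"example.com", "api.example.com"}


def is_allowed(url: str) -> bool:
    """Accept only http(s) URLs whose parsed hostname is on the allowlist."""
    try:
        parts = urlparse(url)
    except ValueError:  # raised for some malformed inputs, e.g. broken IPv6 brackets
        return False
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS


# Userinfo and port are separated out by the parser.
assert is_allowed("https://user:pass@example.com:8080/path")

# Lookalike inputs that a naive "contains example.com" regex would accept.
assert not is_allowed("https://example.com.evil.test/")
assert not is_allowed("https://example.com@evil.test/")
assert not is_allowed("https://evil.test/?next=https://example.com")
```

The comparison runs on the parsed hostname rather than the raw string, which is what makes the lookalike cases fail.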
The takeaway
The pattern above is fine for "does this look like an HTTP URL." For programmatic URL handling, parse with a library. For URL extraction from prose, regex is the right tool — just be ready to trim trailing punctuation.