
Extract @mentions and #hashtags from a tweet

Social media text is full of @mentions and #hashtags. Here are the regexes for extracting them.

Mentions

const MENTION_RE = /@([A-Za-z0-9_]{1,15})\b/g;

const tweet = "@elonmusk hey, what about @timcook?";
const mentions = [...tweet.matchAll(MENTION_RE)].map(m => m[1]);
// ["elonmusk", "timcook"]

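The capture group extracts the handle without the @. Twitter/X usernames are 1–15 characters: letters, digits and underscores. The trailing \b means that a longer run of handle characters is skipped entirely instead of being truncated to its first 15 characters.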
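Here is a quick check of the same pattern in Python. It is a sketch with made-up sample strings. It shows the difference between the whole match and the capture group, and what happens to a 16-character handle.

```python
import re

# Same pattern as MENTION_RE above, compiled with Python's re module
MENTION_RE = re.compile(r"@([A-Za-z0-9_]{1,15})\b")

tweet = "@elonmusk hey, what about @timcook?"
for m in MENTION_RE.finditer(tweet):
    # group(0) is the whole match including "@", group(1) is only the handle
    print(m.group(0), "->", m.group(1))
# @elonmusk -> elonmusk
# @timcook -> timcook

# 15 characters: accepted
print(MENTION_RE.findall("@abcdefghijklmno"))   # ['abcdefghijklmno']
# 16 characters: no 1-15 character prefix ends at a word boundary, so no match
print(MENTION_RE.findall("@abcdefghijklmnop"))  # []
```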

Hashtags

const HASHTAG_RE = /#([A-Za-z][A-Za-z0-9_]*)\b/g;

const text = "Loving the new #regex #cheatsheet for #Python";
const tags = [...text.matchAll(HASHTAG_RE)].map(m => m[1]);
// ["regex", "cheatsheet", "Python"]

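Hashtag rule: the tag must start with a letter, so purely numeric strings like #123 are skipped (most platforms don't treat those as hashtags). This is a simplification: the regex also skips tags that merely begin with a digit, such as #2024goals.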
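A small Python sketch of how the letter-first rule behaves on a made-up sample. LOOSE_HASHTAG_RE is a hypothetical variant, assuming your platform accepts tags that start with digits or underscores as long as they contain at least one letter; check your platform's actual rules before using it.

```python
import re

# Same pattern as HASHTAG_RE above: the first character must be a letter
HASHTAG_RE = re.compile(r"#([A-Za-z][A-Za-z0-9_]*)\b")
# Hypothetical looser rule: leading digits/underscores allowed, but at least one letter required
LOOSE_HASHTAG_RE = re.compile(r"#([0-9_]*[A-Za-z][A-Za-z0-9_]*)")

text = "#123 #abc1 #2024goals #regex_101"
print(HASHTAG_RE.findall(text))        # ['abc1', 'regex_101']
print(LOOSE_HASHTAG_RE.findall(text))  # ['abc1', '2024goals', 'regex_101']
```

Both patterns reject #123, since neither can find a letter in it.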

Python version

import re

MENTION_RE = re.compile(r"@([A-Za-z0-9_]{1,15})\b")
HASHTAG_RE = re.compile(r"#([A-Za-z][A-Za-z0-9_]*)\b")

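# findall returns only the capture group when the pattern has exactly one group
mentions = MENTION_RE.findall("@user1 hi @user2")  # ["user1", "user2"]
tags = HASHTAG_RE.findall("#regex #python")        # ["regex", "python"]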

Edge cases

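Email addresses look like mentions: alice@example.com would falsely match @example. Putting \b before the @ does not help, because there is already a word boundary between alice and @. Instead, require that the @ is not preceded by a handle character, for example with a lookbehind: (?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})\b. The simpler (?:^|\s)@(\w+) also rejects emails, but it only accepts mentions at the start of the text or after whitespace (so it misses "(@user)") and drops the 15-character limit.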
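A quick Python comparison of the candidate patterns on a made-up sample string. The pattern names are just labels for this sketch.

```python
import re

# Original pattern: matches "@example" inside the email address
NAIVE = re.compile(r"@([A-Za-z0-9_]{1,15})\b")
# \b before "@" backfires: it matches after "alice" and fails after a space or "("
WORD_BOUNDARY = re.compile(r"\b@([A-Za-z0-9_]{1,15})\b")
# "@" only at the start of the text or right after whitespace
WHITESPACE = re.compile(r"(?:^|\s)@(\w+)")
# "@" must not follow a handle character (lookbehind works in Python re and ES2018+ JS)
LOOKBEHIND = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})\b")

text = "mail alice@example.com, ping @bob (@carol)"
print(NAIVE.findall(text))          # ['example', 'bob', 'carol']
print(WORD_BOUNDARY.findall(text))  # ['example']
print(WHITESPACE.findall(text))     # ['bob']
print(LOOKBEHIND.findall(text))     # ['bob', 'carol']
```

The lookbehind version is the only one here that both ignores the email address and still finds the mention inside parentheses.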

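Hashtags in non-Latin scripts: Twitter supports tags such as #नमस्ते (Hindi) and tags in many other scripts, which the ASCII-only HASHTAG_RE above misses entirely. For Unicode support, use Unicode property escapes such as \p{L} with the u flag in JavaScript, or the third-party regex module in Python (the built-in re module does not support \p{...}).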

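const HASHTAG_UNICODE = /#(\p{L}[\p{L}\p{M}\p{N}_]*)/gu;

This matches letters from Latin, Devanagari, Arabic, Chinese and other scripts. The \p{M} (combining mark) class matters for scripts such as Devanagari: in नमस्ते, the virama ् and the vowel sign े are marks, not letters, so a pattern built from \p{L} and \p{N} alone would stop after नमस.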
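If you can install the regex module, a pattern like regex.findall(r"#(\p{L}[\p{L}\p{M}\p{N}_]*)", text) should work directly. If you are limited to the standard library, here is a sketch that applies the same rule with unicodedata: a tag is # followed by a letter from any script, then any run of letters, combining marks, numbers and underscores. The function names are hypothetical.

```python
import unicodedata


def _major_category(ch: str) -> str:
    # First letter of the Unicode general category: "L" letter, "M" mark, "N" number, ...
    return unicodedata.category(ch)[0]


def extract_hashtags_unicode(text: str) -> list[str]:
    # Stdlib-only scanner: "#", one letter, then letters, marks, numbers or "_"
    tags = []
    i = 0
    while i < len(text):
        # A tag starts with "#" followed by a letter from any script
        if text[i] == "#" and i + 1 < len(text) and _major_category(text[i + 1]) == "L":
            j = i + 2
            # Keep going through letters, combining marks, numbers and underscores
            while j < len(text) and (
                _major_category(text[j]) in ("L", "M", "N") or text[j] == "_"
            ):
                j += 1
            tags.append(text[i + 1:j])
            i = j
        else:
            i += 1
    return tags


print(extract_hashtags_unicode("Namaste #नमस्ते #regex_101 #123 #中文"))
# ['नमस्ते', 'regex_101', '中文']
```

Checking the general category character by character is slower than a compiled regex, but it keeps the Devanagari vowel signs inside the tag without any third-party dependency.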

