
Extract the file extension from a filename with regex

Sounds simple. The tricky bit is what counts as "the extension" for files like archive.tar.gz.

Last extension only

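function getExt(filename) {
  // Require one non-separator character before the final dot, so a dotfile
  // such as ".hidden" (or "dir/.hidden") is treated as having no extension.
  const m = filename.match(/[^\\\/]\.([^.\\\/]+)$/);
  return m ? m[1] : "";
}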

getExt("document.pdf");       // "pdf"
getExt("archive.tar.gz");     // "gz"
getExt("no-extension");       // ""
getExt(".hidden");            // ""  (no separate extension)

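The character class [^.\\\/]+ excludes the dot and both path separators, / and \, so the match can't run across a directory boundary. Without the separators, a path like /home/user.name/file would report "name/file" as its extension; with them, getExt returns "" for it.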
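To see the difference concretely, here is a small sketch that runs a naive pattern and a separator-aware one against the same paths. The names naive, separatorAware and lastExt exist only for this comparison.

```javascript
// Naive: everything after the last dot, path separators included.
const naive = /\.([^.]+)$/;
// Separator-aware: the extension may not contain ".", "\" or "/".
const separatorAware = /\.([^.\\\/]+)$/;

// Return the captured extension, or "" when the pattern doesn't match.
function lastExt(re, p) {
  const m = p.match(re);
  return m ? m[1] : "";
}

lastExt(naive, "/home/user.name/file");                 // "name/file"
lastExt(separatorAware, "/home/user.name/file");        // ""
lastExt(naive, "C:\\Users\\user.name\\file");           // "name\\file"
lastExt(separatorAware, "C:\\Users\\user.name\\file");  // ""
```

The Windows-style path is why both separators are excluded: a regex has no idea which platform a string came from.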

Multi-part extensions (tar.gz, tar.bz2)

function getCompoundExt(filename) {
  const m = filename.match(/(\.[^.\\\/]+(\.[^.\\\/]+)?)$/);
  return m ? m[1] : "";
}

getCompoundExt("archive.tar.gz");   // ".tar.gz"
getCompoundExt("backup.sql.bz2");   // ".sql.bz2"
getCompoundExt("file.txt");         // ".txt"

This grabs the last two extensions whenever both exist, so "app.config.json" also comes back as ".config.json". Refine it if you only want compound extensions in specific cases like .tar.gz.
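A minimal sketch of that refinement, assuming you want tar.gz, tar.bz2 and tar.xz treated as single extensions and everything else reduced to its last extension. The allowlist and the name getKnownExt are illustrative; swap in whatever compound types your data actually uses.

```javascript
// Compound extensions treated as one unit (illustrative allowlist, not exhaustive).
const COMPOUND = /\.(tar\.(?:gz|bz2|xz))$/i;
// Fallback: last extension only. The [^\\\/] before the dot means a dotfile
// such as ".bashrc" is treated as having no extension.
const LAST = /[^\\\/]\.([^.\\\/]+)$/;

function getKnownExt(filename) {
  const m = filename.match(COMPOUND) || filename.match(LAST);
  return m ? "." + m[1] : "";
}

getKnownExt("archive.tar.gz");   // ".tar.gz"
getKnownExt("app.config.json");  // ".json"
getKnownExt("backup.sql.bz2");   // ".bz2"
getKnownExt(".bashrc");          // ""
```

It returns the leading dot, like getCompoundExt.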

Just the name without extension

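function stripExt(filename) {
  // Put back the character before the final dot ($1), so dotfiles like ".hidden" stay intact.
  return filename.replace(/([^\\\/])\.[^.\\\/]+$/, "$1");
}

stripExt("document.pdf");         // "document"
stripExt("archive.tar.gz");       // "archive.tar"
stripExt(".hidden");              // ".hidden"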
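When you need the name and the extension together, one match with two capture groups can return both. This is a sketch with a hypothetical helper, splitExt, that returns a [root, ext] pair shaped like the result of Python's os.path.splitext; it does not reproduce every edge case of that function.

```javascript
// Split a filename into [root, ext], keeping the leading dot on ext.
// (.*[^\\\/]) is greedy, so only the final ".ext" lands in the second group,
// and the root must end in a non-separator character, so dotfiles stay whole.
function splitExt(filename) {
  const m = filename.match(/^(.*[^\\\/])(\.[^.\\\/]+)$/);
  return m ? [m[1], m[2]] : [filename, ""];
}

splitExt("document.pdf");          // ["document", ".pdf"]
splitExt("archive.tar.gz");        // ["archive.tar", ".gz"]
splitExt(".hidden");               // [".hidden", ""]
splitExt("/home/user.name/file");  // ["/home/user.name/file", ""]
```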

Python

import os

name, ext = os.path.splitext("document.pdf")
# name = "document", ext = ".pdf"

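Don't use regex for this in Python: os.path.splitext already handles the edge cases. It only looks at the last path component and treats leading dots as part of the name, so os.path.splitext(".bashrc") returns (".bashrc", ""). Like getExt, it splits off only the last extension: "archive.tar.gz" gives ("archive.tar", ".gz").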

Use language built-ins when available

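Most languages ship a helper for this, usually in the standard library:

  • Python: os.path.splitext (the extension includes the dot: ".pdf")
  • Node.js: path.extname (".pdf")
  • Go: filepath.Ext (".pdf"; it doesn't special-case dotfiles, so filepath.Ext(".bashrc") returns ".bashrc")
  • Java: FilenameUtils.getExtension from Apache Commons IO, a third-party library rather than part of the JDK ("pdf", no dot)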
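For comparison, this is how Node's built-in path module handles the inputs used earlier in this post. path.extname and path.parse are standard Node.js APIs; the values in the comments are what they return on a POSIX system.

```javascript
const path = require("path");

path.extname("document.pdf");          // ".pdf"
path.extname("archive.tar.gz");        // ".gz"  (last extension only)
path.extname(".hidden");               // ""     (leading dot is part of the name)
path.extname("/home/user.name/file");  // ""     (only the last path segment counts)

// path.parse splits name and extension in one call.
const { name, ext } = path.parse("archive.tar.gz");
// name === "archive.tar", ext === ".gz"
```

Node's default path module follows the host platform: path.win32 accepts both / and \ as separators, while path.posix only recognises /. Call path.win32.extname explicitly if you parse Windows paths on Linux or macOS.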

Regex is the right tool when you're processing strings outside a filesystem context, such as log lines or CSV rows, where the filename is embedded in a longer string and there's no standalone path for a library function to parse.
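The patterns above are anchored with $, so they expect the filename to be the whole input. A minimal sketch for pulling filenames out of a longer line drops the anchor, forbids whitespace as well as separators in the name, and requires the extension to start with a letter so numbers such as 2.1 aren't mistaken for files. The log line and the helper name findFileExts are invented for illustration, and the letter rule is an assumption: it skips extensions like .7z and still accepts hostnames such as example.com.

```javascript
// A filename token: no whitespace or path separators, then a final ".ext"
// whose first character is a letter. \b ends the match before trailing punctuation.
const FILE_TOKEN = /[^\s\\\/]+\.([A-Za-z][A-Za-z0-9]*)\b/g;

// Hypothetical helper: every filename-looking token in `text`, with its extension.
function findFileExts(text) {
  return [...text.matchAll(FILE_TOKEN)].map((m) => ({ file: m[0], ext: m[1] }));
}

const line = "2024-05-01 12:00:03 INFO uploaded /srv/in/report.final.pdf (2.1 MB) and thumb.png";
findFileExts(line);
// [ { file: "report.final.pdf", ext: "pdf" }, { file: "thumb.png", ext: "png" } ]
```

For CSV, it's usually safer to split the row into fields first and run the anchored getExt on the field that holds the filename, which avoids these token heuristics entirely.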

