Extract Indian PAN numbers from a document — Python and JavaScript

Finding all PAN numbers in a block of text is a one-liner with the right regex. Here's how, in both Python and JavaScript.

PAN format: 5 uppercase letters, then 4 digits, then 1 uppercase letter, for 10 characters in total.

Python

import re

PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")

text = """
The contact details are: ABCDE1234F (Rajesh), BNZAA2318J (Priya).
Phone: 9876543210, PAN: PQRST5678U.
"""

pans = PAN_RE.findall(text)
print(pans)  # ['ABCDE1234F', 'BNZAA2318J', 'PQRST5678U']

The \b word boundaries prevent matching inside longer strings like "ABCDE1234FGHIJK".
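
To see what the boundaries do and do not catch, here is a small sketch. The sample strings are made up for illustration. One thing worth knowing: `\b` treats underscores as word characters, so a PAN glued to an underscore is skipped too, and lowercase input never matches because the character classes are uppercase-only.

```python
import re

PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")

# Hypothetical sample strings, chosen to exercise the boundaries.
samples = [
    "PAN: ABCDE1234F.",    # punctuation and spaces are boundaries
    "ABCDE1234FGHIJK",     # trailing letters: no boundary after the F
    "XABCDE1234F",         # leading letter: no boundary before the A
    "id_ABCDE1234F",       # underscore counts as a word character
    "abcde1234f",          # lowercase is outside [A-Z]
]

# Map each sample to the list of PAN-shaped matches found in it.
results = {s: PAN_RE.findall(s) for s in samples}

for s, found in results.items():
    print(f"{s!r:22} -> {found}")
```

Only the first sample yields a match. If your input may contain lowercase PANs or underscore-joined identifiers, you would need to normalize the text or loosen the boundaries first.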

JavaScript

const PAN_RE = /\b[A-Z]{5}[0-9]{4}[A-Z]\b/g;

const text = `The contact details are: ABCDE1234F (Rajesh)...`;
// match() returns null (not an empty array) when nothing matches, so fall back to [].
const pans = text.match(PAN_RE) ?? [];
console.log(pans);

Filter by entity type

The 4th letter of a PAN encodes the holder type:

  • P = Individual
  • C = Company
  • H = HUF (Hindu Undivided Family)
  • F = Partnership Firm
  • A = AOP (Association of Persons)
  • T = Trust
  • B = Body of Individuals
  • L = Local Authority
  • J = Artificial Juridical Person
  • G = Government

INDIVIDUAL_PAN = re.compile(r"\b[A-Z]{3}P[A-Z][0-9]{4}[A-Z]\b")
COMPANY_PAN = re.compile(r"\b[A-Z]{3}C[A-Z][0-9]{4}[A-Z]\b")

Useful when you want to extract only individuals or only companies from a mixed corpus.
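
If you need every holder type rather than one regex per type, one option is to match the general pattern and then look at the fourth letter. The sketch below assumes the letter table above; `HOLDER_TYPES` and `group_by_holder_type` are hypothetical names, and the sample PANs are invented.

```python
import re
from collections import defaultdict

PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")

# Fourth-letter codes, copied from the list above.
HOLDER_TYPES = {
    "P": "Individual",
    "C": "Company",
    "H": "HUF",
    "F": "Partnership Firm",
    "A": "AOP",
    "T": "Trust",
    "B": "Body of Individuals",
    "L": "Local Authority",
    "J": "Artificial Juridical Person",
    "G": "Government",
}

def group_by_holder_type(text):
    """Group PAN-shaped strings in text by the holder type of their 4th letter."""
    groups = defaultdict(list)
    for pan in PAN_RE.findall(text):
        # pan[3] is the fourth character; unlisted letters go under "Unknown".
        groups[HOLDER_TYPES.get(pan[3], "Unknown")].append(pan)
    return dict(groups)

# Invented sample values, for illustration only.
sample = "Vendors: AAAPL1234C, AAACR5055K. Placeholder: ABCDE1234F."
groups = group_by_holder_type(sample)
print(groups)
```

Placeholder-style values such as ABCDE1234F land in "Unknown" because D is not a listed holder code, which can double as a cheap filter for dummy data.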

Caveats

Regex confirms the format only. To verify that a PAN was actually issued, you'd need to call the Income Tax Department's PAN verification API. For log mining and document parsing, though, a format match is usually enough.
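
When you are checking a single field (say, a form value) rather than scanning free text, a format-only check might look like the sketch below. `looks_like_pan` is a hypothetical helper name; it uses `fullmatch` instead of word boundaries, and it says nothing about whether the PAN exists.

```python
import re

# Same shape as the extraction regex, without \b: fullmatch anchors both ends.
PAN_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

def looks_like_pan(value: str) -> bool:
    """Return True if value has the PAN shape. Format check only, not verification."""
    # Surrounding whitespace is stripped; case is NOT normalized (an assumption).
    return PAN_FORMAT.fullmatch(value.strip()) is not None

print(looks_like_pan("ABCDE1234F"))   # well-formed
print(looks_like_pan("ABCDE12345"))   # last character is a digit
```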

