Parse a CSV line — when regex works and when it doesn't

CSV looks regex-parseable on the surface. It isn't — but there are simple cases where regex works fine.

The naive approach

"a,b,c".split(/,/);   // ["a", "b", "c"]

Works for the simple case. Doesn't work for quoted fields with commas:

"\"a,b\",c,d".split(",");
// Wrong: ["\"a", "b\"", "c", "d"]
// Right: ["a,b", "c", "d"]
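
One way to keep the naive split honest is to refuse input it cannot handle. The sketch below assumes a hypothetical helper name, splitSimpleCsvLine, and treats any double quote in the line as a sign that plain splitting is unsafe:

```javascript
// Hypothetical helper (not from any library): split on commas only when
// the line contains no quotes, which is the case where split is correct.
function splitSimpleCsvLine(line) {
  if (line.includes('"')) {
    throw new Error("Line contains quotes; a plain split would mis-parse it");
  }
  return line.split(",");
}

splitSimpleCsvLine("a,b,c");   // ["a", "b", "c"]
```

Failing loudly here is a design choice: a thrown error is easier to notice than a row with one column too many.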

The mostly-right regex

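// Each match includes the comma in front of the field, so no match is ever
// empty and exec() always advances. (A pattern that starts with (?:^|,) can
// match the empty string at index 0 and loop forever on input like ",a".)
const CSV_FIELD = /,(?:"((?:[^"]|"")*)"|([^",]*))/g;

function parseCsvLine(line) {
  const fields = [];
  // Prepend a comma so the first field looks like every other field.
  const input = "," + line;
  let m;
  while ((m = CSV_FIELD.exec(input)) !== null) {
    // m[1] is a quoted field (unescape "" to "), m[2] is an unquoted field.
    fields.push(m[1] !== undefined ? m[1].replace(/""/g, "\"") : m[2]);
  }
  return fields;
}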

parseCsvLine('"a,b",c,"hello ""world""",d');
// ["a,b", "c", "hello \"world\"", "d"]

This handles: quoted fields with commas, embedded double quotes (escaped as ""), and unquoted fields.

What it doesn't handle: multi-line fields. A quoted field may contain embedded newlines, so one record can span several physical lines, and a regex applied one line at a time never sees the whole field. For that, you need a state machine or a proper parser.
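
To show what "state machine" means here, the following is a minimal sketch, not a full RFC 4180 implementation. The function name parseCsv is hypothetical. It assumes the whole document is passed in as one string, and it is lenient about stray quotes inside unquoted fields. The only state it tracks is whether the current character sits inside a quoted field:

```javascript
// Hypothetical helper: parses a whole CSV document rather than a single line.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  let i = 0;

  while (i < text.length) {
    const ch = text[i];

    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"';        // "" inside quotes is a literal quote
        i += 2;
        continue;
      }
      if (ch === '"') {
        inQuotes = false;    // closing quote
      } else {
        field += ch;         // commas and newlines are plain data here
      }
    } else if (ch === '"') {
      inQuotes = true;       // opening quote
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++;  // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
    i++;
  }

  if (inQuotes) throw new Error("Unterminated quoted field");

  // Flush the last record when the input does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

parseCsv('id,note\n1,"line one\nline two"\n');
// [["id", "note"], ["1", "line one\nline two"]]
```

The newline inside the quoted field is kept as data because the record separator is only recognized while inQuotes is false. That one flag is the state a line-at-a-time regex does not have.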

Just use a library

For anything beyond simple cases:

  • JavaScript: papaparse
  • Python: csv module (standard library)
  • Go: encoding/csv
  • Ruby: csv standard library

These handle RFC 4180-style CSV properly, including multi-line fields and quote escaping. All of them let you change the delimiter, and most let you change the quote character as well.

CSV is a good example of a problem where regex almost works. The 90% solution is easy; the 100% solution needs state. Don't fight it: use a parser.

