Parse a CSV line — when regex works and when it doesn't
CSV looks regex-parseable on the surface. It isn't — but there are simple cases where regex works fine.
The naive approach
"a,b,c".split(/,/); // ["a", "b", "c"]
Works for the simple case. Doesn't work for quoted fields with commas:
"\"a,b\",c,d".split(",");
// Wrong: ["\"a", "b\"", "c", "d"]
// Right: ["a,b", "c", "d"]
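If you still want plain splitting for the simple case, one option is to make the assumption explicit and refuse any line that might need quoting rules. This is a minimal sketch; `splitSimpleCsvLine` is a hypothetical helper name, not something from a library.

```javascript
// Hypothetical helper: only safe when the line has no quoted fields.
function splitSimpleCsvLine(line) {
  // A double quote means quoting rules may apply, so plain splitting is unsafe.
  if (line.includes('"')) {
    throw new Error("Line contains quotes; use a real CSV parser");
  }
  return line.split(",");
}

splitSimpleCsvLine("a,b,c"); // ["a", "b", "c"]
```

Failing loudly on a quote is a design choice: it turns silently wrong output into an error you notice.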
The mostly-right regex
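// Each match is a comma followed by one field, so no match is ever empty.
const CSV_FIELD = /,(?:"((?:[^"]|"")*)"|([^",]*))/g;

function parseCsvLine(line) {
  const fields = [];
  // Prefix a comma so the first field looks like every other one. Anchoring
  // with ^ instead gives a zero-length match (and an endless exec loop) when
  // the line is empty or starts with a comma.
  for (const m of ("," + line).matchAll(CSV_FIELD)) {
    // m[1] is a quoted field (unescape "" to "), m[2] is an unquoted one.
    fields.push(m[1] !== undefined ? m[1].replace(/""/g, '"') : m[2]);
  }
  return fields;
}

parseCsvLine('"a,b",c,"hello ""world""",d');
// ["a,b", "c", "hello \"world\"", "d"]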
This handles: quoted fields with commas, embedded double quotes (escaped as ""), and unquoted fields.
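What it doesn't handle: multi-line fields. A quoted field may contain newlines, so a single record can span several physical lines, and a regex applied one line at a time never sees the whole field. It also skips over malformed input (such as a stray quote in the middle of an unquoted field) instead of reporting an error. For those cases, you need a state machine or a proper parser.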
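To make "needs state" concrete, here is a minimal state-machine sketch that walks the whole input one character at a time and tracks whether it is inside quotes. The name `parseCsv` is hypothetical, and the sketch assumes comma delimiters and double-quote quoting. It does not validate malformed input (an unterminated quote simply runs to the end of the text), and it reports a blank line as a row with one empty field.

```javascript
// Hypothetical state-machine parser: one flag (inQuotes) is all the state needed.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // "" inside quotes is an escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // commas and newlines are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
  }

  // Flush the last record unless the input ended with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

parseCsv('id,note\n1,"line one\nline two"\n');
// [["id", "note"], ["1", "line one\nline two"]]
```

The difference from the regex version is that the newline inside the quoted field never ends the record, because the parser knows it is still inside quotes when it sees it.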
Just use a library
For anything beyond simple cases:
- JavaScript: papaparse
- Python: csv module (standard library)
- Go: encoding/csv
- Ruby: csv (standard library)
These handle RFC 4180-style CSV properly, including multi-line fields and quote escaping. Most of them also let you configure the delimiter, and several let you change the quote character.
Regex on CSV is a good example of a place where regex almost works. The 90% solution is easy; the 100% solution needs state. Don't fight it — use a parser.