Typographic correction for SI units

Regular Expression
ECMAScript (JavaScript)

/\b(?<number>[\d,]+(?:\.\d+)?)(?<space> ?|&[^;]{2,7};)(?<dimensions>(?:(?<mathjax>\$[^\n\$]+\$)|(?<unit>(?:Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)?(?:m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L))))/gm

Description

Matches quantities expressed in SI units in typical Markdown-formatted plain-text documents. The unit may be a plain SI symbol with an optional prefix, or an inline MathJax expression delimited by dollar signs. The pattern is meant to be used with a substitution that replaces any space or HTML entity between the number and its unit with a narrow no-break space (U+202F). Where a font does not support this character, it may fall back to a thin space. The typographic convention for SI units is to use a full space, not a narrow one, but that was not the concern for this use case. This pattern was submitted under PCRE by accident; it is written for ECMAScript.
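The entry does not show the substitution itself, so the following is only a minimal sketch of how the replacement described above might be applied in JavaScript. The helper name `fixSiSpacing` and the constant names are hypothetical. The pattern is copied verbatim from this entry, and the replacement string `$<number>` + U+202F + `$<dimensions>` is an assumption about what the author intended.

```javascript
// Pattern copied verbatim from this entry (flags: g = all matches, m = multiline).
const SI_QUANTITY = /\b(?<number>[\d,]+(?:\.\d+)?)(?<space> ?|&[^;]{2,7};)(?<dimensions>(?:(?<mathjax>\$[^\n\$]+\$)|(?<unit>(?:Y|Z|E|P|T|G|M|k|h|da|d|c|m|µ|n|p|f|a|z|y)?(?:m|g|s|A|K|mol|cd|Hz|N|Pa|J|W|C|V|F|Ω|S|Wb|T|H|lm|lx|Bq|Gy|Sv|kat|l|L))))/gm;

// U+202F NARROW NO-BREAK SPACE, the separator named in the description.
const NNBSP = "\u202F";

// Hypothetical helper: keeps the number and the dimensions, and swaps whatever
// the "space" group captured (a space, an HTML entity, or nothing) for NNBSP.
function fixSiSpacing(text) {
  return text.replace(SI_QUANTITY, `$<number>${NNBSP}$<dimensions>`);
}

console.log(fixSiSpacing("Mass 5 kg and length 10&nbsp;m"));
console.log(fixSiSpacing("Sampled at 1,000.5 Hz"));
```

Two behaviours are worth keeping in mind with this sketch. Because the `space` group may match the empty string, a quantity written without any separator (such as `5kg`) also receives a narrow no-break space. And because the pattern has no trailing boundary after the unit, ordinary words that begin with a unit symbol are matched too: `5 minutes` is matched as `5 m`.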

Submitted by ecfrechette - 3 months ago (Last modified 3 months ago)