Regular Expressions 101

Community Library Entry

Regular Expression
.NET 7.0 (C#)

@"
(?=\S)(?<text>(?<comment>(?<open>[#;]+)(?:[^\S\r\n]*)(?<value>.+))|(?<section>(?<open>\[)(?:\s*)(?<value>[^\]]*\S+)(?:[^\S\r\n]*)(?<close>\]))|(?<entry>(?<key>[^=\r\n\[\]]*\S)(?:[^\S\r\n]*)(?<delimiter>:|=)(?:[^\S\r\n]*)(?<value>[^#;\r\n]*))|(?<undefined>.+))(?<=\S)|(?<linebreaker>\r\n|\n)|(?<whitespace>[^\S\r\n]+)
"
ig

Description

This regular expression splits INI-style text into tokens: comments, sections, key/value entries, unrecognized strings, line breaks, and whitespace. Its parts are described below.

(?=\S) Positive lookahead. Requires a text block to start with a non-whitespace character, which keeps leading whitespace out of the block.
(?<text> ... ) Group for a text block, which can be a comment, a section, an entry, or an undefined string.
(?<comment> ... ) Group for a comment. Includes # or ;, then spaces (if any), then the comment value.
(?<open>[#;]+) Group for the comment opening characters (one or more # or ;).
(?:[^\S\r\n]*) Non-capturing group for whitespace, excluding newlines.
(?<value>.+) Group for the value following the comment opening characters.
(?<section> ... ) Group for a section. Includes the opening bracket [, then whitespace (if any), then the section value, then the closing bracket ].
(?<open>\[) Group for the opening bracket [.
(?:\s*) Non-capturing group for whitespace after the opening bracket [.
(?<value>[^\]]*\S+) Group for the section value, excluding the closing bracket ] and ending on a non-whitespace character.
(?:[^\S\r\n]*) Non-capturing group for whitespace after the section value, before the closing bracket ].
(?<close>\]) Group for the closing bracket ].
(?<entry> ... ) Group for an entry (a parameter and its value). Includes the key, the separator (: or =), and the value.
(?<key>[^=\r\n\[\]]*\S) Group for the entry key, excluding the =, [, ] and newline characters, and ending on a non-whitespace character.
(?:[^\S\r\n]*) Non-capturing group for whitespace after the key, before the separator (: or =).
(?<delimiter>:|=) Group for the separator (: or =).
(?:[^\S\r\n]*) Non-capturing group for whitespace between the separator and the entry value.
(?<value>[^#;\r\n]*) Group for the entry value, excluding #, ; and newline.
(?<undefined>.+) Group for an undefined string that does not match any of the other rules.
(?<=\S) Positive lookbehind. Requires a text block to end on a non-whitespace character, which trims trailing whitespace from the block.
(?<linebreaker>\r\n|\n) Group for newline characters.
(?<whitespace>[^\S\r\n]+) Group for whitespace characters, excluding newline characters.
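The groups described above can be read back in C# roughly as follows. This is only a sketch of one possible use, not the author's code: the Tokenize helper, the token labels, and the sample input are made up for illustration. It assumes the i flag maps to RegexOptions.IgnoreCase and that the g flag corresponds to calling Matches() rather than Match().

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern copied from the entry. .NET allows the names "open" and "value"
// to be reused across alternatives; only one alternative captures per match.
var ini = new Regex(
    @"(?=\S)(?<text>(?<comment>(?<open>[#;]+)(?:[^\S\r\n]*)(?<value>.+))|(?<section>(?<open>\[)(?:\s*)(?<value>[^\]]*\S+)(?:[^\S\r\n]*)(?<close>\]))|(?<entry>(?<key>[^=\r\n\[\]]*\S)(?:[^\S\r\n]*)(?<delimiter>:|=)(?:[^\S\r\n]*)(?<value>[^#;\r\n]*))|(?<undefined>.+))(?<=\S)|(?<linebreaker>\r\n|\n)|(?<whitespace>[^\S\r\n]+)",
    RegexOptions.IgnoreCase);

// Hypothetical helper: classify every match by the group that succeeded.
List<(string Kind, string Detail)> Tokenize(string input)
{
    var tokens = new List<(string Kind, string Detail)>();
    foreach (Match m in ini.Matches(input))
    {
        if (m.Groups["comment"].Success)
            tokens.Add(("comment", m.Groups["value"].Value));
        else if (m.Groups["section"].Success)
            tokens.Add(("section", m.Groups["value"].Value));
        else if (m.Groups["entry"].Success)
            tokens.Add(("entry", m.Groups["key"].Value + "=" + m.Groups["value"].Value));
        else if (m.Groups["undefined"].Success)
            tokens.Add(("undefined", m.Value));
        else if (m.Groups["linebreaker"].Success)
            tokens.Add(("linebreaker", ""));
        else
            tokens.Add(("whitespace", m.Value));
    }
    return tokens;
}

// Made-up sample: a comment, a section, and an entry with a trailing comment.
var sample = "; settings\n[general]\nname = demo ; note\n";
foreach (var (kind, detail) in Tokenize(sample))
    Console.WriteLine($"{kind}: {detail}");
```

In this sample the entry value stops before the inline ; because the value class excludes # and ;. The space in front of the ; is then reported as a separate whitespace match, since the trailing lookbehind forces the text block to end on a non-whitespace character.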

Submitted by Pavel Bashkardin - 6 months ago (Last modified a month ago)