Community Patterns

1

Paragraph Delimiter Counter (Unicode-Aware)

Created·2024-12-05 02:56
Updated·2024-12-05 03:24
Flavor·.NET 7.0 (C#)
Finds the start of every paragraph in the input text. A paragraph start is a non-whitespace character that follows one of the delimiters below, plus any whitespace between the delimiter and that character:

- 2 or more consecutive CRLF sequences
- 2 or more consecutive CR characters
- 2 or more consecutive LF characters
- 1 or more Unicode Paragraph Separator class characters
- The beginning of the string (matches the first paragraph)

Whitespace mixed in with the above does not interfere with the matching, as the included test text demonstrates.

The pattern is intended to be used with the options specified (non-backtracking, multiline, non-capturing, invariant culture), so be sure to include them for best performance.

It works on any version of .NET that supports the included syntax. However, it is intended for .NET 8.0 and up with the Regex.EnumerateMatches() method or, ideally, for .NET 9.0 and up with the new Regex.EnumerateSplits() method, which avoids the allocations associated with Match objects.

Unicode paragraph separator characters are very rare in practice, and software support for them is almost non-existent. The Windows Console, Windows Terminal, web browsers, the Windows clipboard, Notepad, Visual Studio, and Notepad++ all fail to handle them in their own ways, and none of them actually adds a line when one occurs (though Notepad++ will show it as PS if showing all whitespace is enabled). If you do not wish to include those characters in your search, it is safe to remove |\p{Zp}+ from the pattern. The resulting pattern, as a C# string, would be: "((\\r\\n|\\r|\\n){2,}|\\A)^\\s*\\S"
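To make the matching behavior concrete, here is a minimal sketch that runs the reduced pattern (the one without |\p{Zp}+) in Python rather than .NET. Python is a stand-in chosen for illustration: its built-in re module has no non-backtracking option and no \p{Zp} support, so this is not the intended .NET usage. The function name paragraph_starts and the sample text are assumptions, and the sample uses LF line endings only.

```python
import re

# The reduced pattern from the description, written as a Python raw string.
# re.MULTILINE stands in for the .NET multiline option.
PARAGRAPH_START = re.compile(r"((\r\n|\r|\n){2,}|\A)^\s*\S", re.MULTILINE)


def paragraph_starts(text):
    """Return the index of the first non-whitespace character of each paragraph."""
    # Every match ends on the paragraph's first character, so end() - 1 is its index.
    return [m.end() - 1 for m in PARAGRAPH_START.finditer(text)]


sample = "First para.\n\nSecond para.\n\n \n  Third para.\nstill third."
starts = paragraph_starts(sample)
print([sample[i] for i in starts])
```

A single line break ("Third para.\nstill third.") does not start a new paragraph, while whitespace that follows a run of two or more line breaks is skipped before the paragraph's first character.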
Submitted by dodexahedron
1

INI Parser

Created·2024-04-03 08:20
Updated·2024-08-28 12:56
Flavor·.NET 7.0 (C#)
This regular expression has the following features that make it a convenient tool for working with text data.

- (?=\S) Positive lookahead; trims leading whitespace in the text block.
- (? ... ) Group for a text block, which can be a comment, section, entry, or undefined string.
  - (? ... ) Group for a comment. Includes # or ;, then spaces (if any), then the comment value.
    - (?[#;]+) Group for the comment opening characters (# or ;).
    - (?:*) Non-capturing group for spaces, excluding newlines.
    - (?.+) Group for the value following the comment opening characters.
  - (? ... ) Group for a section. Includes the opening bracket [, then spaces (if any), then the section value, then the closing bracket ].
    - (?\[) Group for the opening bracket [.
    - (?:\s*) Non-capturing group for spaces after the opening bracket [.
    - (?]*\S+) Group for the section value, excluding the closing bracket ] and capturing the last non-space character.
    - (?:*) Non-capturing group for spaces after the section value, before the closing bracket ].
    - (?\]) Group for the closing bracket ].
  - (? ... ) Group for an entry (a parameter and its value). Includes the key, the separator (: or =), and the value.
    - (?]*\S) Group for the entry key, excluding the =, [, ] and newline characters, and capturing the last non-space character.
    - (?:*) Non-capturing group for spaces after the key, before the separator (: or =).
    - (?: =) (?*) Group for the entry value, excluding #, ; and newline.
    - (?:*) Non-capturing group for whitespace after the entry value.
  - (?.+) Group for an undefined string that does not match any other rule.
- (?\r\n|\n) Group for newline characters.
- (?+) Group for whitespace characters, excluding newline characters.
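The classification described above (comment, section, entry, or undefined string) can be sketched line by line. This is a simplified illustration in Python, not the submitted .NET pattern: it uses three small patterns instead of one, and every name in it (COMMENT, SECTION, ENTRY, parse_ini, and the group names value, name, and key) is hypothetical, since the original group names do not survive in this listing.

```python
import re

# Comment: one or more of # or ;, optional spaces, then the comment value.
COMMENT = re.compile(r"^[#;]+[ \t]*(?P<value>.*)$")
# Section: [, optional spaces, the section name, optional spaces, ].
SECTION = re.compile(r"^\[\s*(?P<name>[^\]]*?)\s*\]$")
# Entry: key ending in a non-space character, separator (: or =), then a value
# that stops before any trailing # or ; comment.
ENTRY = re.compile(
    r"^(?P<key>[^=:\[\]]*?[^\s=:\[\]])[ \t]*[=:][ \t]*"
    r"(?P<value>[^#;]*?)[ \t]*(?:[#;].*)?$"
)


def parse_ini(text):
    """Classify each non-blank line as comment, section, entry, or undefined."""
    tokens = []
    for raw in text.splitlines():
        line = raw.strip()  # leading and trailing whitespace is not significant
        if not line:
            continue
        for kind, pattern in (("comment", COMMENT), ("section", SECTION), ("entry", ENTRY)):
            match = pattern.match(line)
            if match:
                tokens.append((kind, match.groupdict()))
                break
        else:
            # Nothing matched: keep the line as an undefined string.
            tokens.append(("undefined", {"value": line}))
    return tokens


for token in parse_ini("; settings\n[ main ]\nname = demo  # trailing\n???\n"):
    print(token)
```

Trying the patterns in a fixed order mirrors the alternation in the description: a line that is not a comment, section, or entry falls through to the undefined case.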
Submitted by Pavel Bashkardin
1

Chinese Digits

Created·2024-01-05 07:56
Updated·2024-02-06 07:02
Flavor·PCRE (Legacy)
Matches Chinese numerals less than 1×10^16, such as "一千两百三十四万", "八萬点七六五", and "玖仟玖佰玖拾玖万玖仟玖佰玖拾玖亿玖仟玖佰玖拾玖万玖仟玖佰玖拾玖点玖玖玖玖玖玖玖玖玖玖玖玖玖玖玖玖". Uppercase (financial) and lowercase Chinese numerals can be mixed, but Chinese numerals and Arabic digits cannot be mixed.

Illegal numbers are not matched. For example, "两十六" and "两十六万" are not matched, because "两" and "十" are not used together in standard Chinese; the correct form is "二十六". Likewise, "两千零零六" is not matched, because consecutive "零" characters in the integer part of a Chinese number are illegal; the correct form is "两千零六".

The pattern needs a regex engine that can re-run the expression defined in a named capture group (a subroutine call), such as "(?<letter>[a-z]+)\d+(?&letter)".
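The engine requirement can be illustrated with the small letter example from the description. A subroutine call, written (?&letter) in PCRE, re-runs the named group's expression instead of repeating the text it captured. Python's built-in re module does not support that syntax, so the sketch below emulates it by interpolating the same fragment twice; this is a workaround for illustration under that assumption, not how the submitted pattern is written, and the names LETTER, SUBROUTINE_LIKE, and BACKREFERENCE are hypothetical.

```python
import re

# The fragment that a PCRE subroutine call would re-run.
LETTER = r"[a-z]+"

# Same shape as (?<letter>[a-z]+)\d+(?&letter): the expression is applied twice.
SUBROUTINE_LIKE = re.compile(rf"(?P<letter>{LETTER})\d+(?:{LETTER})")

# For contrast, a backreference repeats the captured text, not the expression.
BACKREFERENCE = re.compile(r"(?P<letter>[a-z]+)\d+(?P=letter)")

print(SUBROUTINE_LIKE.fullmatch("abc123de") is not None)
print(BACKREFERENCE.fullmatch("abc123de") is not None)
```

The first pattern accepts "abc123de" because "de" satisfies [a-z]+ on its own, while the backreference version only accepts input such as "abc123abc", where the second run of letters repeats the first exactly.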
Submitted by anonymous

Community Library Entry

-2

Regular Expression
Created·2014-08-14 10:32
Flavor·ECMAScript (JavaScript)

/^([1-9]{1,2}){1}(\.[0-9]{1,2})?$/

Description

no description available
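Because the entry carries no description, one way to see what the pattern accepts is to try it on a few inputs. The sketch below does that in Python instead of JavaScript, on the assumption that the pattern uses only syntax both engines share; the names NUMBER and accepts are hypothetical.

```python
import re

# The entry's pattern, unchanged: one or two digits from 1-9,
# optionally followed by a dot and one or two digits from 0-9.
NUMBER = re.compile(r"^([1-9]{1,2}){1}(\.[0-9]{1,2})?$")


def accepts(text):
    """Return True when the whole string matches the pattern."""
    return NUMBER.fullmatch(text) is not None


for candidate in ["7", "42", "9.5", "99.99", "10", "0.5", "123", "1.234"]:
    print(candidate, accepts(candidate))
```

The integer part excludes the digit 0 entirely, so values such as "10" and "0.5" are rejected along with anything that has more than two integer digits or more than two decimal places.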

Submitted by Ved