Community Patterns


Paragraph Delimiter Counter (Unicode-Aware)

Created: 2024-12-05 02:56
Updated: 2024-12-05 03:24
Flavor: .NET 7.0 (C#)
Finds all paragraphs in the input text. A paragraph starts at any non-whitespace character that immediately follows one of these delimiters, together with any other whitespace that precedes it:
- 2 or more consecutive CRLF sequences
- 2 or more consecutive CR characters
- 2 or more consecutive LF characters
- 1 or more Unicode Paragraph Separator (Zp) characters
- the beginning of the string (this matches the first paragraph)

Note that whitespace mixed in with the delimiters above does not interfere with matching, as the included test text demonstrates.

The pattern is intended to be used with the specified options (non-backtracking, multiline, non-capturing, invariant culture), so be sure to include them for best performance.

It works on any version of .NET that supports the syntax used. However, it is intended for .NET 8.0 and later with the Regex.EnumerateMatches() method or, ideally, .NET 9.0 and later with the new Regex.EnumerateSplits() method, which avoids the allocations associated with Match objects.

Unicode paragraph separator characters are very rare in practice, and software support for them is almost non-existent. The Windows Console, Windows Terminal, web browsers, the Windows clipboard, Notepad, Visual Studio, and Notepad++ all fail to handle them in their own ways; none of them actually starts a new line when one occurs (though Notepad++ shows it as PS if you have enabled showing all whitespace). If you do not wish to include those characters in your search, it is safe to remove |\p{Zp}+ from the pattern. The resulting pattern, as a C# string, would be: "((\\r\\n|\\r|\\n){2,}|\\A)^\\s*\\S"
Submitted by dodexahedron
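For readers outside .NET, the paragraph definition above can be approximated in Python. This is a hypothetical sketch, not the submitted pattern: Python's `re` module has no `\p{Zp}`, so U+2029 PARAGRAPH SEPARATOR (the only character in that Unicode category) is written out explicitly, blank-looking lines (spaces or tabs between line breaks) are assumed to still separate paragraphs, and the helper names `count_paragraphs` and `paragraph_starts` are made up for illustration.

```python
import re

# Hypothetical Python approximation of the .NET paragraph pattern above.
# Python's `re` has no \p{Zp}; U+2029 is the only character in that category.
PARAGRAPH_START = re.compile(
    r"(?:\A"                                # start of the string, or
    r"|(?:(?:\r\n|\r|\n)[^\S\r\n]*){2,}"    # 2+ line breaks (blanks allowed between them), or
    r"|\u2029+)"                            # 1+ Unicode paragraph separators,
    r"\s*\S"                                # then any whitespace and the first visible character
)


def count_paragraphs(text: str) -> int:
    """Count paragraph starts; finditer() yields matches lazily."""
    return sum(1 for _ in PARAGRAPH_START.finditer(text))


def paragraph_starts(text: str) -> list[int]:
    """Return the index of the first non-whitespace character of each paragraph."""
    return [m.end() - 1 for m in PARAGRAPH_START.finditer(text)]


print(count_paragraphs("First.\n\nSecond.\r\n \r\nThird.\u2029Fourth."))  # prints 4
```

Like `Regex.EnumerateMatches()`, `finditer()` produces matches lazily, although Python still allocates a match object for each hit.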

INI Parser

Created: 2024-04-03 08:20
Updated: 2024-08-28 12:56
Flavor: .NET 7.0 (C#)
This regular expression has the following features that make it a convenient tool for working with text data:
- (?=\S) Positive lookahead; trims leading whitespace in the text block.
- (? ... ) Group for the text block, which can be a comment, section, entry, or undefined string.
- (? ... ) Group for a comment. Includes # or ;, then spaces (if any), then the comment value.
- (?[#;]+) Group for the comment opening characters (# or ;).
- (?:*) Non-capturing group for spaces, excluding newlines.
- (?.+) Group for the value following the comment opening characters.
- (? ... ) Group for a section. Includes the opening bracket [, then spaces (if any), then the section value, then the closing bracket ].
- (?\[) Group for the opening bracket [.
- (?:\s*) Non-capturing group for spaces after the opening bracket [.
- (?]*\S+) Group for the section value, excluding the closing bracket ] and capturing the last non-space character.
- (?:*) Non-capturing group for spaces after the section value, before the closing bracket ].
- (?\]) Group for the closing bracket ].
- (? ... ) Group for an entry (a parameter and its value). Includes the key, the separator (: or =), and the value.
- (?]*\S) Group for the entry key, excluding the =, [, ] and newline characters, and capturing the last non-space character.
- (?:*) Non-capturing group for spaces after the key, before the separator (: or =).
- (?: =) (?*) Group for the entry value, excluding #, ; and newline.
- (?:*) Non-capturing group for whitespace after the entry value.
- (?.+) Group for an undefined string that does not match any other rule.
- (?\r\n|\n) Group for newline characters.
- (?+) Group for whitespace characters, excluding newline characters.
Submitted by Pavel Bashkardin
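Because the full pattern is not reproduced here, the following is only a hypothetical Python sketch of the same idea: one alternation per line kind (comment, section, entry, undefined), with horizontal whitespace trimmed around each part. The group names (such as `comment_value` and `undefined`), the `parse_ini` helper, and the choice to file keys that appear before any section under `""` are all assumptions, not part of the submitted .NET expression.

```python
import re

_BLANK = r"[^\S\r\n]*"  # spaces/tabs, never a line break

# Hypothetical sketch: group names and character classes are illustrative,
# loosely following the description above, not the original .NET pattern.
INI_LINE = re.compile(
    rf"{_BLANK}(?:"
    rf"(?P<comment>[#;]+){_BLANK}(?P<comment_value>.*?)"            # ; or # comment
    rf"|\[{_BLANK}(?P<section>[^\]\r\n]*?){_BLANK}\]"                # [section]
    rf"|(?P<key>[^=:\[\]\r\n]*[^\s=:\[\]]){_BLANK}[:=]{_BLANK}"      # key = / key :
    rf"(?P<value>[^#;\r\n]*?){_BLANK}(?:[#;].*)?"                    # value, optional trailing comment
    rf"|(?P<undefined>.+?)"                                          # anything else
    rf"){_BLANK}"
)


def parse_ini(text: str) -> dict[str, dict[str, str]]:
    """Collect entries per section; comments and undefined lines are skipped."""
    sections: dict[str, dict[str, str]] = {"": {}}  # "" holds keys before any [section]
    current = ""
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        m = INI_LINE.fullmatch(line)
        if m is None or m["comment"] is not None or m["undefined"] is not None:
            continue
        if m["section"] is not None:
            current = m["section"]
            sections.setdefault(current, {})
        else:
            sections[current][m["key"]] = m["value"]
    return sections
```

Matching one line at a time with `fullmatch()` sidesteps CRLF handling, since `splitlines()` has already removed the line terminators.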

UNC Path Component Validation - Sharename

Created: 2023-01-02 01:10
Flavor: .NET 7.0 (C#)
There are specific requirements for the different components of a UNC path. While it's possible to try to validate the entire UNC path in one go, you quite often end up needing to break the path down anyway (unless you are simply handing it off to some other library, in which case that library should be providing validation for you), and it's a lot simpler to validate the components individually.

Unfortunately, UNC paths allow quite a lot of flexibility, and Microsoft's type definition only goes so far, since it says that the format can depend on the protocol. Their type definition says that a share name must be 1-80 characters long, and that the allowed characters are (hexadecimal ASCII/Unicode/UTF-8) x20-21, x23-29, x2B-2E, x30-39, x40-5A, x5E-7B, and x7D-FF. Careful observers may note that this actually includes some non-printable characters, notably x7F, the Delete code. I haven't had a chance to test whether you can actually create a share that includes it, though.

My use case is UNC for SMB/CIFS, so I checked the SMB/CIFS protocol definitions. They refer back to the UNC definition, but also to the Microsoft File System Control Codes definition for share names. Rather than listing what is allowed, that definition lists what is illegal; however, the two amount to exactly the same set. It specifically says "All other Unicode characters are legal".

Therefore, my approach is to first look ahead to count the characters (you may want to remove this group from the regex if you check the length beforehand), and then to check against the list of non-allowed characters, including all control codes. This may not be exactly right according to the standards, but it is practical.
Submitted by thejamesdecker
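The lookahead-then-blocklist approach described above might look like this in Python. It is a sketch, not the submitted regex: it assumes "all control codes" means the Unicode Cc category (U+0000-U+001F and U+007F-U+009F), and otherwise blocks exactly the characters missing from the listed ranges (", *, /, : through ?, [ \ ], and |). The function name `is_valid_share_name` is hypothetical.

```python
import re

# Hypothetical sketch of the approach described above: a lookahead enforces the
# 1-80 length, then a negated class rejects every character outside the listed
# ranges plus all Unicode "Cc" control codes (U+0000-U+001F, U+007F-U+009F).
SHARE_NAME = re.compile(
    r'(?=.{1,80}\Z)'                           # length check; drop if validated elsewhere
    r'[^\x00-\x1F\x7F-\x9F"*/:;<=>?\[\\\]|]+'  # no control codes or reserved characters
)


def is_valid_share_name(name: str) -> bool:
    """Return True if `name` passes both the length and character checks."""
    return SHARE_NAME.fullmatch(name) is not None
```

One caveat when porting back to .NET: Python counts code points, whereas `.` in a .NET pattern matches a single UTF-16 code unit, so names containing characters outside the Basic Multilingual Plane may be measured differently by the two engines.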

Community Library Entry


Regular Expression
Created: 2021-02-04 19:55
Flavor: PCRE2 (PHP)

/
^(personal-loan|yourmortgageapp|lending)\.wf\.com|^wellsfargo\.com|^apply\.wellsfargo\.com
/
gm

Description

no description available

Submitted by anonymous
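A quick Python check of the entry above, with the dots escaped so that each one matches only a literal period. Here `re.MULTILINE` stands in for PCRE's `m` flag and `finditer()` for the global `g` flag; the sample host names and the `find_hosts` helper are illustrative only.

```python
import re

# The library pattern with its dots escaped so they match only a literal ".".
# re.MULTILINE mirrors the PCRE "m" flag; finditer() plays the role of "g".
WF_HOSTS = re.compile(
    r"^(personal-loan|yourmortgageapp|lending)\.wf\.com"
    r"|^wellsfargo\.com"
    r"|^apply\.wellsfargo\.com",
    re.MULTILINE,
)


def find_hosts(text: str) -> list[str]:
    """Return the matched text for each line that starts with a listed host."""
    return [m.group(0) for m in WF_HOSTS.finditer(text)]


sample = "\n".join([
    "lending.wf.com",
    "apply.wellsfargo.com/login",
    "lendingXwfYcom",             # rejected: the dots are literal
    "www.wellsfargo.com",         # rejected: every alternative is anchored with ^
    "wellsfargo.com.example.net",
])
print(find_hosts(sample))  # ['lending.wf.com', 'apply.wellsfargo.com', 'wellsfargo.com']
```

Since none of the alternatives is anchored at the end, `wellsfargo.com.example.net` still matches; appending `$` to each alternative would restrict matches to whole lines.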