Community Patterns

Paragraph Delimiter Counter (Unicode-Aware)

Created·2024-12-05 02:56
Updated·2024-12-05 03:24
Flavor·.NET 7.0 (C#)
Finds all paragraphs in the input text, where a paragraph is defined as any occurrence of a non-whitespace character immediately following any of the delimiters below, plus any other preceding whitespace:

- 2 or more consecutive CRLF sequences
- 2 or more consecutive CR characters
- 2 or more consecutive LF characters
- 1 or more Unicode Paragraph Separator class characters
- The beginning of the string (matches the first paragraph)

Whitespace mixed in with the above does not interfere with matching, as the included test text demonstrates.

The pattern is intended to be used with the options specified (non-backtracking, multiline, non-capturing, invariant culture), so be sure to include them for best performance.

It works on any version of .NET that supports the included syntax. However, it is intended for .NET 8.0 and up with the Regex.EnumerateMatches() method or, ideally, .NET 9.0 and up with the new Regex.EnumerateSplits() method, to avoid the allocations associated with Match objects.

Unicode paragraph separator characters are very rare in practice, and software support for them is almost non-existent. The Windows Console, Windows Terminal, web browsers, the Windows clipboard, Notepad, Visual Studio, and Notepad++ all fail to handle them in their own ways, and none of them actually adds a line when one occurs (though Notepad++ will show it as PS if you have enabled showing all whitespace). It is safe to remove |\p{Zp}+ from the pattern if you do not wish to include those characters in your search. The resulting pattern, as a C# string, would be:

"((\\r\\n|\\r|\\n){2,}|\\A)^\\s*\\S"
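As a rough illustration of how the reduced pattern behaves, here is a minimal Go sketch. It is a port under stated assumptions, not the author's .NET usage: the inline (?m) flag and (?:...) groups stand in for the multiline and non-capturing options, Go's regexp package is already non-backtracking, and Go's \s covers ASCII whitespace only, whereas .NET's \s is Unicode-aware. The function name countParagraphs is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// paragraphStart is an assumed Go port of the reduced C# pattern quoted above
// (the variant without \p{Zp}+). (?m) enables multiline mode so that ^ also
// matches right after a line feed.
var paragraphStart = regexp.MustCompile(`(?m)(?:(?:\r\n|\r|\n){2,}|\A)^\s*\S`)

// countParagraphs (hypothetical helper) counts paragraph starts by counting
// non-overlapping matches of the pattern.
func countParagraphs(text string) int {
	return len(paragraphStart.FindAllStringIndex(text, -1))
}

func main() {
	text := "First paragraph.\n\nSecond paragraph.\r\n\r\nThird paragraph."
	fmt.Println(countParagraphs(text))
}
```

A single line break does not start a new paragraph in this sketch; only the string start or a run of two or more line terminators does.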
Submitted by dodexahedron

Community Library Entry

Regular Expression
Created·2020-03-22 22:40
Flavor·Golang

`
^(?:[_a-z0-9](?:[_a-z0-9-]{0,61}[a-z0-9])?\.)+(?:[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?)?$
`
Flags: igm

Description

This regexp can be used to validate domain names in Golang. It cannot enforce the 253-character limit (not counting the optional trailing period), but a simple len(domain) <= 253 check handles that easily.
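A minimal sketch of combining the pattern with the length check might look like the following. The helper name isValidDomain is hypothetical, the case-insensitive flag is inlined as (?i), and trimming the trailing period before measuring is one assumed reading of "not counting the optional trailing period".

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// domainPattern is the expression from this entry with the "i" flag inlined.
var domainPattern = regexp.MustCompile(`(?i)^(?:[_a-z0-9](?:[_a-z0-9-]{0,61}[a-z0-9])?\.)+(?:[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?)?$`)

// isValidDomain (hypothetical helper) applies the length limit first,
// then the pattern.
func isValidDomain(domain string) bool {
	// Assumption: the optional trailing period is excluded from the
	// 253-character limit.
	if len(strings.TrimSuffix(domain, ".")) > 253 {
		return false
	}
	return domainPattern.MatchString(domain)
}

func main() {
	for _, d := range []string{"example.com", "192.0.2.1", "under_score.example"} {
		fmt.Println(d, isValidDomain(d))
	}
}
```

Checking the length outside the pattern keeps the expression usable with engines that lack lookaround support.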

It can be used as-is in other languages, including those built on the RE2 regex engine. If positive lookbehind assertions are available, the character limit can be enforced within the pattern itself.

Non-capturing groups are used.

Example validated domains (some may be invalid per TLD rules):

example.com
_25._tcp.SRV.example
punycoded-idna.xn--zckzah
under_score.example

Example invalid domains:

192.0.2.1
has spaces.com
easy,typo.example
domain.escapes.invalid
no_trailing_.invalid
-leading-or-trailing-.hyphens.invalid

TLDs are validated more strictly; the following will not validate:

example
digit.1example
underscore._example_com

With a trailing period, however, the same rules as for non-TLD labels are applied:

example.
192.0.2.1.
digit.1example.
underscore._example_com.
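The effect of the trailing period can be checked with a short sketch like the one below, which runs each name from the lists above with and without the final period. The variable name domainRE is an assumption made for illustration, and the "i" flag is again inlined as (?i).

```go
package main

import (
	"fmt"
	"regexp"
)

// domainRE is the expression from this entry with the "i" flag inlined.
var domainRE = regexp.MustCompile(`(?i)^(?:[_a-z0-9](?:[_a-z0-9-]{0,61}[a-z0-9])?\.)+(?:[a-z](?:[a-z0-9-]{0,61}[a-z0-9])?)?$`)

func main() {
	// Names whose final label fails the stricter TLD rules.
	names := []string{"example", "192.0.2.1", "digit.1example", "underscore._example_com"}
	for _, name := range names {
		// With the period appended, the final label is checked under the
		// looser non-TLD rules and the TLD part of the pattern matches empty.
		fmt.Printf("%q: %t, %q: %t\n",
			name, domainRE.MatchString(name),
			name+".", domainRE.MatchString(name+"."))
	}
}
```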

Submitted by Alexander Dupuy