Community Patterns

Community Library Entry


Regular Expression
Created: 2025-05-30 18:40
Updated: 2025-05-30 18:44
Flavor: PCRE2 (PHP)

/
(?:
  (?<alpha>[a-z])
  (?<digit>[0-9])
  (?<unreserved>\g<alpha>|\g<digit>|-|\.|_|~)
  (?<hexdig>\g<digit>|[A-F])
  (?<pct_encoded>%\g<hexdig>{2})
  (?<gen_delims>[:\/\?\#\[\]@])
  (?<sub_delims>[!\$&'\(\)\*\+,;=])
  (?<reserved>\g<gen_delims>|\g<sub_delims>)
  (?<ip_literal>\[(?:\g<ipv6address>|\g<ipvfuture>)\])
  (?<ipvfuture>v\g<hexdig>+\.(?:\g<unreserved>|\g<sub_delims>|:)+)
  (?<ipv6address>
                                     (?:\g<h16>:){6}\g<ls32>
    |                             ::(?:\g<h16>:){5}\g<ls32>
    | (?:                 \g<h16>)?::(?:\g<h16>:){4}\g<ls32>
    | (?:(?:\g<h16>:){0,1}\g<h16>)?::(?:\g<h16>:){3}\g<ls32>
    | (?:(?:\g<h16>:){0,2}\g<h16>)?::(?:\g<h16>:){2}\g<ls32>
    | (?:(?:\g<h16>:){0,3}\g<h16>)?::(?:\g<h16>:){1}\g<ls32>
    | (?:(?:\g<h16>:){0,4}\g<h16>)?::               \g<ls32>
    | (?:(?:\g<h16>:){0,5}\g<h16>)?::               \g<h16>
    | (?:(?:\g<h16>:){0,6}\g<h16>)?::
  )
  (?<h16>\g<hexdig>{1,4})
  (?<ls32>\g<h16>:\g<h16>|\g<ipv4address>)
  (?<ipv4address>\g<dec_octet>\.\g<dec_octet>\.\g<dec_octet>\.\g<dec_octet>)
  (?<dec_octet>
    25[0-5]          |  # 250-255
    2[0-4]\g<digit>  |  # 200-249
    1\g<digit>{2}    |  # 100-199
    [1-9]\g<digit>   |  # 10-99
    \g<digit>           # 0-9
  )
  (?<reg_name>(?:\g<unreserved>|\g<pct_encoded>|\g<sub_delims>)*)
  (?<path_abempty>(?:\/\g<segment>)*)
  (?<path_absolute>\/(?:\g<segment_nz>(?:\/\g<segment>)*)?)
  (?<path_noscheme>\g<segment_nz_nc>(?:\/\g<segment>)*)
  (?<path_rootless>\g<segment_nz>(?:\/\g<segment>)*)
  (?<path_empty>)
  (?<segment>\g<pchar>*)
  (?<segment_nz>\g<pchar>+)
  (?<segment_nz_nc>(?:\g<unreserved>|\g<pct_encoded>|\g<sub_delims>|@)+)  # non-zero-length segment without any colon ":"
  (?<pchar>\g<unreserved>|\g<pct_encoded>|\g<sub_delims>|:|@)
){0}
(?<scheme>\g<alpha>(?:\g<alpha>|\g<digit>|\+|-|\.)*)
:
(?<hier_part>
  (?:
    \/\/
    (?<authority>
      (?:(?<userinfo>(?:\g<unreserved>|\g<pct_encoded>|\g<sub_delims>|:)*)@)?
      (?<host>\g<ip_literal>|\g<ipv4address>|\g<reg_name>)
      (?::(?<port>\g<digit>*))?
    )
  )?
  (?<path>(?(<authority>)
      \g<path_abempty>       |  # begins with "/" or is empty
      (?:
        \g<path_absolute>    |  # begins with "/" but not "//"
        \g<path_rootless>    |  # begins with a segment
        \g<path_noscheme>    |  # begins with a non-colon segment
        \g<path_empty>          # zero characters
      )
  ))
)
(?:\?(?<query>(?:\g<pchar>|\/|\?)*))?
(?:\#(?<fragment>(?:\g<pchar>|\/|\?)*))?
/
gix

Description

Please note that this expression identifies the parts of generic URIs. If you are looking to recognise URLs, or http(s) addresses specifically, you may want a more specific pattern.
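The distinction matters in practice: strings such as urn:isbn:0451450523 or mailto:John.Doe@example.com are valid generic URIs but are not web addresses. If you only want http(s) URLs, one option, swapped in here and not part of the author's pattern, is to split the string with Python's standard urllib.parse.urlsplit and then check the scheme and authority. The helper name is_http_url is hypothetical, and the check is deliberately loose: urlsplit does not validate the RFC 3986 grammar.

```python
from urllib.parse import urlsplit


def is_http_url(text):
    """Loose check (hypothetical helper): http or https scheme plus a non-empty authority."""
    parts = urlsplit(text)
    # urlsplit lowercases the scheme, so "HTTP://..." is accepted as well.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_http_url("https://example.com/a"))         # True
print(is_http_url("mailto:John.Doe@example.com"))   # False: URI, but no authority
print(is_http_url("urn:isbn:0451450523"))           # False: different scheme
```

A stricter approach would first validate the string against the full URI grammar and only then inspect the scheme.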

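I did my best to implement the grammar rules laid out in RFC 3986 as named groups. I prioritised readability, so the pattern is probably not portable across regex flavours: it relies on PCRE features such as the (?:...){0} definition block, named subroutine calls (\g<name>) and the (?(<authority>)...) conditional.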
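As an illustration of what porting involves, Python's built-in re module has no \g<name> subroutine calls, so the pattern cannot be used there as-is. The sketch below is a hypothetical port, not the author's code: it inlines each rule as a non-capturing string fragment, keeps the same top-level groups (scheme, hier_part, authority, userinfo, host, port, path, query, fragment), reproduces the authority conditional with Python's (?(name)...) syntax, and assumes the input is a single candidate string checked with fullmatch. The constant names and the parse_uri helper are illustrative only.

```python
import re

# Hypothetical port: each named rule from the (?:...){0} block becomes a
# plain string fragment, because re has no \g<name> subroutine calls.
ALPHA = r"[a-z]"
DIGIT = r"[0-9]"
HEXDIG = r"[0-9A-F]"
UNRESERVED = rf"(?:{ALPHA}|{DIGIT}|[-._~])"
PCT_ENCODED = rf"(?:%{HEXDIG}{{2}})"
SUB_DELIMS = r"[!$&'()*+,;=]"
PCHAR = rf"(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS}|[:@])"

# dec-octet alternatives: 250-255, 200-249, 100-199, 10-99, 0-9
DEC_OCTET = rf"(?:25[0-5]|2[0-4]{DIGIT}|1{DIGIT}{{2}}|[1-9]{DIGIT}|{DIGIT})"
IPV4 = rf"(?:{DEC_OCTET}(?:\.{DEC_OCTET}){{3}})"
H16 = rf"{HEXDIG}{{1,4}}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"
IPV6 = "(?:" + "|".join([
    rf"(?:{H16}:){{6}}{LS32}",
    rf"::(?:{H16}:){{5}}{LS32}",
    rf"(?:{H16})?::(?:{H16}:){{4}}{LS32}",
    rf"(?:(?:{H16}:){{0,1}}{H16})?::(?:{H16}:){{3}}{LS32}",
    rf"(?:(?:{H16}:){{0,2}}{H16})?::(?:{H16}:){{2}}{LS32}",
    rf"(?:(?:{H16}:){{0,3}}{H16})?::{H16}:{LS32}",
    rf"(?:(?:{H16}:){{0,4}}{H16})?::{LS32}",
    rf"(?:(?:{H16}:){{0,5}}{H16})?::{H16}",
    rf"(?:(?:{H16}:){{0,6}}{H16})?::",
]) + ")"
IPVFUTURE = rf"v{HEXDIG}+\.(?:{UNRESERVED}|{SUB_DELIMS}|:)+"
IP_LITERAL = rf"\[(?:{IPV6}|{IPVFUTURE})\]"
REG_NAME = rf"(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS})*"

SEGMENT = rf"{PCHAR}*"
SEGMENT_NZ = rf"{PCHAR}+"
PATH_ABEMPTY = rf"(?:/{SEGMENT})*"
PATH_ABSOLUTE = rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?"
# path-noscheme is left out: every path it matches is also path-rootless.
PATH_ROOTLESS = rf"{SEGMENT_NZ}(?:/{SEGMENT})*"

# With an authority the path is path-abempty; otherwise it is path-absolute,
# path-rootless or empty. re.ASCII keeps IGNORECASE from folding non-ASCII
# letters into [a-z].
URI = re.compile(rf"""
    (?P<scheme>{ALPHA}(?:{ALPHA}|{DIGIT}|[+\-.])*)
    :
    (?P<hier_part>
        (?://(?P<authority>
            (?:(?P<userinfo>(?:{UNRESERVED}|{PCT_ENCODED}|{SUB_DELIMS}|:)*)@)?
            (?P<host>{IP_LITERAL}|{IPV4}|{REG_NAME})
            (?::(?P<port>{DIGIT}*))?
        ))?
        (?P<path>(?(authority){PATH_ABEMPTY}|(?:{PATH_ABSOLUTE}|{PATH_ROOTLESS})?))
    )
    (?:\?(?P<query>(?:{PCHAR}|[/?])*))?
    (?:\#(?P<fragment>(?:{PCHAR}|[/?])*))?
""", re.VERBOSE | re.IGNORECASE | re.ASCII)


def parse_uri(text):
    """Return the named components of `text`, or None if it is not a URI."""
    m = URI.fullmatch(text)
    return m.groupdict() if m else None


print(parse_uri("https://user@example.com:8080/a/b?q=1#frag"))
print(parse_uri("urn:isbn:0451450523"))
```

Inlining makes the compiled pattern much longer than the PCRE original, but it only needs features that Python's re supports (named groups, a conditional on a group name, verbose mode).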

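Removing the case-insensitive flag (i) restricts letters to lowercase and hexadecimal digits to uppercase, which matches the canonical casing RFC 3986 recommends for schemes, hosts and percent-encodings. Note that it also rejects uppercase letters elsewhere, for example in paths, even though the RFC allows them there.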
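A quick way to see the effect of the i flag is to test two of the pattern's rules on their own. This sketch assumes Python's re and copies the scheme and pct-encoded rules as plain fragments; the helper name matches is hypothetical. re.ASCII is added so that IGNORECASE does not also let [a-z] match non-ASCII letters such as the Kelvin sign.

```python
import re

# Fragments copied from the pattern's rules (alpha is [a-z], hexdig is [0-9A-F]).
SCHEME = r"[a-z](?:[a-z]|[0-9]|[+\-.])*"
PCT_ENCODED = r"%[0-9A-F]{2}"


def matches(pattern, text, ignore_case):
    """Hypothetical helper: full-string match with or without the i flag."""
    flags = re.ASCII | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(pattern, text, flags) is not None


# Without the i flag only lowercase schemes and uppercase hex digits match.
print(matches(SCHEME, "https", False))      # True
print(matches(SCHEME, "HTTPS", False))      # False
print(matches(PCT_ENCODED, "%2F", False))   # True
print(matches(PCT_ENCODED, "%2f", False))   # False
# With the i flag both casings are accepted.
print(matches(SCHEME, "HTTPS", True))       # True
print(matches(PCT_ENCODED, "%2f", True))    # True
```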

Submitted by MathsStan