Regular Expressions 101

Community Library Entry

Regular Expression
PCRE2 (PHP >=7.3)

/
(?:
  (?<alpha>[a-z])
  (?<digit>[0-9])
  (?<unreserved>\g<alpha>|\g<digit>|-|\.|_|~)
  (?<hexdig>\g<digit>|[A-F])
  (?<pct_encoded>%\g<hexdig>{2})
  (?<gen_delims>[:\/\?\#\[\]@])
  (?<sub_delims>[!\$&'\(\)\*\+,;=])
  (?<reserved>\g<gen_delims>|\g<sub_delims>)
  (?<ip_literal>\[(?:\g<ipv6address>|\g<ipvfuture>)\])
  (?<ipvfuture>v\g<hexdig>+\.(?:\g<unreserved>|\g<sub_delims>|:)+)
  (?<ipv6address>
                                   (?:\g<h16>:){6}\g<ls32> |
                                 ::(?:\g<h16>:){5}\g<ls32> |
    (?:                 \g<h16>)?::(?:\g<h16>:){4}\g<ls32> |
    (?:(?:\g<h16>:){0,1}\g<h16>)?::(?:\g<h16>:){3}\g<ls32> |
    (?:(?:\g<h16>:){0,2}\g<h16>)?::(?:\g<h16>:){2}\g<ls32> |
    (?:(?:\g<h16>:){0,3}\g<h16>)?::(?:\g<h16>:){1}\g<ls32> |
    (?:(?:\g<h16>:){0,4}\g<h16>)?::               \g<ls32> |
    (?:(?:\g<h16>:){0,5}\g<h16>)?::               \g<h16> |
    (?:(?:\g<h16>:){0,6}\g<h16>)?::
  )
  (?<h16>\g<hexdig>{1,4})
  (?<ls32>\g<h16>:\g<h16>|\g<ipv4address>)
  (?<ipv4address>\g<dec_octet>\.\g<dec_octet>\.\g<dec_octet>\.\g<dec_octet>)
  (?<dec_octet>
    25[0-5] |           # 250-255
    2[0-4]\g<digit> |   # 200-249
    1\g<digit>{2} |     # 100-199
    [1-9]\g<digit> |    # 10-99
    \g<digit>           # 0-9
  )
  (?<reg_name>(?:\g<unreserved>|\g<pct_encoded>|\g<sub_delims>)*)
  (?<path_abempty>(?:\/\g<segment>)*)
  (?<path_absoloute>\/(?:\g<segment_nz>(?:\/\g<segment>)*))
  (?<path_noscheme>\g<segment_nz_nc>(?:\/\g<segment>)*)
  (?<path_rootless>\g<segment_nz>(?:\/\g<segment>)*)
  (?<path_empty>)
  (?<segment> \g<pchar>*)
  (?<segment_nz> \g<pchar>+)
  (?<segment_nz_nc>(?:\g<unreserved>|\g<pct_encoded>|\g<sub_delims>|@)+) # non-zero-length segment without any colon ":"
  (?<pchar>\g<unreserved>|\g<pct_encoded>|\g<sub_delims>|:|@)
){0}
(?<scheme> \g<alpha> (?:\g<alpha>|\g<digit>|\+|-|\.)* )
:
(?<heir_part>
  (?:
    \/\/
    (?<authority>
      (?:(?<userinfo>(?:\g<unreserved>|\g<pct_encoded>|\g<sub_delims>|:)*)@)?
      (?<host>\g<ip_literal>|\g<ipv4address>|\g<reg_name>)
      (?::(?<port>\g<digit>*))?
    )
  )?
  (?<path>(?(<authority>)
    \g<path_abempty> |       # begins with "/" or is empty
    (?:
      \g<path_absoloute> |   # begins with "/" but not "//"
      \g<path_rootless> |    # begins with a segment
      \g<path_noscheme> |    # begins with a non-colon segment
      \g<path_empty>         # zero characters
    )
  ))
)
(?:\?(?<query>(?:\g<pchar>|\/|\?)*))?
(?:\#(?<fragment>(?:\g<pchar>|\/|\?)*))?
/
gix

Description

Please note that this expression identifies the parts of generic URIs; you may instead be looking to recognise URLs or http(s) addresses specifically.
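To illustrate what "generic URI" means here, the following is a minimal Python sketch. It does not use the expression above; it swaps in the much simpler component-splitting regex given in RFC 3986, Appendix B, which splits a URI into parts without validating them. The helper name split_uri is made up for this example.

```python
import re

# Component-splitting regex from RFC 3986, Appendix B.
# It only splits a URI reference into parts; it does not validate them.
APPENDIX_B = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Return the five generic components (hypothetical helper name)."""
    m = APPENDIX_B.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    # A generic URI with no authority, and an http-style URL with one.
    print(split_uri("urn:isbn:0451450523"))
    print(split_uri("https://example.com:8042/over/there?name=ferret#nose"))
```

A URI such as urn:isbn:0451450523 has a scheme and a path but no authority, which is why a pattern written only for http(s) addresses would not be the right tool for it.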

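Did my best to implement the grammar rules laid out in RFC 3986 as named capture groups. Priority was on readability, so the expression relies on named subroutine calls (\g<name>) and a conditional group, and is likely not universally compatible across 'regex flavours'.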
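As a rough illustration of the readability and portability trade-off, here is a sketch of the dec_octet and ipv4address rules in Python. Python's built-in re module has no \g<name> subroutine calls, so this version composes the fragment by string interpolation instead; the names DEC_OCTET, IPV4ADDRESS and is_ipv4address are assumptions made for this example, not part of the entry.

```python
import re

# Python's re has no subroutine calls, so the fragment is reused as a string.
# re.VERBOSE plays the role of the x flag: whitespace and # comments are ignored.
DEC_OCTET = r"""(?:
      25[0-5]        # 250-255
    | 2[0-4][0-9]    # 200-249
    | 1[0-9]{2}      # 100-199
    | [1-9][0-9]     # 10-99
    | [0-9]          # 0-9
)"""

IPV4ADDRESS = re.compile(
    rf"{DEC_OCTET}\.{DEC_OCTET}\.{DEC_OCTET}\.{DEC_OCTET}",
    re.VERBOSE,
)


def is_ipv4address(text):
    """True if the whole string is a dotted quad of decimal octets."""
    return IPV4ADDRESS.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("192.168.0.1", "256.1.1.1", "1.2.3"):
        print(candidate, is_ipv4address(candidate))
```

Each comment in the verbose fragment runs to the end of its line, so the line breaks matter; the same holds for the # comments in the expression above under the x flag.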

Removing the case-insensitive flag (i) should restrict matches to the canonical casing described in RFC 3986.
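The effect of the i flag can be sketched on a single fragment. This Python snippet inlines the pct_encoded rule (a percent sign followed by two hexdig characters, with hexdig written as 0-9 and A-F as in the expression) and toggles case-insensitivity; the helper name is_pct_encoded is made up for this example.

```python
import re

# pct_encoded with hexdig inlined: digits plus upper-case A-F only.
PCT_ENCODED = r"%[0-9A-F]{2}"


def is_pct_encoded(text, ignore_case):
    """Match one percent-encoded triplet, with or without the i flag."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(PCT_ENCODED, text, flags) is not None


if __name__ == "__main__":
    print(is_pct_encoded("%3A", ignore_case=False))  # upper-case hex digits
    print(is_pct_encoded("%3a", ignore_case=False))  # lower-case, flag off
    print(is_pct_encoded("%3a", ignore_case=True))   # lower-case, flag on
```

Without the flag only upper-case hex digits are accepted; with it, lower-case ones match as well.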

Submitted by MathsStan - 16 days ago (Last modified 16 days ago)