Regular Expressions 101

^((?:http(?:s)?:\/\/)?)((?:www\.)?[a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b)((?:\:\d+)?)((?:[-\w@:%\+.~#&/=]*)?)((?:\?[-\w%\+.~#&=]*)?)$
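To illustrate how the five capturing groups split a URL, here is a minimal sketch using Python's `re` module (an assumption; this is not regex101's own code). The pattern is copied verbatim. `URL_RE` and `split_url` are hypothetical names. `re.ASCII` is added so that `\w` means `[a-zA-Z0-9_]`, as in the explanation that follows; without it, Python 3 treats `\w` as Unicode-aware.

```python
import re

# Pattern copied verbatim, split into one string literal per capturing group.
URL_PATTERN = (
    r"^((?:http(?:s)?:\/\/)?)"                                 # 1: scheme
    r"((?:www\.)?[a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b)"  # 2: host
    r"((?:\:\d+)?)"                                            # 3: port
    r"((?:[-\w@:%\+.~#&/=]*)?)"                                # 4: path
    r"((?:\?[-\w%\+.~#&=]*)?)$"                                # 5: query
)

# IGNORECASE mirrors the i flag; ASCII keeps \w equal to [a-zA-Z0-9_].
URL_RE = re.compile(URL_PATTERN, re.IGNORECASE | re.ASCII)


def split_url(url):
    """Hypothetical helper: return the five groups, or None if the string does not match."""
    m = URL_RE.match(url)
    return m.groups() if m else None


print(split_url("https://www.example.com:8080/path/to/page?q=1&lang=en"))
# ('https://', 'www.example.com', ':8080', '/path/to/page', '?q=1&lang=en')
print(split_url("example.org"))
# ('', 'example.org', '', '', '')
print(split_url("not a url"))
# None
```

Empty groups come back as empty strings rather than `None`. Each group wraps an optional non-capturing group, so the capturing group always participates in the match.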
^ asserts position at start of a line
1st Capturing Group
((?:http(?:s)?:\/\/)?)
Non-capturing group
(?:http(?:s)?:\/\/)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
http matches the characters http literally (case insensitive)
Non-capturing group
(?:s)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
s matches the character s with index 115₁₀ (73₁₆ or 163₈) literally (case insensitive)
: matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
\/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
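The first group can be tried in isolation. The following is a small sketch in Python's `re`, where `SCHEME_RE` and `scheme_of` are hypothetical names. The whole group is optional, so `match()` always succeeds and returns an empty string when there is no `http://` or `https://` prefix.

```python
import re

# Group 1 alone: an optional "http://" or "https://" prefix, case-insensitive (i flag).
SCHEME_RE = re.compile(r"(?:http(?:s)?:\/\/)?", re.IGNORECASE)


def scheme_of(text):
    """Hypothetical helper: the prefix group 1 would capture at the start of text."""
    # match() anchors at position 0, like ^; the optional group never fails.
    return SCHEME_RE.match(text).group(0)


for sample in ["http://a.io", "HTTPS://a.io", "ftp://a.io", "a.io"]:
    print(repr(sample), "->", repr(scheme_of(sample)))
# 'http://a.io' -> 'http://'
# 'HTTPS://a.io' -> 'HTTPS://'
# 'ftp://a.io' -> ''
# 'a.io' -> ''
```

`(?:s)?` is equivalent to the shorter `s?`. Group 1 does not reject `ftp://` on its own; it simply captures nothing. The full pattern fails on `ftp://a.io` only later, because the host class in group 2 excludes `/`.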
2nd Capturing Group
((?:www\.)?[a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b)
Non-capturing group
(?:www\.)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
www matches the characters www literally (case insensitive)
\. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
Match a single character present in the list below
[a-zA-Z0-9@:%._\+~#=]
{2,256} matches the previous token between 2 and 256 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case insensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case insensitive)
@:%._ matches a single character in the list @:%._ (case insensitive)
\+ matches the character + with index 43₁₀ (2B₁₆ or 53₈) literally (case insensitive)
~#= matches a single character in the list ~#= (case insensitive)
\. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
Match a single character present in the list below
[a-z]
{2,6} matches the previous token between 2 and 6 times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case insensitive)
\b asserts position at a word boundary: (^\w|\w$|\W\w|\w\W)
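The host group can be exercised on its own with `fullmatch()`, which requires the entire string to be a host. This is a minimal Python sketch that assumes the i flag (`re.IGNORECASE`); `HOST_RE` and `is_host` are hypothetical names. It shows three consequences of the explanation above:

- At least one `.` is required.
- The label after the last `.` must be 2 to 6 letters.
- Because the character class includes `@`, strings such as `user@example.com` are accepted too.

```python
import re

# Group 2 alone (capturing parentheses dropped), case-insensitive like the i flag.
HOST_RE = re.compile(
    r"(?:www\.)?[a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b", re.IGNORECASE
)


def is_host(text):
    """Hypothetical helper: True if the whole string matches the host group."""
    return HOST_RE.fullmatch(text) is not None


for sample in ["www.example.com", "sub.example.co.uk", "EXAMPLE.COM",
               "user@example.com", "localhost", "example.c", "example.toolongtld"]:
    print(repr(sample), is_host(sample))
# True, True, True, True, False, False, False (in that order)
```

The greedy `{2,256}` run first swallows dots as well. It then gives characters back until a `\.` followed by 2 to 6 letters can match at the end. That is how `sub.example.co.uk` ends up split as `sub.example.co` + `.uk`.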
3rd Capturing Group
((?:\:\d+)?)
Non-capturing group
(?:\:\d+)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
\: matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally (case insensitive)
\d matches a digit (equivalent to [0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
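Here is a sketch of what group 3 yields, in Python (`PORT_RE` and `port_of` are hypothetical helpers). `\d+` only requires one or more digits, so the pattern places no upper bound on the port value. Any range check, for example against the TCP maximum of 65535, has to happen outside the regex.

```python
import re

# Group 3 alone: an optional ":" followed by one or more digits.
PORT_RE = re.compile(r"(?:\:\d+)?")


def port_of(rest):
    """Hypothetical helper: the port at the start of rest as an int, or None if group 3 is empty."""
    text = PORT_RE.match(rest).group(0)  # optional group: match() never fails
    return int(text[1:]) if text else None  # drop the leading ":"


print(port_of(":8080/index.html"))  # 8080
print(port_of("/index.html"))       # None
print(port_of(":abc"))              # None (":" without digits is not consumed)
print(port_of(":99999"))            # 99999, accepted although it exceeds 65535
```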
4th Capturing Group
((?:[-\w@:%\+.~#&/=]*)?)
Non-capturing group
(?:[-\w@:%\+.~#&/=]*)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
Match a single character present in the list below
[-\w@:%\+.~#&/=]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case insensitive)
\w matches any word character (equivalent to [a-zA-Z0-9_])
@:% matches a single character in the list @:% (case insensitive)
\+ matches the character + with index 43₁₀ (2B₁₆ or 53₈) literally (case insensitive)
.~#&/= matches a single character in the list .~#&/= (case insensitive)
5th Capturing Group
((?:\?[-\w%\+.~#&=]*)?)
Non-capturing group
(?:\?[-\w%\+.~#&=]*)?
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
\? matches the character ? with index 6310 (3F16 or 778) literally (case insensitive)
Match a single character present in the list below
[-\w%\+.~#&=]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case insensitive)
\w matches any word character (equivalent to [a-zA-Z0-9_])
% matches the character % with index 37₁₀ (25₁₆ or 45₈) literally (case insensitive)
\+ matches the character + with index 43₁₀ (2B₁₆ or 53₈) literally (case insensitive)
.~#&= matches a single character in the list .~#&= (case insensitive)
$ asserts position at the end of a line
Global pattern flags
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
m modifier: multi line. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string)
g modifier: global. All matches (don't return after first match)
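In Python's `re` module, the i and m modifiers correspond to `re.IGNORECASE` and `re.MULTILINE`. Python has no g flag; "all matches" is obtained by iterating with `finditer()` instead of calling `search()` once. The sketch below works under those assumptions and scans a text holding one URL per line; `WITH_M`, `WITHOUT_M` and `find_urls` are hypothetical names.

```python
import re

URL_PATTERN = (
    r"^((?:http(?:s)?:\/\/)?)((?:www\.)?[a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b)"
    r"((?:\:\d+)?)((?:[-\w@:%\+.~#&/=]*)?)((?:\?[-\w%\+.~#&=]*)?)$"
)

# i -> re.IGNORECASE, m -> re.MULTILINE; re.ASCII keeps \w as [a-zA-Z0-9_].
WITH_M = re.compile(URL_PATTERN, re.IGNORECASE | re.MULTILINE | re.ASCII)
WITHOUT_M = re.compile(URL_PATTERN, re.IGNORECASE | re.ASCII)


def find_urls(text, pattern=WITH_M):
    """Hypothetical helper emulating the g flag: every non-overlapping whole match."""
    return [m.group(0) for m in pattern.finditer(text)]


subject = "\n".join([
    "https://www.example.com:8080/path?q=1",
    "not a url",
    "HTTP://EXAMPLE.ORG/About",
])

print(find_urls(subject))
# ['https://www.example.com:8080/path?q=1', 'HTTP://EXAMPLE.ORG/About']
print(find_urls(subject, WITHOUT_M))
# []
```

The second call finds nothing. Without the m modifier, `^` and `$` refer only to the start and end of the whole string, and no character class in the pattern can consume a newline.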