Regular Expressions 101

"
^
# get the title of this movie or series
(?P<title>
    [-\w'\"]+
    # match separator to later replace into correct title
    (?P<separator> [ .] )
    # note this *must* be lazy for the engine to work ltr not rtl
    (?: [-\w'\"]+\2 )*?
)
# start of movie vs series check
(?:
    # if this is an episode, let's match the season
    # number one way or another. if not, the year
    # of the movie
    (?:
        # series. can be a lot prettier if we used perl regex...
        # make sure this is not just a number in the title followed by our separator.
        # like, iron man 3 2013 or my.fictional.24.series
        (?! \d+ \2 )
        # now try to match the season number
        (?: s (?: eason \2? )? )?
        (?P<season> \d\d? )
        # needed to validate the last token is a dot, or whatever.
        (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )?
    |
        # this is likely a movie, match the year
        (?P<year> [(\]]?\d{4}[)\]]? )
    )
    # make sure this ends with the separator, otherwise we
    # might be in the middle of something like "1080p"
    (?=\2)
|
    # if we get here, this is likely still a movie.
    # match until one of the keywords
    (?= BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER | DTS | AC3 | AAC
      | BLU[ -]?RAY | HD(?:TV|DVD) | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264 )
)
"
gimx
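As a rough cross-check, here is the pattern transcribed for Python's re module, assuming the Python flavor (the r" delimiter and the (?P<name>...) groups suggest it). Its comments are trimmed, and re.X, re.I and re.M stand in for the x, i and m flags; Python has no g flag, so finditer or findall would cover that role. parse_release and the sample file names are hypothetical, and turning separators into spaces is only one reading of the "replace into correct title" comment.

```python
import re

# Sketch: the pattern above for Python's re module, comments trimmed.
# re.X drops whitespace and # comments but keeps the space inside [ .];
# re.I and re.M mirror the i and m flags.
RELEASE_RE = re.compile(r"""
^
(?P<title>
    [-\w'\"]+
    (?P<separator> [ .] )
    (?: [-\w'\"]+\2 )*?
)
(?:
    (?:
        (?! \d+ \2 )
        (?: s (?: eason \2? )? )?
        (?P<season> \d\d? )
        (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )?
    |
        (?P<year> [(\]]?\d{4}[)\]]? )
    )
    (?=\2)
|
    (?= BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER | DTS | AC3 | AAC
      | BLU[ -]?RAY | HD(?:TV|DVD) | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264 )
)
""", re.X | re.I | re.M)


def parse_release(name):
    """Hypothetical helper: return the named groups with a cleaned-up title."""
    m = RELEASE_RE.match(name)
    if m is None:
        return None
    info = m.groupdict()
    # The title group keeps its trailing separator; swap separators for spaces.
    info["title"] = info["title"].replace(info["separator"], " ").strip()
    return info


for name in ("The.Walking.Dead.S05E10.720p.HDTV.x264",
             "Iron.Man.3.2013.1080p.BluRay.x264",
             "Some.Movie.BluRay.x264"):
    print(name, "->", parse_release(name))
```

On these samples the series title stops just before S05E10, and Iron.Man.3.2013 yields the title Iron Man 3 with year 2013, because the negative lookahead keeps the bare 3 from being read as a season number.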
^ asserts position at start of a line
# get the title of this movie or series
Comment: get the title of this movie or series
Named Capture Group title
(?P<title> [-\w'\"]+ # match separator to later replace into correct title (?P<separator> [ .] ) # note this *must* be lazy for the engine to work ltr not rtl (?: [-\w'\"]+\2 )*? )
Match a single character present in the list below
[-\w'\"]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
- matches the character - with index 45 in decimal (2D in hex, 55 in octal) literally (case insensitive)
\w matches any word character (equivalent to [a-zA-Z0-9_] for ASCII text; in Python 3, \w in a str pattern also matches non-ASCII letters and digits)
' matches the character ' with index 39 in decimal (27 in hex, 47 in octal) literally (case insensitive)
\" matches the character " with index 34 in decimal (22 in hex, 42 in octal) literally (case insensitive)
# match separator to later replace into correct title
Comment: match separator to later replace into correct title
Named Capture Group separator
(?P<separator> [ .] )
Match a single character present in the list below
[ .]
matches a single character in the list [ .], i.e. a space or a literal dot (inside a character class the dot is not a wildcard) (case insensitive)
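Because the pattern relies on the x flag, it matters that the space inside [ .] survives. A quick Python check, assuming re.X behaves like the x flag here: the class still matches a space or a dot, while a bare space outside a class is dropped.

```python
import re

# Under re.X the space inside [ .] is kept, so the group accepts " " or ".".
SEP = re.compile(r"(?P<separator> [ .] )", re.X)
print(SEP.findall("Iron Man.3"))

# A bare space outside a class is ignored in verbose mode: "a b" matches "ab".
print(re.fullmatch(r"a b", "ab", re.X) is not None)
```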
# note this *must* be lazy for the engine to work ltr not rtl
Comment: note this *must* be lazy for the engine to work ltr not rtl
Non-capturing group
(?: [-\w'\"]+\2 )*?
*? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Match a single character present in the list below
[-\w'\"]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
- matches the character - with index 45 in decimal (2D in hex, 55 in octal) literally (case insensitive)
\w matches any word character (equivalent to [a-zA-Z0-9_] for ASCII text; in Python 3, \w in a str pattern also matches non-ASCII letters and digits)
' matches the character ' with index 39 in decimal (27 in hex, 47 in octal) literally (case insensitive)
\" matches the character " with index 34 in decimal (22 in hex, 42 in octal) literally (case insensitive)
\2 matches the same text as most recently matched by the 2nd capturing group (separator)
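The lazy *? is what the "ltr not rtl" comment appears to refer to: the engine tries the shortest title first, so the first year-like token wins. The sketch below is a cut-down pattern (title plus a four-digit year followed by the separator, not the full regex), title_and_year is a hypothetical helper, and the file name is invented. Swapping *? for * makes the repeated group consume as much as it can and then backtrack from the right, so it picks the last candidate instead.

```python
import re


def title_and_year(name, lazy=True):
    """Hypothetical helper: match a simplified title + year pattern."""
    quantifier = "*?" if lazy else "*"
    # Only the quantifier on the "token + same separator" group changes.
    pattern = (r"^(?P<title>[-\w'\"]+(?P<separator>[ .])(?:[-\w'\"]+\2)"
               + quantifier + r")(?P<year>\d{4})(?=\2)")
    m = re.match(pattern, name)
    return (m.group("title"), m.group("year")) if m else None


print(title_and_year("Blade.Runner.2049.2017.1080p", lazy=True))
print(title_and_year("Blade.Runner.2049.2017.1080p", lazy=False))
```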
# start of movie vs series check
Comment: start of movie vs series check
Non-capturing group
(?: # if this is an episode, let's match the season # number one way or another. if not, the year # of the movie (?: # series. can be a lot prettier if we used perl regex... # make sure this is not just a number in the title followed by our separator. # like, iron man 3 2013 or my.fictional.24.series (?! \d+ \2 ) # now try to match the season number (?: s (?: eason \2? )? )? (?P<season> \d\d? ) # needed to validate the last token is a dot, or whatever. (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )? | # this is likely a movie, match the year (?P<year> [(\]]?\d{4}[)\]]? ) ) # make sure this ends with the separator, otherwise we # might be in the middle of something like "1080p" (?=\2) | # if we get here, this is likely still a movie. # match until one of the keywords (?= BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER | DTS | AC3 | AAC | BLU[ -]?RAY | HD(?:TV|DVD) | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264 ) )
1st Alternative
# if this is an episode, let's match the season # number one way or another. if not, the year # of the movie (?: # series. can be a lot prettier if we used perl regex... # make sure this is not just a number in the title followed by our separator. # like, iron man 3 2013 or my.fictional.24.series (?! \d+ \2 ) # now try to match the season number (?: s (?: eason \2? )? )? (?P<season> \d\d? ) # needed to validate the last token is a dot, or whatever. (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )? | # this is likely a movie, match the year (?P<year> [(\]]?\d{4}[)\]]? ) ) # make sure this ends with the separator, otherwise we # might be in the middle of something like "1080p" (?=\2)
# if this is an episode, let's match the season
Comment: if this is an episode, let's match the season
# number one way or another. if not, the year
Comment: number one way or another. if not, the year
# of the movie
Comment: of the movie
Non-capturing group
(?: # series. can be a lot prettier if we used perl regex... # make sure this is not just a number in the title followed by our separator. # like, iron man 3 2013 or my.fictional.24.series (?! \d+ \2 ) # now try to match the season number (?: s (?: eason \2? )? )? (?P<season> \d\d? ) # needed to validate the last token is a dot, or whatever. (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )? | # this is likely a movie, match the year (?P<year> [(\]]?\d{4}[)\]]? ) )
1st Alternative
# series. can be a lot prettier if we used perl regex... # make sure this is not just a number in the title followed by our separator. # like, iron man 3 2013 or my.fictional.24.series (?! \d+ \2 ) # now try to match the season number (?: s (?: eason \2? )? )? (?P<season> \d\d? ) # needed to validate the last token is a dot, or whatever. (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )?
# series. can be a lot prettier if we used perl regex...
Comment: series. can be a lot prettier if we used perl regex...
# make sure this is not just a number in the title followed by our separator.
Comment: make sure this is not just a number in the title followed by our separator.
# like, iron man 3 2013 or my.fictional.24.series
Comment: like, iron man 3 2013 or my.fictional.24.series
Negative Lookahead
(?! \d+ \2 )
Assert that the Regex below does not match
\d
matches a digit (equivalent to [0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\2 matches the same text as most recently matched by the 2nd capturing group (separator)
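To see what the negative lookahead buys, here is a sketch with the separator hard-coded as a dot instead of the \2 backreference (a simplification). A number directly followed by the separator is rejected as a season start, which covers both examples in the comment, while 05E10 and 1x02 still get through.

```python
import re

# Separator fixed to "." for illustration; the real pattern uses \2.
SEASON_START = re.compile(r"(?!\d+\.)\d\d?")

for text in ("3.2013.1080p", "24.series", "05E10.720p", "1x02.HDTV"):
    m = SEASON_START.match(text)
    print(text, "->", m.group() if m else None)
```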
# now try to match the season number
Comment: now try to match the season number
Non-capturing group
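(?: s (?: eason \2? )? )?
Optionally matches s, or season optionally followed by the separator (case insensitive), so S05, season5 and Season.5 are all accepted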
Named Capture Group season
(?P<season> \d\d? )
\d\d? matches one or two digits, e.g. 5 or 05
# needed to validate the last token is a dot, or whatever.
Comment: needed to validate the last token is a dot, or whatever.
Non-capturing group
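(?: e\d\d? (?:-e?\d\d?)? | x\d\d? )?
Optionally matches an episode marker: e plus one or two digits, optionally followed by a range such as -E02 or -02, or x plus one or two digits as in 1x02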
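Taken together, the optional prefix, the season group and the episode group accept several common notations. This sketch again replaces \2 with a literal dot, and the tags are illustrative; fullmatch is used so each tag has to be consumed completely.

```python
import re

# Season/episode part of the series branch, with \2 replaced by a literal ".".
EPISODE = re.compile(r"""
    (?: s (?: eason \.? )? )?             # optional s or season prefix
    (?P<season> \d\d? )                   # one or two season digits
    (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )?  # optional E05, E05-E06, E05-06 or x05
""", re.X | re.I)

for tag in ("S05E10", "s1e2", "Season.3", "1x02", "S01E01-E02", "S01E01-02"):
    print(tag, "->", EPISODE.fullmatch(tag).group("season"))
```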
2nd Alternative
# this is likely a movie, match the year (?P<year> [(\]]?\d{4}[)\]]? )
# this is likely a movie, match the year
Comment: this is likely a movie, match the year
Named Capture Group year
(?P<year> [(\]]?\d{4}[)\]]? )
Match a single character present in the list below
[(\]]
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
( matches the character ( with index 40 in decimal (28 in hex, 50 in octal) literally (case insensitive)
\] matches the character ] with index 93 in decimal (5D in hex, 135 in octal) literally (case insensitive)
\d
matches a digit (equivalent to [0-9])
{4} matches the previous token exactly 4 times
Match a single character present in the list below
[)\]]
? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
) matches the character ) with index 41 in decimal (29 in hex, 51 in octal) literally (case insensitive)
\] matches the character ] with index 93 in decimal (5D in hex, 135 in octal) literally (case insensitive)
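The year group can be checked in isolation. Note that the opening class [(\]] contains ( and ], not [, so a bracketed year such as [2013] is rejected while (2013) and even ]2013] are accepted. [(\[] may have been intended, but that is only a guess; the sketch shows the current behavior next to that variant.

```python
import re

# The year group exactly as written: optional "(" or "]", four digits,
# optional ")" or "]".
YEAR = re.compile(r"(?P<year>[(\]]?\d{4}[)\]]?)")

for text in ("2013", "(2013)", "[2013]", "]2013]"):
    m = YEAR.fullmatch(text)
    print(text, "->", m.group("year") if m else None)

# Hypothetical variant with "[" in the opening class instead of "]".
print(re.fullmatch(r"[(\[]?\d{4}[)\]]?", "[2013]") is not None)
```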
# make sure this ends with the separator, otherwise we
Comment: make sure this ends with the separator, otherwise we
# might be in the middle of something like "1080p"
Comment: might be in the middle of something like "1080p"
Positive Lookahead
(?=\2)
Assert that the Regex below matches
\2 matches the same text as most recently matched by the 2nd capturing group (separator)
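The closing (?=\2) is what keeps a resolution tag from being read as a year, as the comment says. Here is a sketch with the separator fixed to a dot (a simplification of \2); the zero-width lookahead checks for the dot without consuming it.

```python
import re

YEAR_WITH_CHECK = re.compile(r"(?P<year>\d{4})(?=\.)")
YEAR_NO_CHECK = re.compile(r"(?P<year>\d{4})")

print(YEAR_WITH_CHECK.match("1080p.BluRay"))              # None
print(YEAR_NO_CHECK.match("1080p.BluRay").group("year"))  # 1080
print(YEAR_WITH_CHECK.match("2013.1080p").group("year"))  # 2013
```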
2nd Alternative
# if we get here, this is likely still a movie. # match until one of the keywords (?= BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER | DTS | AC3 | AAC | BLU[ -]?RAY | HD(?:TV|DVD) | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264 )
# if we get here, this is likely still a movie.
Comment: if we get here, this is likely still a movie.
# match until one of the keywords
Comment: match until one of the keywords
Positive Lookahead
(?= BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER | DTS | AC3 | AAC | BLU[ -]?RAY | HD(?:TV|DVD) | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264 )
Assert that the Regex below matches
1st Alternative
BOXSET
matches the characters BOXSET literally (case insensitive)
2nd Alternative
XVID
matches the characters XVID literally (case insensitive)
3rd Alternative
DIVX
matches the characters DIVX literally (case insensitive)
4th Alternative
LIMITED
matches the characters LIMITED literally (case insensitive)
5th Alternative
UNRATED
matches the characters UNRATED literally (case insensitive)
6th Alternative
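PROPER
matches the characters PROPER literally (case insensitive)
7th Alternative
DTS
matches the characters DTS literally (case insensitive)
8th Alternative
AC3
matches the characters AC3 literally (case insensitive)
9th Alternative
AAC
matches the characters AAC literally (case insensitive)
10th Alternative
BLU[ -]?RAY
matches BLU, an optional space or hyphen, then RAY (case insensitive), e.g. BLURAY, BLU-RAY or BLU RAY
11th Alternative
HD(?:TV|DVD)
matches HD followed by TV or DVD (case insensitive), i.e. HDTV or HDDVD
12th Alternative
(?:DVD|B[DR]|WEB)RIP
matches DVDRIP, BDRIP, BRRIP or WEBRIP (case insensitive)
13th Alternative
\d+p
matches one or more digits followed by p, e.g. a resolution tag such as 720p or 1080p
14th Alternative
[hx]\.?264
matches h or x, an optional literal dot, then 264 (case insensitive), e.g. x264, H264 or h.264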
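The keyword lookahead can also be exercised alone. This sketch compiles just that group with re.X and re.I (the sample tags are illustrative). Every successful match has span (0, 0), since a lookahead consumes nothing; note that Blu.Ray is not recognized, because BLU[ -]?RAY allows only a space or a hyphen in between.

```python
import re

# Just the keyword lookahead; the space inside [ -] survives re.X.
KEYWORDS = re.compile(r"""(?= BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER
    | DTS | AC3 | AAC | BLU[ -]?RAY | HD(?:TV|DVD)
    | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264 )""", re.X | re.I)

for tag in ("BluRay", "blu-ray", "Blu Ray", "WEBRip", "BRRip", "HDTV",
            "720p", "x264", "h.264", "Blu.Ray", "Movie"):
    m = KEYWORDS.match(tag)
    print(tag, "->", None if m is None else m.span())
```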
Global pattern flags
g modifier: global. All matches (don't return after first match)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
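m modifier: multi line. Causes ^ and $ to match at the beginning/end of each line (not only at the beginning/end of the string)
x modifier: extended. Whitespace, and text from an unescaped # to the end of the line, are ignored in the pattern; in PCRE and Python, whitespace inside a character class is kept, so [ .] and BLU[ -]?RAY still match a literal space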
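The g and m flags matter when several file names are tested at once, one per line. In Python, re.M gives ^ the multi-line behavior and findall returns every match, which is roughly what regex101's g flag does; the one-token pattern and sample names here are stand-ins for illustration, not the full regex.

```python
import re

names = "Iron.Man.3.2013.1080p\nThe.Walking.Dead.S05E10.HDTV"
# Verbose pattern: ^ plus the title token class; the trailing # text is a comment.
first_token = r"^ [-\w'\"]+   # first title token on a line"

print(re.findall(first_token, names, re.X | re.M))  # ['Iron', 'The']
print(re.findall(first_token, names, re.X))         # ['Iron']
```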