Regular Expressions 101

Distinguish torrent files (series vs movies)

81

Regular Expression
Python

r"
^ # get the title of this movie or series (?P<title> [-\w'\"]+ # match separator to later replace into correct title (?P<separator> [ .] ) # note this *must* be lazy for the engine to work ltr not rtl (?: [-\w'\"]+\2 )*? ) # start of movie vs serie check (?: # if this is an episode, lets match the season # number one way or another. if not, the year # of the movie (?: # series. can be a lot prettier if we used perl regex... # make sure this is not just a number in the title followed by our separator. # like, iron man 3 2013 or my.fictional.24.series (?! \d+ \2 ) # now try to match the season number (?: s (?: eason \2? )? )? (?P<season> \d\d? ) # needed to validate the last token is a dot, or whatever. (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )? | # this is likely a movie, match the year (?P<year> [(\]]?\d{4}[)\]]? ) ) # make sure this ends with the separator, otherwise we # might be in the middle of something like "1080p" (?=\2) | # if we get here, this is likely still a movie. # match until one of the keywords (?= BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER | DTS | AC3 | AAC | BLU[ -]?RAY | HD(?:TV|DVD) | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264 ) )
"
gimx

Description

A neat regex for finding out whether a given torrent name is a series or a movie.

Returns the full name of the series with the separator needed to make it pretty (ie, replace it with space or what you want). Also returns the season number or the year for the movie/series, depending on what was previously matched.

Submitted by Firas Dib - 10 years ago (Last modified 8 months ago)