Community Patterns

Chinese Digits

Created·2024-01-05 07:56
Updated·2024-02-06 07:02
Flavor·PCRE (Legacy)
Matches Chinese numerals less than 1×10^16, such as “一千两百三十四万”, “八萬点七六五”, and “玖仟玖佰玖拾玖万玖仟玖佰玖拾玖亿玖仟玖佰玖拾玖万玖仟玖佰玖拾玖点玖玖玖玖玖玖玖玖玖玖玖玖玖玖玖玖”. Formal ("uppercase") and ordinary ("lowercase") Chinese numerals can be mixed, but Chinese numerals and English (Arabic) digits cannot be mixed.

Illegal numerals are not matched. For example, “两十六” and “两十六万” are not matched, because “两” and “十” are not normally used together in Chinese; the correct form is “二十六”. Likewise, “两千零零六” is not matched, because consecutive “零” are illegal in the integer part of a Chinese numeral; the correct form is “两千零六”.

Requires a regex engine that supports calling the expression defined in a named capture group, such as "(?<letter>[a-z]+)\d+(?&letter)".
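The engine requirement above concerns re-running a named group's expression rather than re-matching its captured text. PCRE writes such a call as (?&name). Python's built-in re module has no equivalent, so the sketch below only emulates the idea by interpolating a shared fragment into the pattern. The names LETTER, reuse_pattern, and same_text are hypothetical and are not part of the submitted pattern.

```python
import re

# Hypothetical reusable fragment; stands in for the body of a named group.
LETTER = r"[a-z]+"

# Rough equivalent of the PCRE pattern (?<letter>[a-z]+)\d+(?&letter):
# the fragment's *expression* is applied twice, so the two letter runs
# may differ.
reuse_pattern = re.compile(rf"(?P<letter>{LETTER})\d+(?:{LETTER})")

# For contrast, a backreference repeats the captured *text*, so both
# letter runs must be identical.
same_text = re.compile(r"(?P<letter>[a-z]+)\d+(?P=letter)")


def matches_fully(pattern, text):
    """Return True when the whole string matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


print(matches_fully(reuse_pattern, "abc123xyz"))
print(matches_fully(same_text, "abc123xyz"))
print(matches_fully(same_text, "abc123abc"))
```

Interpolation copies the fragment at build time, so unlike a true subroutine call it cannot express recursion. That is one reason the entry asks for an engine with native support.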
Submitted by anonymous

Community Library Entry

Regular Expression
Created·2014-06-26 09:59
Updated·2023-07-20 15:08
Flavor·Python

r"
^
# get the title of this movie or series
(?P<title>
  [-\w'\"]+
  # match separator to later replace into correct title
  (?P<separator> [ .] )
  # note this *must* be lazy for the engine to work ltr not rtl
  (?: [-\w'\"]+\2 )*?
)
# start of movie vs serie check
(?:
  # if this is an episode, lets match the season
  # number one way or another. if not, the year
  # of the movie
  (?:
    # series. can be a lot prettier if we used perl regex...
    # make sure this is not just a number in the title followed by our separator.
    # like, iron man 3 2013 or my.fictional.24.series
    (?! \d+ \2 )
    # now try to match the season number
    (?: s (?: eason \2? )? )?
    (?P<season> \d\d? )
    # needed to validate the last token is a dot, or whatever.
    (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )?
  |
    # this is likely a movie, match the year
    (?P<year> [(\]]?\d{4}[)\]]? )
  )
  # make sure this ends with the separator, otherwise we
  # might be in the middle of something like "1080p"
  (?=\2)
|
  # if we get here, this is likely still a movie.
  # match until one of the keywords
  (?=
    BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER | DTS | AC3 | AAC
    | BLU[ -]?RAY | HD(?:TV|DVD) | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264
  )
)
"
gimx

Description

A neat regex for finding out whether a given torrent name is a series or a movie.

Returns the full title together with the separator it uses, so the title can be prettified (i.e., by replacing the separator with a space or whatever you prefer). Also returns the season number or the year, depending on which one was matched.
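As a rough illustration of the description above, the sketch below compiles the pattern with Python's re module and prettifies the title by swapping the captured separator for a space. The names MEDIA_NAME and parse_release_name and the sample release names are hypothetical. The i, m, and x flags are mapped to re.IGNORECASE, re.MULTILINE, and re.VERBOSE; the g flag has no counterpart in re and is left out.

```python
import re

# The submitted pattern with its comments trimmed; compiled in verbose mode.
MEDIA_NAME = re.compile(r"""
^
(?P<title>
    [-\w'\"]+
    (?P<separator> [ .] )
    (?: [-\w'\"]+\2 )*?              # lazy, so the title stops as early as possible
)
(?:
    (?:
        (?! \d+ \2 )                 # not just a number that belongs to the title
        (?: s (?: eason \2? )? )?
        (?P<season> \d\d? )
        (?: e\d\d? (?:-e?\d\d?)? | x\d\d? )?
      |
        (?P<year> [(\]]?\d{4}[)\]]? )
    )
    (?=\2)                           # must be followed by the separator
  |
    (?= BOXSET | XVID | DIVX | LIMITED | UNRATED | PROPER | DTS | AC3 | AAC
      | BLU[ -]?RAY | HD(?:TV|DVD) | (?:DVD|B[DR]|WEB)RIP | \d+p | [hx]\.?264 )
)
""", re.IGNORECASE | re.MULTILINE | re.VERBOSE)


def parse_release_name(name):
    """Return title, season, and year for a release name, or None if no match."""
    match = MEDIA_NAME.search(name)
    if match is None:
        return None
    separator = match.group("separator")
    # The title group ends with a trailing separator, hence the strip().
    title = match.group("title").replace(separator, " ").strip()
    return {
        "title": title,
        "season": match.group("season"),
        "year": match.group("year"),
    }


for sample in ("The.Walking.Dead.S05E03.720p.HDTV.x264",
               "Iron.Man.3.2013.1080p.BluRay.x264"):
    print(parse_release_name(sample))
```

A populated season suggests a series and a populated year suggests a movie. When both are None, the match stopped at one of the release keywords.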

Submitted by Firas Dib