Regular Expressions 101

Community Library Entry

Regular Expression
Python

r'^(.+?)(?:\s*-\s*)([^.(]+)\b(?:\s*\((.*?)\))*?(?:\s*\[(.*?)\])*?\s*(\..*)'
Flags: gm

Description

This Python regex splits an audio filename with the syntax Authors - Title (Text1) [Text2].extension into:

  • Group 1: Authors
  • Group 2: Title
  • Group 3: Text1
  • Group 4: Text2
  • Group 5: .extension

It is intended for use with .fullmatch(), since the goal is to guarantee that the last groups are captured. It also handles copies of files thanks to the last non-capturing group (?:\s*-*[\w*\s*]*)?. See the last example.
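As a minimal sketch of the .fullmatch() usage described above, the snippet below applies the pattern exactly as it is shown in this entry (that is, without the optional "copy" group mentioned in the description) to two of the filenames listed in this entry. The helper name split_filename is hypothetical and only used for illustration.

```python
import re

# Pattern copied as shown in this entry; the optional "copy" group
# mentioned in the description is not part of it.
AUDIO_NAME = re.compile(
    r'^(.+?)(?:\s*-\s*)([^.(]+)\b(?:\s*\((.*?)\))*?(?:\s*\[(.*?)\])*?\s*(\..*)'
)

def split_filename(name):
    """Hypothetical helper: return the five groups as a tuple, or None if no match."""
    m = AUDIO_NAME.fullmatch(name)
    return m.groups() if m else None

print(split_filename("Author - Title.extension"))
# ('Author', 'Title', None, None, '.extension')

print(split_filename("Author1&Author2-This is a title(feat. Someone)[NoPremiere].mp3"))
# ('Author1&Author2', 'This is a title', 'feat. Someone', 'NoPremiere', '.mp3')

print(split_filename("no separator.mp3"))
# None (there is no hyphen between authors and title)
```

Groups that do not take part in the match come back as None, so callers should check for that before using Text1 or Text2.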

For example, for the following filenames:

  • Author - Title.extension
    • G1: Author
    • G2: Title
    • G3&4: None
    • G5: .extension
  • Author1&Author2-This is a title(feat. Someone)[NoPremiere].mp3
    • G1: Author1&Author2
    • G2: This is a title
    • G3: feat. Someone
    • G4: NoPremiere
    • G5: .mp3
  • Author1 & Author2 - Title with & (This) (Is) [Too] [Spaced] .m4u
    • G1: Author1 & Author2
    • G2: Title with &
    • G3: Is
    • G4: Spaced
    • G5: .m4u
  • name1 - title1 ( album1 ) - copy.mp3
    • G1: name1
    • G2: title1
    • G3: album1
    • G4: None
    • G5: .mp3

    Note: this regex captures only the last string in parentheses and the last string in square brackets; earlier ones are matched but not captured. To obtain all of them, use re.findall(). Also, non-word and non-whitespace characters are not supported after the last hyphen.
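The re.findall() approach mentioned in the note could look roughly like this. It is only a sketch: it reuses the two sub-patterns \((.*?)\) and \[(.*?)\] from the main regex and runs them over the whole filename, and the function name bracketed_parts is an assumption made for illustration.

```python
import re

def bracketed_parts(name):
    """Hypothetical helper: collect every (...) and [...] content, in order."""
    parens = re.findall(r'\((.*?)\)', name)    # all strings inside parentheses
    squares = re.findall(r'\[(.*?)\]', name)   # all strings inside square brackets
    return parens, squares

parens, squares = bracketed_parts(
    "Author1 & Author2 - Title with & (This) (Is) [Too] [Spaced] .m4u"
)
print(parens)   # ['This', 'Is']
print(squares)  # ['Too', 'Spaced']
```

Because this scans the whole name, it would also pick up parentheses or brackets that appear in the authors part, so it may need to be restricted to the text after the separator.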

Submitted by Eche L.A. - 5 years ago