Regular Expressions 101

Regular Expression (Python)

r"(?P<type>deb(?:-src)?) (?:\[(?P<options>.*)\] )?(?P<uri>(?P<protocol>(?:(?:mirror\+)?(?P<local>file|cdrom|copy)|(?P<remote>http|https|ftp|ssh))):(?(remote)//((?:(?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,6}))/[a-zA-Z0-9-_\./]+) (?P<suite>[a-z/]+)(?:(?<!/) (?P<components>[a-z]+(?: [a-z]+)*))"
Flags: u

Description

Matches a single line of an APT sources.list file and returns its fields in named groups.

This regex covers many common variations of a sources.list entry, but by no means all of them.

What it does:

  • recognizes most of the protocols supported by APT
  • requires the URI to begin with the domain of a remote machine for the protocols that need one (http, https, ftp, ssh)
  • supports optional parts such as an options block and arbitrarily many components
  • distinguishes the case where the suite is an exact path (ending in a slash), in which case the line must not contain any components
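
As a rough illustration of the points above, the sketch below compiles the pattern with Python's re module and prints the named groups for a few sample lines. The sample lines, the name SOURCES_LINE, and the use of fullmatch (so that a line only counts when all of it matches) are assumptions of this sketch and not part of the entry. The digit range in the URI path class is written as 0-9, which is assumed to be the intent.

```python
import re

# The pattern, split across adjacent raw strings for readability only.
# Assumption: the path class uses the digit range 0-9.
SOURCES_LINE = re.compile(
    r"(?P<type>deb(?:-src)?) (?:\[(?P<options>.*)\] )?"
    r"(?P<uri>(?P<protocol>(?:(?:mirror\+)?(?P<local>file|cdrom|copy)"
    r"|(?P<remote>http|https|ftp|ssh))):"
    r"(?(remote)//((?:(?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,6}))"
    r"/[a-zA-Z0-9-_\./]+) "
    r"(?P<suite>[a-z/]+)"
    r"(?:(?<!/) (?P<components>[a-z]+(?: [a-z]+)*))"
)

# Hypothetical sample lines, chosen to exercise the listed features.
sample_lines = [
    # remote protocol, options block, two components
    "deb [arch=amd64] https://deb.debian.org/debian stable main contrib",
    # local protocol: no //host part is required
    "deb-src file:/srv/mirror/debian stable main",
    # remote protocol without //host: rejected by the conditional group
    "deb http:/debian stable main",
]

for line in sample_lines:
    match = SOURCES_LINE.fullmatch(line)
    if match is None:
        print("no match:", line)
    else:
        print(match.groupdict())
```

Groups that did not take part in a match, such as local for an https line, come back as None in the dictionary.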

What it does not do:

  • support spaces inside the URI, which APT does not support either, as far as I know
  • support uncommon characters in the URI or the components
  • accept IP addresses in place of domain names
  • split the list of options or components into separate groups
  • check the options for validity
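
Some of these limitations can be worked around in calling code. The sketch below is one possible approach, not part of the entry: the hypothetical helper parse_entry splits the options and components strings on whitespace after a successful match, and the last call shows a line with an IP-address host being rejected. The names SOURCES_LINE and parse_entry, the keyring path, and the use of fullmatch are assumptions, as is writing the digit range in the URI path class as 0-9.

```python
import re

# The pattern, split across adjacent raw strings for readability only.
# Assumption: the path class uses the digit range 0-9.
SOURCES_LINE = re.compile(
    r"(?P<type>deb(?:-src)?) (?:\[(?P<options>.*)\] )?"
    r"(?P<uri>(?P<protocol>(?:(?:mirror\+)?(?P<local>file|cdrom|copy)"
    r"|(?P<remote>http|https|ftp|ssh))):"
    r"(?(remote)//((?:(?!-)[a-zA-Z0-9-]{1,63}(?<!-)\.)+[a-zA-Z]{2,6}))"
    r"/[a-zA-Z0-9-_\./]+) "
    r"(?P<suite>[a-z/]+)"
    r"(?:(?<!/) (?P<components>[a-z]+(?: [a-z]+)*))"
)


def parse_entry(line):
    """Hypothetical helper: return the named groups as a dict, with
    options and components split into lists, or None if no match."""
    match = SOURCES_LINE.fullmatch(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # The regex returns each list as one string; split on whitespace here.
    entry["options"] = (entry["options"] or "").split()
    entry["components"] = (entry["components"] or "").split()
    return entry


# Hypothetical sample line with two options and two components.
entry = parse_entry(
    "deb [arch=amd64 signed-by=/usr/share/keyrings/example.gpg] "
    "http://deb.debian.org/debian stable main contrib"
)
print(entry["options"])
print(entry["components"])

# An IP address is not accepted as the host part, so this yields None.
print(parse_entry("deb http://192.168.0.10/debian stable main"))
```

The split does not validate anything: an unknown option name passes through unchanged, in line with the last limitation listed.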

Submitted by NK308 - 3 years ago