URL Parts Extractor

Regular Expression
ECMAScript (JavaScript)

/^((?:(?:http|ftp|ws)s?|sftp):\/\/?)?([^:/\s.#?]+\.[^:/\s#?]+|localhost)(:\d+)?((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?([^#]+)?(#[\w-]*)?$/gm

Description

Extracts the parts of a URL into capture groups:

  • ((?:(?:http|ftp|ws)s?|sftp):\/\/?)? (group 1): extracts the protocol
  • ([^:/\s.#?]+\.[^:/\s#?]+|localhost) (group 2): extracts the hostname
  • (:\d+)? (group 3): extracts the port number
  • ((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)? (groups 4 & 5): extracts the path part
  • ([^#]+)? (group 6): extracts the query part
  • (#[\w-]*)? (group 7): extracts the hash part
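
The group layout above can be sketched in JavaScript as follows. This is only an illustration: the helper name parseUrlParts, the property names on the returned object, and the sample URL are assumptions, not part of the original entry. The g and m flags are dropped here so that exec() handles one string at a time without carrying lastIndex state.

```javascript
// The pattern from this entry, without the g/m flags (single-string use).
const URL_PARTS = /^((?:(?:http|ftp|ws)s?|sftp):\/\/?)?([^:/\s.#?]+\.[^:/\s#?]+|localhost)(:\d+)?((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?([^#]+)?(#[\w-]*)?$/;

// Hypothetical helper: maps groups 1-7 to named properties.
// Returns null when the string does not match; groups that did not
// participate in the match are undefined.
function parseUrlParts(url) {
  const m = URL_PARTS.exec(url);
  if (m === null) return null;
  const [, protocol, hostname, port, directory, file, query, hash] = m;
  return { protocol, hostname, port, directory, file, query, hash };
}

const parts = parseUrlParts(
  "https://www.example.com:8080/path/to/file.html?x=1&y=2#top"
);
console.log(parts);
// protocol:  "https://"
// hostname:  "www.example.com"
// port:      ":8080"
// directory: "/path/to/"   (group 4)
// file:      "file.html"   (group 5)
// query:     "?x=1&y=2"
// hash:      "#top"
```

Note that the captured values keep their delimiters (the "://" after the protocol, the ":" before the port, the "?" and "#"), so strip them if only the bare values are needed.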

For each part of the regex listed above, you can remove the trailing ? to make that part required (or add one to make it optional). You can also remove the ^ at the beginning and the $ at the end of the regex so that it no longer has to match the whole string.
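
As a minimal sketch of that adjustment, the variant below removes the ? after group 1 so that the protocol becomes required. The constant name URL_WITH_PROTOCOL and the sample strings are assumptions made for illustration; the flags are omitted so that test() is stateless.

```javascript
// Same pattern, but group 1 no longer ends with "?", so a protocol is mandatory.
const URL_WITH_PROTOCOL = /^((?:(?:http|ftp|ws)s?|sftp):\/\/?)([^:/\s.#?]+\.[^:/\s#?]+|localhost)(:\d+)?((?:\/\w+)*\/)?([\w\-.]+[^#?\s]+)?([^#]+)?(#[\w-]*)?$/;

// Matches: the string starts with a recognized protocol.
const withProtocol = URL_WITH_PROTOCOL.test("https://example.com");

// Does not match: the protocol is missing.
const withoutProtocol = URL_WITH_PROTOCOL.test("example.com");

console.log(withProtocol, withoutProtocol); // true false
```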

See on stackoverflow.com.

Note: this regex is not 100% safe and may accept some strings that are not valid URLs, although it does check some criteria. Its main goal is to extract the different parts of a URL, not to validate it.

Submitted by Elie Grenon (elie-g) - a year ago