Community Library Entry

Regular Expression
Created·2020-02-14 21:40
Flavor·PCRE (Legacy)

/^(?<Scheme>(?:file|ftp|https?|ldap|telnet):\/{2})?(?<Subdomain>[a-zA-Z0-9-]{0,4}\.)?(?<Domain>\w+\.\w+\.?\w+\/)(?<Subdirectory>@[\w.]+\/)?(?<Path>[\w_~.-]+\/?[\w_-]+\/{0,1}?[\w_~.-]+)(?<Query>[%?&;].[\w&_~.=-]+)?(?<Fragment>#?[\/\w~.,=&-]+)?$/gm

Description

This regular expression parses a URL and captures its components in seven named groups, which can also be referenced by number:

  • ${1} ==> Scheme/Protocol
  • ${2} ==> Subdomain
  • ${3} ==> Domain name
  • ${4} ==> Subdirectory
  • ${5} ==> Path
  • ${6} ==> Query
  • ${7} ==> Fragment
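
As a rough illustration of the groups listed above, here is a minimal Python sketch. It is a port, not the original PCRE pattern: Python's `re` module needs `(?P<Name>...)` for named groups, and the port writes the scheme list as an alternation and puts the hyphen last in the fragment class. Both changes are assumptions about the intended behavior. The helper name `parse_url` and the sample URL are hypothetical.

```python
import re

# Python port of the pattern (assumptions: (?P<Name>...) group syntax,
# scheme list written as an alternation, literal hyphen in the fragment class).
URL_PATTERN = re.compile(
    r"^(?P<Scheme>(?:file|ftp|https?|ldap|telnet):\/{2})?"
    r"(?P<Subdomain>[a-zA-Z0-9-]{0,4}\.)?"
    r"(?P<Domain>\w+\.\w+\.?\w+\/)"
    r"(?P<Subdirectory>@[\w.]+\/)?"
    r"(?P<Path>[\w_~.-]+\/?[\w_-]+\/{0,1}?[\w_~.-]+)"
    r"(?P<Query>[%?&;].[\w&_~.=-]+)?"
    r"(?P<Fragment>#?[\/\w~.,=&-]+)?$",
    re.MULTILINE,  # mirrors the "m" flag of the original entry
)


def parse_url(url):
    """Return a dict of the seven named groups, or None if the URL does not match."""
    match = URL_PATTERN.search(url)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical sample URL; groups that do not take part in the match are None.
    parts = parse_url("https://www.example.com/docs/index.html?x=1#top")
    for name, value in parts.items():
        print(f"{name}: {value}")
```

Note that the captured Scheme, Subdomain, and Domain values include their trailing delimiters (`://`, `.`, and `/`), since those characters sit inside the groups.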

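The pattern does not handle port numbers: a URL with an explicit port (for example, a host followed by :8080) will not match, because the Domain group expects a slash immediately after the host name. Every connection uses a port, but it is rarely written in the URL because each scheme has a default: port 80 for HTTP and port 443 for HTTPS. When a port does appear, it follows the host name, separated by a colon.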
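
To make the default-port behavior concrete, the sketch below uses Python's standard `urllib.parse.urlsplit` rather than the regular expression. The `DEFAULT_PORTS` table and the `effective_port` helper are hypothetical names introduced for illustration, and the table covers only the two schemes whose defaults are mentioned above.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed above (illustrative, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port if the URL has one, else the scheme's default.

    Returns None when the URL has no explicit port and the scheme
    is not in DEFAULT_PORTS.
    """
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "https://example.com/index.html",
                "https://example.com:8443/index.html"):
        print(url, effective_port(url))
```

One possible design, if URLs with ports need to pass through the pattern, is to read the port this way first and remove it from the string before matching.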

Submitted by Syd Salmon