Regular Expressions 101

Community Patterns

Parse URL by subcomponent

Regular Expression
PCRE (PHP <7.3)

/
^(?<Scheme>(?:file|ftp|https?|ldap|telnet):\/{2})?(?<Subdomain>[a-zA-Z0-9-]{0,4}\.)?(?<Domain>\w+\.\w+\.?\w+\/)(?<Subdirectory>@[\w.]+\/)?(?<Path>[\w_~.-]+\/?[\w_-]+\/{0,1}?[\w_~.-]+)(?<Query>[%?&;].[\w&_~.=-]+)?(?<Fragment>#?[\/\w_~.,=&-]+)?$
/
gm

Description

This regular expression parses a URL and captures its subcomponents in seven named groups, which can also be referenced by number:

  • ${1} ==> Scheme/Protocol
  • ${2} ==> Subdomain
  • ${3} ==> Domain name
  • ${4} ==> Subdirectory
  • ${5} ==> Path
  • ${6} ==> Query
  • ${7} ==> Fragment
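
To illustrate how the seven groups line up with a real URL, here is a minimal sketch in Python rather than PHP. It rests on a few assumptions: Python's `re` module spells named groups `(?P<Name>...)`, so the group syntax is translated; the scheme is written as an alternation, `(?:file|ftp|https?|ldap|telnet)`, since square brackets would match any run of those characters; and the hyphen in the fragment class is kept literal rather than forming a range. The helper `parse_url` and the sample URL are hypothetical names chosen for this example.

```python
import re

# Python translation of the pattern (assumption: named groups use (?P<...>),
# the scheme is an alternation, and "-" in the fragment class is literal).
URL_PATTERN = re.compile(
    r"^(?P<Scheme>(?:file|ftp|https?|ldap|telnet):\/{2})?"
    r"(?P<Subdomain>[a-zA-Z0-9-]{0,4}\.)?"
    r"(?P<Domain>\w+\.\w+\.?\w+\/)"
    r"(?P<Subdirectory>@[\w.]+\/)?"
    r"(?P<Path>[\w_~.-]+\/?[\w_-]+\/{0,1}?[\w_~.-]+)"
    r"(?P<Query>[%?&;].[\w&_~.=-]+)?"
    r"(?P<Fragment>#?[\/\w_~.,=&-]+)?$",
    re.MULTILINE,  # mirrors the "m" flag of the original
)


def parse_url(url):
    """Hypothetical helper: return the named groups as a dict, or None."""
    match = URL_PATTERN.match(url)
    return match.groupdict() if match else None


parts = parse_url("https://www.example.com/docs/guide/intro.html?lang=en#top")
for name, value in parts.items():
    print(f"{name:<13} {value!r}")

# Scheme        'https://'
# Subdomain     'www.'
# Domain        'example.com/'
# Subdirectory  None
# Path          'docs/guide/intro.html'
# Query         '?lang=en'
# Fragment      '#top'
```

Note that the captured values keep their delimiters (`://`, the trailing `.` and `/`, the leading `?` and `#`), and that an optional group that did not take part in the match comes back as `None`.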

This pattern does not capture port numbers. Every connection uses a port, but URLs rarely show one because each scheme has a default: port 80 for HTTP and port 443 for HTTPS. When a port is written explicitly, it follows the host name, separated by a colon, as in example.com:8080.
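
If the port is needed, one option is to read it with a dedicated URL parser instead of extending the regex. The sketch below uses `urlsplit` from Python's standard `urllib.parse` module; the helper `effective_port` and the `DEFAULT_PORTS` table are hypothetical names for this example, and the table only covers the two defaults mentioned here.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed above (illustrative, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Hypothetical helper: explicit port if present, else the scheme default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # port written after the host, e.g. ":8443"
    return DEFAULT_PORTS.get(parts.scheme)  # None for schemes not in the table


print(effective_port("http://example.com/index.html"))    # 80
print(effective_port("https://example.com/index.html"))   # 443
print(effective_port("https://example.com:8443/status"))  # 8443
```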

Submitted by Syd Salmon - 4 years ago