Regular Expressions 101

Community Patterns

Parse URLs

1

Regular Expression
Python

r'''
(?#First, match the protocol) (?:https?|ftp):// (?#Next, check for optional username and/or password) (?#Note: The following two char classes are functionally equivalent) (?:[\x21-\x39\x3b-\x3f\x41-\x7e]+(?::[!-9;-?A-~]+)?@)? (?#Next, let's match the domain [with support for Punycode ]) (?:xn--[0-9a-z]+|[0-9A-Za-z_-]+\.)*(?:xn--[0-9a-z]+|[0-9A-Za-z-]+)\.(?:xn--[0-9a-z]+|[0-9A-Za-z]{2,10}) (?#Let's match on optional port) (?::(?:6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{1,3}|\d))? (?#Next, let's match on the path) (?:/[\x21\x22\x24\x25\x27-x2e\x30-\x3b\x3e\x40-\x5b\x5d-\x7e]*)* (?#Next, let's match on an anchor) (?:\#[\x21\x22\x24\x25\x27-x2e\x30-\x3b\x3e\x40-\x5b\x5d-\x7e]*)? (?#Last, but not least, we match on URI params) (?:\?[\x21\x22\x24\x25\x27-\x2e\x30-\x3b\x40-\x5b\x5d-\x7e]+=[\x21\x22\x24\x25\x27-\x2e\x30-\x3b\x40-\x5b\x5d-\x7e]*)? (?#Additional params) (?:&[\x21\x22\x24\x25\x27-\x2e\x30-\x3b\x40-\x5b\x5d-\x7e]+=[\x21\x22\x24\x25\x27-\x2e\x30-\x3b\x40-\x5b\x5d-\x7e]*)*
'''
gmxi

Description

A regular expression to parse URLs. Complete with comments.

Submitted by D. Torres - 2 years ago