Regular Expressions 101

Regular Expression
PCRE (PHP <7.3)

/
\A\s*
(?:
  #########################################################################
  # Option A: [<Addition to address 1>] <House number> <Street name> #
  # [<Addition to address 2>] #
  #########################################################################
  (?:(?P<A_Addition_to_address_1>.*?),\s*)? # Addition to address 1
  (?:No\.\s*)?
  (?P<A_House_number_1>\pN+[a-zA-Z]?(?:\s*[-\/\pP]\s*\pN+[a-zA-Z]?)*) # House number
  \s*,?\s*
  (?P<A_Street_name_1>(?:[a-zA-Z]\s*|\pN\pL{2,}\s\pL)\S[^,#]*?(?<!\s)) # Street name
  \s*(?:(?:[,\/]|(?=\#))\s*(?!\s*No\.)
  (?P<A_Addition_to_address_2>(?!\s).*?))? # Addition to address 2
|
  #########################################################################
  # Option B: [<Addition to address 1>] <Street name> <House number> #
  # [<Addition to address 2>] #
  #########################################################################
  (?:(?P<B_Addition_to_address_1>.*?),\s*(?=.*[,\/]))? # Addition to address 1
  (?!\s*No\.)(?P<B_Street_name>\S\s*\S(?:[^,#](?!\b\pN+\s))*?(?<!\s)) # Street name
  \s*[\/,]?\s*(?:\sNo\.)?\s+
  (?P<B_House_number>\pN+\s*-?[a-zA-Z]?(?:\s*[-\/\pP]?\s*\pN+(?:\s*[\-a-zA-Z])?)*|[IVXLCDM]+(?!.*\b\pN+\b))(?<!\s) # House number
  \s*(?:(?:[,\/]|(?=\#)|\s)\s*(?!\s*No\.)\s*
  (?P<B_Addition_to_address_2>(?!\s).*?))? # Addition to address 2
)
\s*\Z
/
x

Description

This regular expression splits an address line such as "1117 Franklin Blvd" into its street name and house number. It also supports addresses in which the street name comes first and the house number second (e.g. "Mustermannstr. 1"). In addition, it handles address lines that carry extra information that is neither a street name nor a house number (e.g. "3940 Radio Road, Unit 110", "Pallaswiesenstr. 57 App. 235", "Suite 1500, 802 Docklands Street"). The regular expression has 8 named capture groups in total. The first 4 are used when the house number precedes the street name; their names are prefixed with "A_". The last 4 are used when the house number follows the street name; their names are prefixed with "B_".
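
The way a caller might choose between the "A_" and "B_" groups can be sketched in Python. This is a minimal illustration, not the library pattern itself: Python's standard `re` module does not accept the `\pN`, `\pL` and `\pP` property escapes used above, so the sketch uses a deliberately simplified stand-in pattern that only covers "<number> <street>" and "<street> <number>" and ignores the additional address parts. The names `SIMPLE_ADDRESS` and `split_address` are hypothetical; the group names are borrowed from the original pattern.

```python
import re

# Simplified stand-in for the library pattern (an assumption, NOT the original):
# option A is "<house number> <street name>", option B is the reverse order.
SIMPLE_ADDRESS = re.compile(
    r"""
    \A\s*
    (?:
        (?P<A_House_number_1>\d+[a-zA-Z]?) \s+ (?P<A_Street_name_1>\D.*?)
      |
        (?P<B_Street_name>\D.*?) \s+ (?P<B_House_number>\d+[a-zA-Z]?)
    )
    \s*\Z
    """,
    re.VERBOSE,
)


def split_address(line):
    """Return {'street': ..., 'number': ...}, or None if the line does not match."""
    m = SIMPLE_ADDRESS.match(line)
    if m is None:
        return None
    # Only one alternative can match, so the groups of the other one are None.
    if m.group("A_House_number_1") is not None:
        return {"street": m.group("A_Street_name_1"),
                "number": m.group("A_House_number_1")}
    return {"street": m.group("B_Street_name"),
            "number": m.group("B_House_number")}


print(split_address("1117 Franklin Blvd"))
print(split_address("Mustermannstr. 1"))
```

Checking which house-number group is set is enough to tell the two layouts apart, because the groups of the alternative that did not match stay unset. The same check should carry over to the full pattern in an engine that supports its syntax.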

Submitted by Andre Wisplinghoff - 10 years ago