Regular Expressions 101

discard rubbish, GEO, NUMBERS, NAME, TYPE

Regular Expression
Python

r"(?ix)(?:^[A-Z\W]*\W+(?!bnd|cnr)\W+)?(?:(?!\b(?:RD|HWY|TRAIL|St|R)\b)(?P<numbers>\d+-\d+|\d+[A-Z]?)\W+)*?(?:\W+(?:AND|&)\W+)*(?:(?:(?P<geo>BND|CNR)(?:\WOF|\WBY)?)\W+)*(?P<name>(?:(?!\b(?:RD|HWY|TRAIL|St|R)\b)[A-Z]+\W*)+)\W+(?:(?P<type>RD|HWY|TRAIL|St|R)\W)+"
Flags: g (global)

Description

This is useful for parsing Australian street addresses.

It discards initial rubbish, then extracts:

  1. BND or CNR, which is useful for geolocating by boundaries
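  2. Numbers. Python's standard re module keeps only the last repetition of a group; the third-party regex library exposes every repetition through captures(), so you can get a list of all numbers preceding a street name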
  3. Name, e.g. Bourke
  4. Type, e.g. RD

It also discards trailing rubbish.
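As a rough illustration of the groups listed above, the sketch below runs the pattern through Python's standard re module. The helper name parse_address and the sample addresses are assumptions made for this example and are not part of the original entry. Note that the pattern expects a non-word character after the street type, so the matching samples end with a comma or further text.

```python
import re

# The community pattern from this entry, split across lines for readability.
# Adjacent raw string literals are concatenated by Python into one pattern.
ADDRESS_RE = re.compile(
    r"(?ix)(?:^[A-Z\W]*\W+(?!bnd|cnr)\W+)?"
    r"(?:(?!\b(?:RD|HWY|TRAIL|St|R)\b)(?P<numbers>\d+-\d+|\d+[A-Z]?)\W+)*?"
    r"(?:\W+(?:AND|&)\W+)*"
    r"(?:(?:(?P<geo>BND|CNR)(?:\WOF|\WBY)?)\W+)*"
    r"(?P<name>(?:(?!\b(?:RD|HWY|TRAIL|St|R)\b)[A-Z]+\W*)+)"
    r"\W+(?:(?P<type>RD|HWY|TRAIL|St|R)\W)+"
)


def parse_address(text):
    """Hypothetical helper: named groups of the first match, or None."""
    match = ADDRESS_RE.search(text)
    return match.groupdict() if match else None


# Sample inputs are made up for illustration.
print(parse_address("123 Bourke St, Melbourne"))
# {'numbers': '123', 'geo': None, 'name': 'Bourke', 'type': 'St'}

print(parse_address("CNR OF SMITH RD,"))
# {'numbers': None, 'geo': 'CNR', 'name': 'SMITH', 'type': 'RD'}

# No non-word character follows the street type, so nothing matches.
print(parse_address("123 Bourke St"))
# None
```

With re, the numbers group holds only the last number matched when several precede the street name. Appending a space or comma to each input before matching is one way to handle addresses that end directly on the street type.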

Detailed explanation as verbose regex below:

(?ix)                                   # case insensitive and verbose flags

(?:^[A-Z\W]*\W+(?!bnd|cnr)\W+)?         # discard initial rubbish (names not including BND/CNR) if present
(?:(?!\b(?:{0})\b)                      # not starting with road type
(?P<numbers>\d+-\d+|\d+[A-Z]?)\W+)*?    # capture numbers if present, including extension letter
(?:\W+(?:AND|&)\W+)*                    # do not capture AND/& if present
(?:(?:(?P<geo>BND|CNR)                  # capture BND/CNR if present (GEO var)
(?:\WOF|\WBY)?)\W+)*                    # do not capture OF/BY following BND/CNR
(?P<name>                               # capture street name (NAME var)
(?:(?!\b(?:{0})\b)                      # not starting with road type
[A-Z]+\W*)+)\W+                         # contains letters and non-words only
(?:(?P<type>{0})\W)+                    # capture street type (TYPE var) and ignore trailing rubbish
(?:\W|$)                                # non-word or end of string
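
The {0} placeholders in the verbose listing suggest that the road types are substituted with str.format before compiling. The sketch below shows one way that might look. The names TEMPLATE and build_address_regex, and the extra AVE type, are assumptions for illustration. The template here follows the one-line pattern at the top of this entry, so the type group wraps only the alternation and there is no trailing (?:\W|$).

```python
import re

# Verbose template; {0} is replaced by the alternation of road types.
# The inline flags sit at the very start of the string, which newer
# Python versions require for global flags.
TEMPLATE = r"""(?ix)                    # case insensitive and verbose flags
(?:^[A-Z\W]*\W+(?!bnd|cnr)\W+)?         # discard initial rubbish if present
(?:(?!\b(?:{0})\b)                      # not starting with road type
(?P<numbers>\d+-\d+|\d+[A-Z]?)\W+)*?    # numbers, including extension letter
(?:\W+(?:AND|&)\W+)*                    # skip AND/& if present
(?:(?:(?P<geo>BND|CNR)                  # BND/CNR if present
(?:\WOF|\WBY)?)\W+)*                    # skip OF/BY following BND/CNR
(?P<name>                               # street name
(?:(?!\b(?:{0})\b)                      # not starting with road type
[A-Z]+\W*)+)\W+                         # letters and non-word characters only
(?:(?P<type>{0})\W)+                    # street type, then a non-word character
"""


def build_address_regex(road_types):
    """Hypothetical helper: compile the template for the given road types."""
    alternation = "|".join(re.escape(t) for t in road_types)
    return re.compile(TEMPLATE.format(alternation))


# The first five types come from the entry; AVE is added as an example.
address_re = build_address_regex(["RD", "HWY", "TRAIL", "St", "R", "AVE"])

match = address_re.search("7 Collins Ave,")
print(match.groupdict())
# {'numbers': '7', 'geo': None, 'name': 'Collins', 'type': 'Ave'}
```

Keeping the road types in a list makes it easier to extend the pattern without editing three separate alternations by hand.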
Submitted by Daniel Vianna - 7 years ago