Regular Expressions 101

"
^#([\d]+[-]+[\d]+)\s+([a-zA-Z0-9\s*]+)$
"
gm
^ asserts position at start of a line
# matches the character # with index 35 decimal (23 hex or 43 octal) literally (case sensitive)
1st Capturing Group
([\d]+[-]+[\d]+)
Match a single character present in the list below
[\d]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equivalent to [0-9])
Match a single character present in the list below
[-]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
- matches the character - with index 45 decimal (2D hex or 55 octal) literally (case sensitive)
Match a single character present in the list below
[\d]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group
([a-zA-Z0-9\s*]+)
Match a single character present in the list below
[a-zA-Z0-9\s*]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
a-z matches a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v ])
* matches the character * with index 42 decimal (2A hex or 52 octal) literally (case sensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string)
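
The breakdown above can be checked with a small Python sketch. It assumes that the g flag corresponds to iterating over all matches with finditer and that the m flag corresponds to re.MULTILINE; the helper name find_matches and the sample strings are made up for illustration.

```python
import re

# The pattern from this page; re.MULTILINE plays the role of the m flag.
PATTERN = re.compile(r"^#([\d]+[-]+[\d]+)\s+([a-zA-Z0-9\s*]+)$", re.MULTILINE)


def find_matches(text):
    """Return (group 1, group 2) for every match, mimicking the g flag."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


# Hypothetical sample input: two lines, each starting with '#'.
sample = "#123-456 ACME STORE\n#7--8 Shop*24"
print(find_matches(sample))

# Inside [...] the * is a literal character, and \s includes newlines,
# so group 2 can run across a line break when the next line contains
# only characters from the class.
print(find_matches("#1-2 ACME\nSTORE"))
```

The second call shows a side effect worth knowing: because \s sits inside the character class of the second group, a match is not confined to a single line even with the m flag set.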

Unit Tests

"id": "P01",
"description": "starting with # and then digits and hyphen and then digits and one or more space",
"regex": "^#([\\d]+[-]+[\\d]+)\\s+([a-zA-Z0-9\\s*]+)$",
"inputField": "preProcessedMerchantName",
"outputField": "cleanMerchantName",
"isCaseSensitiveMatch": "false",
"trimOutput": "true",
"transformation": "T5"
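
How a rule like P01 is consumed is not stated on this page, so the following Python sketch is only one plausible reading. It assumes the fragment is wrapped in braces to form a JSON object, that "isCaseSensitiveMatch": "false" maps to re.IGNORECASE, that "trimOutput": "true" means stripping surrounding whitespace, and that the second capturing group is what ends up in the output field. The function apply_rule is hypothetical, and the meaning of "transformation": "T5" is unknown and left uninterpreted.

```python
import json
import re

# The rule from the unit test, wrapped in braces so it parses as JSON.
# The doubled backslashes are JSON escaping; json.loads turns \\d into \d.
RULE_JSON = r'''{
  "id": "P01",
  "regex": "^#([\\d]+[-]+[\\d]+)\\s+([a-zA-Z0-9\\s*]+)$",
  "inputField": "preProcessedMerchantName",
  "outputField": "cleanMerchantName",
  "isCaseSensitiveMatch": "false",
  "trimOutput": "true",
  "transformation": "T5"
}'''


def apply_rule(rule, record):
    """Hypothetical rule runner; returns a copy of record with the output field set."""
    flags = re.MULTILINE  # mirrors the m flag used on this page
    if rule["isCaseSensitiveMatch"].lower() != "true":
        flags |= re.IGNORECASE
    result = dict(record)
    match = re.search(rule["regex"], record[rule["inputField"]], flags)
    if match is None:
        return result  # no match: record is returned unchanged
    value = match.group(2)  # ASSUMPTION: group 2 holds the merchant name
    if rule["trimOutput"].lower() == "true":
        value = value.strip()
    result[rule["outputField"]] = value
    return result


rule = json.loads(RULE_JSON)
print(apply_rule(rule, {"preProcessedMerchantName": "#123-456 ACME STORE  "}))
```

Since the character class already lists both a-z and A-Z, the case-insensitive flag does not change what this particular pattern matches; it is applied here only to honour the rule's setting.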