Regular Expressions 101

Regular Expression (flags: gm)
^(?P<log_time>[\dT\-\:]+)\s+(?P<log_host>[\w\-]+)(?:\.[\w\-\.]*\s+|\s+)(?P<log_type>\w+)\s+(?:(?=\()\((?P<log_module>[^\)]+)\)\s+|)(?:(?=\[)\[(?P<log_tenant>[^ ]+)\]\s+|)(?P<log_message>.+)
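Here is a minimal sketch of applying the whole pattern in Python, assuming the Python flavor. The helper `parse_log` and the two sample lines are hypothetical and exist only to exercise each named group. The `m` flag maps to `re.MULTILINE`. Python has no `g` flag, so `re.finditer` is used to collect every match.

```python
import re

# The pattern from this page, split over adjacent raw-string literals
# (Python concatenates them into one string).
LOG_RE = re.compile(
    r"^(?P<log_time>[\dT\-\:]+)\s+"
    r"(?P<log_host>[\w\-]+)(?:\.[\w\-\.]*\s+|\s+)"
    r"(?P<log_type>\w+)\s+"
    r"(?:(?=\()\((?P<log_module>[^\)]+)\)\s+|)"
    r"(?:(?=\[)\[(?P<log_tenant>[^ ]+)\]\s+|)"
    r"(?P<log_message>.+)",
    re.MULTILINE,  # the "m" flag; finditer() below stands in for "g"
)

# Hypothetical sample lines: one with module and tenant, one without.
SAMPLE = (
    "2023-01-05T10:20:30 web-01.example.com INFO (auth) [tenant-a] User logged in\n"
    "2023-01-05T10:21:00 db01 ERROR Connection refused\n"
)


def parse_log(text):
    """Return one dict of named groups per matching line (hypothetical helper)."""
    return [m.groupdict() for m in LOG_RE.finditer(text)]


for record in parse_log(SAMPLE):
    print(record)
```

Optional groups that did not participate in a match, such as `log_module` and `log_tenant` on the second line, come back as `None`.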
^ asserts position at start of a line
Named Capture Group log_time
(?P<log_time>[\dT\-\:]+)
Match a single character present in the list below
[\dT\-\:]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equivalent to [0-9])
T matches the character T with index 84₁₀ (54₁₆ or 124₈) literally (case sensitive)
\- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case sensitive)
\: matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
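The `log_time` class accepts only digits, `T`, `-` and `:`, and the token after it must be whitespace. The sketch below uses made-up timestamps to show what happens with ISO 8601 stamps that carry fractional seconds or a zone offset: the class stops early, and the whole line then fails to match. In Python 3, `\d` in a str pattern also matches non-ASCII Unicode decimal digits unless `re.ASCII` is set.

```python
import re

# The log_time character class on its own.
TIME_RE = re.compile(r"[\dT\-\:]+")
# The start of the full pattern: log_time must be followed by whitespace.
PREFIX_RE = re.compile(r"^(?P<log_time>[\dT\-\:]+)\s+")

for stamp in ["2023-01-05T10:20:30", "2023-01-05T10:20:30.123Z", "2023-01-05T10:20:30+00:00"]:
    # Longest leading run of allowed characters.
    partial = TIME_RE.match(stamp).group(0)
    # None when a character outside the class sits between the stamp and the space.
    full = PREFIX_RE.match(stamp + " web-01 INFO msg")
    print(f"{stamp!r}: class matches {partial!r}, line matches: {full is not None}")
```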
Named Capture Group log_host
(?P<log_host>[\w\-]+)
Match a single character present in the list below
[\w\-]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\w matches any word character (equivalent to [a-zA-Z0-9_])
\- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case sensitive)
Non-capturing group
(?:\.[\w\-\.]*\s+|\s+)
1st Alternative
\.[\w\-\.]*\s+
\. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case sensitive)
Match a single character present in the list below
[\w\-\.]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\w matches any word character (equivalent to [a-zA-Z0-9_])
\- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case sensitive)
\. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Alternative
\s+
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
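Together, `log_host` and the non-capturing group keep only the first label of a dotted hostname and discard the rest of the domain, because `[\w\-]` cannot match a dot. A short sketch with made-up hostnames:

```python
import re

# log_host followed by the non-capturing "domain or whitespace" group.
HOST_RE = re.compile(r"(?P<log_host>[\w\-]+)(?:\.[\w\-\.]*\s+|\s+)")

for text in ["web-01.example.com INFO", "db01 ERROR", "web-01. INFO"]:
    m = HOST_RE.match(text)
    # group(0) shows how much text this part consumed, domain included.
    print(f"{text!r}: log_host={m.group('log_host')!r}, consumed={m.group(0)!r}")
```

The third input still matches because `*` lets the part after the dot be empty.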
Named Capture Group log_type
(?P<log_type>\w+)
\w matches any word character (equivalent to [a-zA-Z0-9_])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
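Because `log_type` is `\w+` and must be followed by whitespace, a level written with punctuation, such as `WARN:` or `[WARN]`, does not match. A sketch with hypothetical level strings:

```python
import re

# log_type and the whitespace that must follow it.
TYPE_RE = re.compile(r"(?P<log_type>\w+)\s+")

for text in ["INFO (auth) started", "WARN: disk full", "[WARN] disk full"]:
    m = TYPE_RE.match(text)
    level = m.group("log_type") if m else None
    print(f"{text!r}: {level!r}")
```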
Non-capturing group
(?:(?=\()\((?P<log_module>[^\)]+)\)\s+|)
1st Alternative
(?=\()\((?P<log_module>[^\)]+)\)\s+
Positive Lookahead
(?=\()
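Assert that the Regex below matches at this position, without consuming any characters
\( matches the character ( with index 40₁₀ (28₁₆ or 50₈) literally (case sensitive)
\( matches the character ( with index 40₁₀ (28₁₆ or 50₈) literally (case sensitive); this consumes the same character the lookahead just checked, so the lookahead is redundant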
Named Capture Group log_module
(?P<log_module>[^\)]+)
Match a single character not present in the list below
[^\)]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\) matches the character ) with index 41₁₀ (29₁₆ or 51₈) literally (case sensitive)
\) matches the character ) with index 41₁₀ (29₁₆ or 51₈) literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Alternative (empty): matches the empty string at any position, so the whole group is optional
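The lookahead `(?=\()` is redundant because the `\(` right after it tests the same character, and an empty 2nd alternative behaves like a `?` quantifier. The sketch below compares the fragment as written with a simplified form (our own rewrite, not part of the original pattern) on made-up inputs. It also shows that a module not followed by whitespace is left in the message.

```python
import re

# Module fragment exactly as written (lookahead plus empty alternative).
MODULE_ORIG = re.compile(r"(?:(?=\()\((?P<log_module>[^\)]+)\)\s+|)(?P<rest>.*)")
# Simplified rewrite: drop the lookahead and replace "|)" with ")?".
MODULE_SIMPLE = re.compile(r"(?:\((?P<log_module>[^\)]+)\)\s+)?(?P<rest>.*)")

for text in ["(auth) User logged in", "User logged in", "(auth)User logged in"]:
    a = MODULE_ORIG.match(text).groupdict()
    b = MODULE_SIMPLE.match(text).groupdict()
    print(f"{text!r}: {a} same={a == b}")
```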
Non-capturing group
(?:(?=\[)\[(?P<log_tenant>[^ ]+)\]\s+|)
1st Alternative
(?=\[)\[(?P<log_tenant>[^ ]+)\]\s+
Positive Lookahead
(?=\[)
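Assert that the Regex below matches at this position, without consuming any characters
\[ matches the character [ with index 91₁₀ (5B₁₆ or 133₈) literally (case sensitive)
\[ matches the character [ with index 91₁₀ (5B₁₆ or 133₈) literally (case sensitive); this consumes the same character the lookahead just checked, so the lookahead is redundant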
Named Capture Group log_tenant
(?P<log_tenant>[^ ]+)
Match a single character not present in the list below
[^ ]
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
(a literal space) matches the space character with index 32₁₀ (20₁₆ or 40₈) literally
\] matches the character ] with index 93₁₀ (5D₁₆ or 135₈) literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Alternative (empty): matches the empty string at any position, so the whole group is optional
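Because `[^ ]` excludes only the space, the greedy `+` first runs past the closing bracket and then gives characters back until `\]\s+` can match. A sketch on made-up inputs follows. A tenant containing a space is not captured at all and stays in the message.

```python
import re

# The tenant fragment as written, followed by the rest of the line.
TENANT_RE = re.compile(r"(?:(?=\[)\[(?P<log_tenant>[^ ]+)\]\s+|)(?P<rest>.+)")

for text in ["[tenant-a] User logged in", "[acme]] odd", "[two words] msg", "no tenant"]:
    m = TENANT_RE.match(text)
    print(f"{text!r}: tenant={m.group('log_tenant')!r}, rest={m.group('rest')!r}")
```

If tenants should never contain `]`, a class such as `[^\]\s]+` would be a tighter choice. That is our suggestion, not part of the original pattern.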
Named Capture Group log_message
(?P<log_message>.+)
. matches any character (except for line terminators)
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
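Since `.` does not match a newline, `log_message` ends at the end of its line. Continuation lines, such as a stack trace, are not appended to the message. Unless they fit the pattern themselves, they are not matched at all. In Python, `.` excludes only `\n`, so text with `\r\n` line endings leaves a trailing `\r` in the message. A sketch using a shortened stand-in pattern (hypothetical, not the full expression):

```python
import re

# Shortened stand-in for the full pattern: a level, whitespace, then the message.
MSG_RE = re.compile(r"^(?P<log_type>\w+)\s+(?P<log_message>.+)", re.MULTILINE)

text = 'ERROR Traceback follows\n  File "app.py", line 3\n'
# The indented continuation line does not start with \w, so it is skipped.
print([m.group("log_message") for m in MSG_RE.finditer(text)])

# "." excludes only "\n", so a CRLF line keeps its "\r".
print(repr(MSG_RE.match("INFO hello\r\n").group("log_message")))
```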
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
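Python has no `g` modifier. The same effect comes from choosing `re.finditer` or `re.findall` instead of `re.search`, while `m` corresponds to `re.MULTILINE`. A sketch using the first two groups of the pattern on two made-up lines:

```python
import re

text = "2023-01-05T10:20:30 web-01 INFO first\n2023-01-05T10:21:00 db01 ERROR second"
# Prefix of the page's pattern, enough to show how the flags change matching.
pattern = r"^(?P<log_time>[\dT\-\:]+)\s+(?P<log_host>[\w\-]+)"

# Without re.MULTILINE ("m"), ^ matches only at the start of the whole string.
hosts_default = [m.group("log_host") for m in re.finditer(pattern, text)]
# With re.MULTILINE, ^ also matches right after every newline.
hosts_multiline = [m.group("log_host") for m in re.finditer(pattern, text, re.MULTILINE)]
# Without a "g" equivalent, re.search returns only the first match.
first_host = re.search(pattern, text, re.MULTILINE).group("log_host")

print(hosts_default)    # ['web-01']
print(hosts_multiline)  # ['web-01', 'db01']
print(first_host)       # web-01
```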