Regular Expressions 101

Community Patterns


Get path from any text

Created·2023-01-31 14:38
Updated·2023-07-23 20:17
Flavor·PCRE2 (PHP)
Get Windows-style paths from any kind of text (error messages, e-mail bodies, ...), quoted or not.

This is the single-line version. To understand how it works or to edit it, go to https://regex101.com/r/7o2fyy

  • Relative paths are not supported. The goal is to catch whatever looks like a path; see the limitations below.
  • UNC paths and prefixed paths such as [//./], [//?/] or [//./UNC/] are allowed.
  • Some URL paths such as [file:///C:/] or [file://] are allowed.
  • Paths quoted with ["] or ['] are caught, but the quotes are included in the match. Quoted paths are not subject to the limitations.

Limitations (unquoted paths only):

  • [dot] and [space] are allowed, but not in a row: [dot+space] or [space+dot] at the end of a file name is not caught.
  • Inside a file name (or the last directory, if the path points to a directory):
    • [comma] is not supported (it stops the catch).
    • After the first [dot], any [space] stops the catch.
    • After a [space], the catch stops if the next character is not a [letter], a [digit] or [-], so a double [space] stops the catch.

Compatibility:

  • Compatible with PCRE and PCRE2.
  • AutoHotkey: don't forget to escape "%" as "`%".
  • /!\ PowerShell and .NET /!\ : this regex needs some modification before PowerShell can interpret it. You have to replace each (?&CapturGroupName) with \k<CapturGroupName>. Use this PowerShell code to do the replacement:

$powershellRegex = @'
[Put here the regex to replace (?&CapturGroupName) with \k<CapturGroupName>]
'@ -replace '\(\?&(\w+)\)', '\k<$1>'

This example code should return: [Put here the regex to replace \k<CapturGroupName> with \k<CapturGroupName>]
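The same text substitution as the PowerShell `-replace` can be sketched in JavaScript, assuming the intended output is .NET's named backreference syntax `\k<name>`; `toDotNetBackrefs` is a hypothetical helper name. One caution: the two constructs are not equivalent. In PCRE, `(?&name)` is a subroutine call that re-runs the named group's pattern, whereas .NET's `\k<name>` is a backreference that only matches the exact text the group already captured, so the converted regex may match less than the PCRE original.

```javascript
// Mirrors the PowerShell -replace: rewrite each PCRE subroutine call
// "(?&Name)" as the .NET named backreference "\k<Name>".
// toDotNetBackrefs is a hypothetical helper name.
function toDotNetBackrefs(pcreSource) {
  // \(\?&(\w+)\) matches "(?&", a group name (captured as $1), then ")".
  // In the replacement, '\\k' is a literal "\k" and $1 re-inserts the name.
  return pcreSource.replace(/\(\?&(\w+)\)/g, '\\k<$1>');
}

const demo = '[Put here the regex to replace (?&CapturGroupName) with \\k<CapturGroupName>]';
console.log(toDotNetBackrefs(demo));
// prints: [Put here the regex to replace \k<CapturGroupName> with \k<CapturGroupName>]
```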
Submitted by nitrateag

Community Library Entry


Regular Expression
Created·2023-05-14 05:03
Flavor·JavaScript

/^(?!-00:00)(?=^(?:Z|[\+\-](?:0[0-9]|1[012]):00|\+0[34569]:30|\+10:30|-03:30|-09:30|\+13:00|\+14:00|\+05:45|\+08:45|\+12:45))^((Z)|([\+\-])(\d\d):(\d\d))$/gm

Description

Time Zone UTC Offsets in actual use for ISO 8601 / RFC 3339 Date Times (Museum of Bad Data)

https://regex101.com/library/F21Glr

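Matches only UTC offsets that are in actual use and valid under both ISO 8601 and RFC 3339. (Y'know: in a timestamp like '2008-08-08T08:08:08+05:00', this is the plus/minus sign and what follows it, or the 'Z' for UTC.) Note that two daylight-saving offsets in current use, -02:30 (Newfoundland) and +13:45 (Chatham Islands), are not accepted; the test cases below list them as rejects.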
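Because the pattern is anchored with `^` and `$`, it validates a bare offset, not a whole timestamp. A minimal sketch, assuming you first split the trailing `Z` or `±hh:mm` off an RFC 3339 string; `offsetOf` is a hypothetical helper, not part of the library entry:

```javascript
// The library pattern, without the "gm" flags, to check a single value.
const UTC_OFFSET = /^(?!-00:00)(?=^(?:Z|[\+\-](?:0[0-9]|1[012]):00|\+0[34569]:30|\+10:30|-03:30|-09:30|\+13:00|\+14:00|\+05:45|\+08:45|\+12:45))^((Z)|([\+\-])(\d\d):(\d\d))$/;

// Hypothetical helper: return the timestamp's offset if it is one in use, else null.
function offsetOf(timestamp) {
  // Take a trailing "Z" or "+hh:mm" / "-hh:mm"; anything else has no usable offset.
  const tail = /(Z|[+-]\d\d:\d\d)$/.exec(timestamp);
  if (tail === null) return null;
  return UTC_OFFSET.test(tail[1]) ? tail[1] : null;
}

console.log(offsetOf('2008-08-08T08:08:08+05:00')); // +05:00
console.log(offsetOf('2008-08-08T08:08:08+05:15')); // null
```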

This regex works in all versions of JavaScript, past and present. The expanded version with comments does not, because JavaScript regular expressions have no free-spacing (x) flag.
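One way to keep the comments in JavaScript anyway is to assemble the pattern from commented string pieces and pass the result to `new RegExp`. A sketch, assuming the hypothetical name `UTC_OFFSET_RE`; note that every backslash is doubled inside the string literals:

```javascript
// JavaScript has no free-spacing (x) flag, so the expanded layout is rebuilt
// by joining string pieces, each followed by its own comment.
const UTC_OFFSET_RE = new RegExp([
  '^(?!-00:00)',                                   // not -00:00
  '(?=^(?:',
  'Z',                                             // Z alone works,
  '|[\\+\\-](?:0[0-9]|1[012]):00',                 // and all other +/- xx:00s,
  '|\\+0[34569]:30|\\+10:30|-03:30|-09:30',        // the +/- xx:30s,
  '|\\+13:00|\\+14:00|\\+05:45|\\+08:45|\\+12:45', // and these special cases
  '))',
  '^((Z)|([\\+\\-])(\\d\\d):(\\d\\d))$',           // G1 to G5, as in the expanded pattern
].join(''));

console.log(UTC_OFFSET_RE.test('+12:45')); // true
console.log(UTC_OFFSET_RE.test('+13:45')); // false
```

The joined source is character-for-character the single-line pattern, so the two can be swapped freely.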

Edge cases:

  • Rejects -00:00, which is valid RFC 3339 but invalid ISO 8601.
  • Accepts +13:00, +14:00, -03:30 and other yes-those-are-valid offsets.

Test cases:


// Time Zone UTC Offsets in actual use for ISO 8601 / RFC 3339 (Museum of Bad Data)
// Accept: UTC indicator
Z
// Accept: Valid +xx:00
+00:00
+01:00
+02:00
+03:00
+04:00
+05:00
+06:00
+07:00
+08:00
+09:00
+10:00
+11:00
+12:00
+13:00
+14:00
// Accept: Valid -xx:00
-01:00
-02:00
-03:00
-04:00
-05:00
-06:00
-07:00
-08:00
-09:00
-10:00
-11:00
-12:00

// Accept: Valid +xx:30
+03:30
+04:30
+05:30
+06:30
+09:30
+10:30

// Accept: Valid +xx:45
+05:45
+08:45
+12:45

// Accept: Valid -xx:30
-03:30
-09:30

// Accept: Valid :30 offsets

+03:30
+04:30
+05:30
+06:30
+09:30
+10:30
-03:30
-09:30

// Accept: Valid :45 offsets
+05:45
+08:45
+12:45

// Reject: valid RFC 3339, invalid ISO 8601

-00:00

// Reject: no such UTC offset in use
-13:00
-14:00
+00:01
+00:03
+00:99
+20:00
+0:00
// Reject: no such UTC offset in use

+01:30
+07:30
+08:30
+02:30
+11:30
+12:30
+13:30
+14:30
-01:30
-02:30
-04:30
-05:30
-06:30
-07:30
-08:30
-10:30
-11:30
-12:30
-13:30
-14:30

// Reject: Unused :45 offsets
+01:45
+02:45
+03:45
+04:45
+06:45
+07:45
+09:45
+10:45
+11:45
+13:45
+14:45
-01:45
-02:45
-03:45
-04:45
-05:45
-06:45
-07:45
-08:45
-09:45
-10:45
-11:45
-12:45
-13:45
-14:45

// Reject: Unused :15 offsets
+01:15
+02:15
+03:15
+04:15
+05:15
+06:15
+07:15
+08:15
+09:15
+10:15
+11:15
+12:15
+13:15
+14:15
-01:15
-02:15
-03:15
-04:15
-05:15
-06:15
-07:15
-08:15
-09:15
-10:15
-11:15
-12:15
-13:15
-14:15

// Reject: Z stands alone
Z00:00
Z00
Z0
// Reject: sign (+ or -) required
0100
// Reject: colon required
+0100
-0100
// Reject: No extra characters
2001-02-03T04:05:06.007+0800
+08:00 
 +08:00
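With the entry's `gm` flags, `^` and `$` anchor at every line, so the pattern can scan a newline-separated list of candidates like the test cases above. A small sketch using a handful of those lines (the variable names are illustrative only):

```javascript
// Same pattern as the library entry, with its "gm" flags.
const UTC_OFFSETS_GM = /^(?!-00:00)(?=^(?:Z|[\+\-](?:0[0-9]|1[012]):00|\+0[34569]:30|\+10:30|-03:30|-09:30|\+13:00|\+14:00|\+05:45|\+08:45|\+12:45))^((Z)|([\+\-])(\d\d):(\d\d))$/gm;

// A few of the test cases, one per line ("+08:00 " keeps its trailing space).
const candidates = ['Z', '+05:45', '-00:00', '+13:45', '+08:00 ', '+0100'].join('\n');

// "m" makes ^ and $ match at line boundaries; "g" makes match() return every hit.
const accepted = candidates.match(UTC_OFFSETS_GM);
console.log(accepted); // [ 'Z', '+05:45' ]
```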

Expanded Pattern:

^ # use zero-width assertions to admit only the special cases:
(?!-00:00)                                    # not -00:00
(?=^(?:
   Z                                          # Z alone works,
   |[\+\-](?:0[0-9]|1[012]):00                # and all other +/- xx:00s,
   |\+0[34569]:30|\+10:30|-03:30|-09:30       # the +/- xx:30s.
   |\+13:00|\+14:00|\+05:45|\+08:45|\+12:45   # and these special cases
)) # Now that we've forced only positive matches, let's capture the pieces:
^(               # G1: the whole offset
  (Z) |             # G2: UTC indicator or nil
  ([\+\-])          # G3: +/- direction
  (\d\d)            # G4: "hours" part of offset
  :
  (\d\d)            # G5: "minutes" part of offset
)$
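The capture groups listed in the expanded pattern (G2 for `Z`, G3 to G5 for sign, hours and minutes) map directly onto a signed offset in minutes. A minimal sketch using the single-line JavaScript pattern; `offsetMinutes` is a hypothetical helper, not part of the library entry:

```javascript
// The library pattern, without the "gm" flags, so exec() has no lastIndex state.
const UTC_OFFSET = /^(?!-00:00)(?=^(?:Z|[\+\-](?:0[0-9]|1[012]):00|\+0[34569]:30|\+10:30|-03:30|-09:30|\+13:00|\+14:00|\+05:45|\+08:45|\+12:45))^((Z)|([\+\-])(\d\d):(\d\d))$/;

// Hypothetical helper: minutes east of UTC, or null if the offset is rejected.
function offsetMinutes(offset) {
  const m = UTC_OFFSET.exec(offset);
  if (m === null) return null;                        // not an accepted offset
  if (m[2] !== undefined) return 0;                   // G2: "Z" means UTC
  const sign = m[3] === '-' ? -1 : 1;                 // G3: direction
  return sign * (Number(m[4]) * 60 + Number(m[5]));   // G4 hours, G5 minutes
}

console.log(offsetMinutes('+05:45')); // 345
console.log(offsetMinutes('-03:30')); // -210
```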
Submitted by Philip Flip Kromer