Regular Expressions 101

Get path from any text

Regular Expression
PCRE2 (PHP >=7.3)

/
(?############ Let's catch paths without "" or '' ############################ )(?<opening>(?# First, catch the starting path, the <opening> ################### )\b(?<montage>[a-zA-Z]:[\/\\])(?# montage = 'C:/' )|[\/\\][\/\\](?<!http:\/\/)(?<!https:\/\/)(?>(?# check not 'http[s]:' prefix )[?.][\/\\](?:[^\/\\<>:"|?\n\r ]+[\/\\])?(?# '//[?or.]/xxxxx' or '//[?or.]/server/' )(?&montage)?(?# '//[?or.]/c:/' or '//[?or.]/server/c:/' )|(?!(?&montage)))(?# '//[addressIP/ or serverName/ but not C:/]' )|%\w+%[\/\\]?(?# '%EnvVariable%[/]' ))(?# So, <opening> catch : 'C:/' or '//[?or.]/[UNC/]C:/' or '//[?or.]/[UNC/]' or '//[next characters must be something other than C:/]' or '%EnvironementVariable%[/]' )(?:(?# now, we catch each directory name wich is between [/] ######################## )[^\/\\<>:"|?\n\r ,'](?# the first character should not be [ ,'] )[^\/\\<>:"|?\n\r]*(?# Any pathFrendly character )(?<![ ,'])(?# The last directory name's character must not be [ ,'] )[\/\\](?# End of directory name - who are between '/' - ))*(?# Catch most 'directoryName/' as possible )(?:(?# Lets catch the End path. There is a file ? a directory ? or just a useless '/' ? )(?=[^\/\\<>:"'|?\n\r;, ])(?#if next character is not pathFriendly or ' ' or [,'], we have reach the end of the path => we don't catch the last '/' and the the Regex end now. You can't catch fileName who begin by [,'] because they are probably a delimiter between 2 path. but '.' is allowed )(?:(?#If we are here, that mean there is a fileName or directoryName to catch ###### We will catch the last directoryName or the fileName without the extention ###### )(?:[^\/\\<>:"|?\n\r;, .](?# catch any character pathFriendly exept ' ' or [,.] )(?: (?=[\w\-]))?(?# If we find a ' ', we catch him if next charcter is not a delimiter. I see '-' after an ' ' not like a delimiter. 
)(?:\*(?!= ))?(?# If we find a '*', we stop the catch if next character is an ' ' )(?!(?&montage))(?# If we find a string who look like 'C:/', we stop the catch ))+(?# We catch theses word delimited by ' ' as much as possible ))?(?# it's possible the fileName have no name, but just an extention )(?:\.\w+(?# #### an extention begin by '.' and at least one none delimiter chracter ))*(?# we can add more extention until the first none '.' delimiter character. So, after the first '.' character inside a fileName, we cannot catch any ' ' character If we don't find one extention, so the filename is a directory name, and we stop the catch. ))(?# ############# END OF PATH CATCHING WITHOUT QUOTE "" and '' ####################### )|(?:(?# ######### Catching path quoted '' ########################### Path quoted '' is difficult because ['] is also a pathFrendly character )'(?&opening)(?# We catch .* between quote only if string start with an <opening> )(?=.*'\W|.*'$)(?# We catch .* between quote only if we are sure we will find end quote. End quote must be ['] and delimiter character or ['] and end string )(?:[^\/\\<>:'"|?\n\r]+(?# We take any pathFriendly character exept quote ['] )(?:'(?=\w))?(?# we catch quote ['] if next character is not a delimiter )[\/\\]?)*(?# Path quoted must respect this patern until end quote character ['] )')(?# end quoted '' path )|(?# ######### Catching path quoted "" ########################### )"(?&opening)(?# We catch .* between quote only if string start with an <opening> )(?=.*")(?# We catch .* between quote only if we are sure we will find end quote ["] )(?:[^\/\\<>:"|?\n\r]+(?# We take any pathFriendly character )[\/\\]?(?# pathFriendly characters can be is delimited by '\' ))*(?# Path quoted must respect this patern until end quote character )"(?# end quoted path )
/
g

Description

Extract Windows-style paths, quoted or unquoted, from any kind of text (error messages, e-mail bodies, ...).

THIS IS THE COMMENTED VERSION! To simply copy and use it, go to https://regex101.com/r/zWGLMP
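If only the commented version is at hand, the inline comments can be stripped mechanically. The following is a minimal Python sketch, not part of the original submission: it relies on the PCRE rule that a `(?# ... )` comment ends at the first closing parenthesis, and it assumes the pattern never contains a literal `(?#` (for example inside a character class). The name `strip_inline_comments` is hypothetical.

```python
import re

# A PCRE inline comment runs from '(?#' up to the first ')'.
# Assumption: the pattern has no literal '(?#' outside of comments.
INLINE_COMMENT = re.compile(r"\(\?#[^)]*\)")


def strip_inline_comments(pattern: str) -> str:
    """Return the pattern with every (?# ... ) comment removed."""
    return INLINE_COMMENT.sub("", pattern)


# A short fragment taken from the commented regex above.
commented = (
    r"(?<montage>[a-zA-Z]:[\/\\])(?# montage = 'C:/' )"
    r"|%\w+%[\/\\]?(?# '%EnvVariable%[/]' )"
)
print(strip_inline_comments(commented))
# prints: (?<montage>[a-zA-Z]:[\/\\])|%\w+%[\/\\]?
```

Comments that span a line break are removed as well, because `[^)]*` also matches newline characters.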

  • Relative paths are not supported
  • The goal is to catch whatever "looks like" a path; see the limitations below
  • UNC paths and prefixed paths such as [//./], [//?/] or [//./UNC/] are allowed
  • Some URL paths such as [file:///C:/] or [file://] are allowed
  • Paths quoted with ["] or ['] are caught, but the quotes are included in the match
  • Quoted paths are not subject to the limitations

Limitations (unquoted paths only):

  • [dot] and [space] are allowed, but not consecutively ([dot+space] or [space+dot])
  • A [dot] at the end of a file name is not caught
  • INSIDE A FILE NAME (or the last directory name, if the path points to a directory):
    • [comma] is not supported (it stops the match)
    • after the first [dot], any [space] stops the match
    • after a [space], the match stops if the next character is not a [letter], [digit] or [-]
    • so a double [space] stops the match

Compatibility

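  • Compatible with PCRE and PCRE2
  • AutoHotkey: don't forget to escape "%" as "`%"
  • /!\ PowerShell and .NET /!\ : this regex needs some modification before PowerShell can interpret it. You have to replace each (?&CaptureGroupName) with \k<CaptureGroupName>. This PowerShell code does the replacement (the closing '@ must be at the start of its line):
$powershellRegex = @'
[Put here the regex to replace (?&CaptureGroupName) with \k<CaptureGroupName>]
'@ -replace '\(\?&(\w+)\)', '\k<$1>'
    This example code must return: [Put here the regex to replace \k<CaptureGroupName> with \k<CaptureGroupName>]
    Be aware that in .NET \k<name> is a backreference: it matches the text the group has already captured instead of re-running the group's sub-pattern as (?&name) does in PCRE, so the converted regex may not match the same paths.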
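
For readers who prepare the pattern outside PowerShell, the same textual substitution can be sketched in Python. This only mirrors the `-replace` call described above and does not check how the converted pattern behaves under .NET. The function name `to_dotnet_backrefs` is hypothetical.

```python
import re

# Matches a PCRE subroutine call such as (?&montage) and captures the group name.
SUBROUTINE_CALL = re.compile(r"\(\?&(\w+)\)")


def to_dotnet_backrefs(pattern: str) -> str:
    """Rewrite every (?&name) as \\k<name>, as the PowerShell one-liner does."""
    # In the replacement template, '\\' is a literal backslash and '\1' is the group name.
    return SUBROUTINE_CALL.sub(r"\\k<\1>", pattern)


fragment = r"'(?&opening)(?=.*'\W|.*'$)"
print(to_dotnet_backrefs(fragment))
# prints: '\k<opening>(?=.*'\W|.*'$)
```

The replacement template is written as `\\k<\1>` because Python's `re.sub` rejects an unescaped `\k` in the template.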
Submitted by nitrateag - a year ago (Last modified 10 months ago)