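use strict;
use warnings;
use utf8;                                # the test string below contains literal non-ASCII characters
binmode( STDOUT, ':encoding(UTF-8)' );   # emit matches as UTF-8 rather than raw wide characters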
my $str = '\\w = [\\p{Ll}\\p{Lu}\\p{Lt}\\p{Lo}\\p{Lm}\\p{Nd}\\p{Nl}\\p{No}\\p{Pc}] (the \\p{Mn} is not included as in .NET regex)
ﬔąфrw𝐚𝒇𝓌𝔨𝕨𝗐𝛌𝛚ὣỷᵺᴔᴉվԍӹӡҁʫ - Ll, lowercase letters (some)
AÂĞƎƗNJΔΘΣϢЉЩѬӲԽႵᎿᏉᏯԌℬⰏR𝐖 - Lu, uppercase letters (some)
DžLjNjDzᾈᾉᾊᾋᾌᾍᾎᾏᾘᾙᾚᾛᾜᾝᾞᾟᾨᾩᾪᾫᾬᾭᾮᾯᾼῌῼ - Lt, titlecase letters (all)
ǃºऌߩהײبܢ - Lo, other letters (some) (note regex101 highlighting is weird here)
ʰʷˇˣߴߵໆᱽᵂᵒᵝᶣₐ〱ꀕꜛー - Lm, Modifier letters (some)
e҇c͢ą Mn, nonspacing mark
09١٨߁߈੮୪௨௫൫๕༥៨᧕᱕5 Nd, decimal digit number (some)
Ⅲᛯⅷ𒑣 - Nl, letter number
¼৶౼൵፫⁹⅙ - No, other number
_‿⁀⁔︳︴﹍﹎﹏_ Only a _ from \\p{Pc}, connector punctuation (.NET matches all of them)';
my $regex = qr/^\w+/ump;
# With /g in scalar context each evaluation finds the next match, so loop to visit them all
while ( $str =~ /$regex/g ) {
print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
# print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
# print "Capture Group 2 is $2 ... and so on\n";
}
# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}
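# A minimal sketch of the two notes above. The sample string and the group
# name "name" are hypothetical and not part of the generated pattern.
{
    my $sample = 'set key = value';
    if ( $sample =~ /(?<name>\w+)\s*=\s*/p ) {
        # $+{name} holds the named group; /p enables ${^PREMATCH} and ${^POSTMATCH}
        printf "name=%s pre='%s' post='%s'\n",
            $+{name}, ${^PREMATCH}, ${^POSTMATCH};
    }
}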
# Please keep in mind that these code samples are automatically generated and are not guaranteed to work.
# For a full regex reference for Perl, please visit: http://perldoc.perl.org/perlre.html