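use strict;
use utf8;                              # the test string below contains literal non-ASCII currency symbols
binmode(STDOUT, ':encoding(UTF-8)');   # so a matched symbol such as a euro sign prints without a "Wide character" warning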
my $str = 'Here\'s the same regex in a convenient one-liner:
/^(?>\\p{Sc}[1-9]\\d{0,2}+(?:(?<gd>(?<gdc>,)|(?<gdp>\\.)|\\ )\\d{3}+(?:(?&gd)\\d{3}+)*+|\\d*+))(?:(?(gdc)\\.|(?(gdp),|[,.]))\\d*+)?+$/u
We want to match these:
$1
$10
$100
$1000
$1,000
$10,000
$100,000
$1,000,000
$1.00
$100.00
$10,000.00
$1,000,000.00
$1.000.000,00
$1.000
$1.000.000
$1,00
$1.000,00
$10000000000000
$100000000000.00000
$100000000000,00000
We don\'t want to match these:
$1,000,00
$1,00,00
$1.00.000
A few different currency symbols that we want to match:
$1
¢1
£1
¤1
¥1
₠1
₡1
₢1
₣1
₤1
₥1
₦1
₧1
₩1
₪1
₫1
€1
₭1
₮1
₯1
₰1
₱1
₲1
₳1
₴1
₵1
₶1
₷1
₸1
₹1
₺1
₻1
₼1
₽1
Nearby symbols that share some bytes (UTF-8) that we don\'t want to match:
¡1
¦1
§1
©1
ₔ1';
my $regex = qr/(?(DEFINE)
(?<currency_symbol> \p{Sc} )
(?<leading_group> [1-9] \d{0,2}+ )
(?<group> \d{3}+ )
(?<non_leading_groups> (?&group) (?:(?&group_delim)(?&group))*+ )
(?<decimal_delim>
(?# Use the opposite of group_delim. )
(?(<group_delim_comma>)
\.
| (?(<group_delim_period>)
,
| [,.] (?# There's no definitive group_delim. )
)
)
)
)
^
(?>
(?&currency_symbol)
(?&leading_group)
(?:
(?<group_delim>
(?<group_delim_comma> , ) |
(?<group_delim_period> \. ) |
\ (?# Allow whitespace as a delimiter. )
)
(?&non_leading_groups)
| \d*+ (?# It's also possible that there's no delimiter. )
)
)
(?:
(?&decimal_delim)
\d*+
)?+
$/uxmp;
if ( $str =~ /$regex/g ) {
print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
# print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
# print "Capture Group 2 is $2 ... and so on\n";
}
# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}
# Please keep in mind that these code samples are automatically generated and are not guaranteed to work.
# If you find any syntax errors, feel free to submit a bug report.
# For a full regex reference for Perl, please visit: http://perldoc.perl.org/perlre.html