Regular Expressions 101

Generated Code

use strict;
use warnings;

my $str = 'The following regex handles any (* *) style string. Let\'s call it p{1}

    p{1} = \\(+\\*+(?:[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+

The targeted string can include the characters {*, (, )}, but not the character sequences "(*" or "*)" (on which the regex terminates).

Examples:

  (* simple one *)
  (* more * ) ( * () **( difficult * ))()*(( one *)
  (* r *( e*a) () * *  l ()*  * ( ) )   l   * ( ) ) * **( y bad *)

Explanation:

    \\(+\\*+           # begin with a (* sequence (or some variation like ((* or ((**, etc.)
      (?:            # begin comment content
        [^*(]        # allow any character except * and ( (which can begin a closing or opening sequence)
        |            #   also
        (?:\\*+[^)*]) # allow * 1+ times in a row ONLY if it\'s not immediately followed by ) or *
        |            #  also
        (?:\\(+[^*(]) # allow ( 1+ times in a row ONLY if it\'s not immediately followed by * or (
      )*             # allow any number of these characters / character sequences
    \\*+\\)+           # then close the comment with a *) (or some variation like **) or *))), etc.)

To capture both un-nested and singly nested comments, allow p{1} comments inside the comment body, i.e.:

    p{2} = \\(+\\*+(?:(?:p{1})|[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+

in other words

    p{2} = \\(+\\*+(?:(?:\\(+\\*+(?:[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+)|[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+

You\'ll see that this works on the following examples:
  (* test (* one *) *)
  (* a bit * ( ) *  harder (*  ) * () (( *) *)
  (* r *( (( e **( a ) ((* l *() l *) y * bad * ( *)

To capture up to triply-nested comments, follow the pattern set by p{2}:

    p{3} = \\(+\\*+(?:(?:p{2})|[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+

    p{3} = \\(+\\*+(?:(?:\\(+\\*+(?:(?:\\(+\\*+(?:[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+)|[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+)|[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+

Examples:
  (* an (* easy (* one *) *) *)
  (* only (* some (* levels (* captured *) because *) 4x *) nested *)

The pattern can be followed to allow any depth of nested comments to be captured, by defining

    p{N} = \\(+\\*+(?:(?:p{N-1})|[^*(]|(?:\\*+[^)*])|(?:\\(+[^*(]))*\\*+\\)+

for N > 1

';
my $regex = qr/\(+\*+(?:(?:\(+\*+(?:(?:\(+\*+(?:[^*(]|(?:\*+[^)*])|(?:\(+[^*(]))*\*+\)+)|[^*(]|(?:\*+[^)*])|(?:\(+[^*(]))*\*+\)+)|[^*(]|(?:\*+[^)*])|(?:\(+[^*(]))*\*+\)+/p;

if ( $str =~ /$regex/g ) {
  print "Whole match is ${^MATCH} and its start/end positions can be obtained via \$-[0] and \$+[0]\n";
  # print "Capture Group 1 is $1 and its start/end positions can be obtained via \$-[1] and \$+[1]\n";
  # print "Capture Group 2 is $2 ... and so on\n";
}

# ${^POSTMATCH} and ${^PREMATCH} are also available with the use of '/p'
# Named capture groups can be called via $+{name}
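
The generated snippet reports only the first match. As a minimal sketch, assuming you want every comment in the input, a while loop with /g walks through all matches and records their offsets from @- and @+. The sample text below is an assumption made for brevity; it reuses two of the examples from the test string.

```perl
use strict;
use warnings;

# Same p{3} pattern as in the generated code.
my $regex = qr/\(+\*+(?:(?:\(+\*+(?:(?:\(+\*+(?:[^*(]|(?:\*+[^)*])|(?:\(+[^*(]))*\*+\)+)|[^*(]|(?:\*+[^)*])|(?:\(+[^*(]))*\*+\)+)|[^*(]|(?:\*+[^)*])|(?:\(+[^*(]))*\*+\)+/;

# Hypothetical sample input: two examples from the test string with filler text around them.
my $text = 'x (* simple one *) y (* test (* one *) *) z';

# In scalar context, /g resumes after the previous match on each iteration.
my @matches;
while ( $text =~ /$regex/g ) {
    my ( $start, $end ) = ( $-[0], $+[0] );
    push @matches, [ $start, $end, substr( $text, $start, $end - $start ) ];
}

# One line per match: start offset, end offset, matched text.
printf "%d-%d: %s\n", @$_ for @matches;
```

Each entry in @matches holds the start offset, the end offset, and the matched comment, so the outer comment of a nested pair is reported once as a whole.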

Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for Perl, please visit: http://perldoc.perl.org/perlre.html
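
The test string defines p{N} by substituting p{N-1} into a fixed template. Rather than expanding that by hand, the pattern can be assembled in a loop. The following is a sketch under that reading of the definition; build_nested_comment_regex is a hypothetical helper name, not something produced by the code generator.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the p{N} pattern string for a given nesting depth.
sub build_nested_comment_regex {
    my ($depth) = @_;
    die "depth must be at least 1\n" if $depth < 1;

    my $open  = '\(+\*+';                              # (* or variations such as ((**
    my $close = '\*+\)+';                              # *) or variations such as **))
    my $plain = '[^*(]|(?:\*+[^)*])|(?:\(+[^*(])';     # content that cannot open or close a comment

    # p{1}: no nested comments allowed.
    my $pattern = $open . '(?:' . $plain . ')*' . $close;

    # p{N}: allow p{N-1} as one more alternative inside the comment body.
    for my $level ( 2 .. $depth ) {
        $pattern = $open . '(?:(?:' . $pattern . ')|' . $plain . ')*' . $close;
    }
    return $pattern;
}

# Usage: p{2} should cover a whole doubly nested comment.
my $p2 = build_nested_comment_regex(2);
if ( '(* test (* one *) *)' =~ /\A(?:$p2)\z/ ) {
    print "p{2} matches the whole doubly nested example\n";
}
```

Each level adds a fixed amount of text around the previous pattern, so the pattern length grows linearly with the depth.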
