regex101: Universal XML Structure Parsing Regex(PCRE2-PHP)

1

JSON Parser

Created·2026-07-25 18:00

Description

This regular expression is designed to tokenize JSON-like content embedded in INI files. It is not a full JSON validator; instead, it provides a lightweight lexical scanner that identifies structural tokens (objects, arrays, strings, numbers, booleans, null) and ignores comments and whitespace. The token stream is then processed by a hand-written recursive-descent parser (with depth-limit protection) to build a .NET object graph (Dictionary, object], or primitives).

Regex Pattern

(?//.|/\.?\/)| (?""(?:\\.)"")(?=(?:\s|//.|/\.?\/):)| (?(?true)|(?false)|(?null)|""(?(?:\\.))""|(?-?(?:0|[1-9)(?:\.0-9]+)?(?:[eE?[0-9]+)?))| (?:)| (?\[)| (?,)| (?\])| (?{)| (?})| (?+)| (?[\r\n]+)| (?.+)

Named Capture Groups

| Group Name | Matches |
|------------------|-------------------------------------------------------------------------|
| Comment | Single-line //… or multi-line /*…*/ comments (skipped). |
| key | A JSON property key (double-quoted string) followed by a colon (lookahead). |
| value | A JSON value, one of: true, false, null, a double-quoted string, or a number (integer, float, or scientific). |
| bool | Sub-group inside value for true/false (for direct parsing). |
| null | Sub-group for null. |
| string | Sub-group for the content inside double quotes (without the quotes). |
| number | Sub-group for numeric literals. |
| value_sep | A colon : separating key and value. |
| array_open | Left bracket [. |
| array_sep | Comma , between array elements. |
| array_close | Right bracket ]. |
| object_open | Left brace {. |
| object_close | Right brace }. |
| whitespace | Horizontal whitespace (spaces, tabs), not newlines. |
| newline | Line-break characters (CR, LF, CRLF). |
| undefined | Any other character (should not occur in valid JSON; used as fallback). |

Important Notes

- No recursion: the regex only tokenises; the parser handles nesting and depth limits.
- Escaped characters inside strings (\n, \t, \", etc.) are not unescaped by the regex; the parser calls UnEscape() when _allowEscapeChars is true.
- Whitespace and newlines are ignored by the parser (skipped during token iteration).
- The undefined group uses .+ (not .*) to avoid matching empty positions; this prevents false positives when the scanner reaches the end of the string.

Purpose

This regex is a solid foundation for building a custom JSON lexer, parser, or tokeniser for .NET projects. Its clear separation of structural elements, comments, and whitespace makes it easy to implement lightweight, hand-crafted parsers that do not rely on heavy external libraries. It is particularly well suited to small and medium-sized files, configuration blocks, or embedded data fragments, where performance and memory footprint matter. You can adapt the token stream to your own data model, add validation, or transform the JSON on the fly, all while keeping full control over the parsing logic.
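The tokenise-then-parse split described above can be sketched in Python. This is only an illustration under stated assumptions: `TOKEN_RE`, its group names, and `tokenize` are hypothetical and are not the pattern or the .NET code from this entry. It shows the general technique of one alternation of named groups, with comments and whitespace skipped during iteration and a catch-all group for anything unexpected.

```python
import re

# Illustrative token pattern (an assumption, not the pattern from this entry).
# Each alternative is a named group; m.lastgroup tells us which one matched.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\r\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
  | (?P<literal>true|false|null)
  | (?P<punct>[{}\[\],:])
  | (?P<whitespace>\s+)
  | (?P<undefined>.)
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Yield (kind, lexeme) pairs, skipping comments and whitespace."""
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind in ("comment", "whitespace"):
            continue  # ignored, as the notes above describe
        if kind == "undefined":
            raise ValueError(
                f"unexpected character {m.group()!r} at offset {m.start()}"
            )
        yield kind, m.group()


if __name__ == "__main__":
    sample = '{"a": [1, 2.5, true], // note\n "b": null}'
    for kind, lexeme in tokenize(sample):
        print(kind, lexeme)
```

String lexemes are yielded with their quotes and escapes intact, so unescaping and nesting stay the job of whatever parser consumes the stream.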

Submitted by Pavel Bashkardin

1

R7RS Scheme

Created·2026-07-05 22:53

Flavor·PCRE2 (PHP)

Matches Scheme programs as described by https://small.r7rs.org/attachment/r7rs.pdf

Submitted by stellartux

1

Even better chemical formula pattern matcher

Created·2026-07-01 16:48

Flavor·PCRE2 (PHP)

Parses formulas that use glyphs, Unicode characters, or sub/sup tags for the subscripts. TS script here (with screenshot)

Submitted by Justin Hyland

1

Advanced chemical formula parser

Created·2026-07-01 16:07

Updated·2026-07-09 21:20

Flavor·PCRE2 (PHP)

Parses advanced chemical formulas out of a paragraph of text.

Submitted by justin hyland

1

Element Symbol Matcher

Created·2026-06-30 07:00

Flavor·PCRE2 (PHP)

Matches any element on the periodic table of elements.

Submitted by Justin Hyland

1

SMILES syntax validation/match

Created·2026-06-30 06:47

Updated·2026-07-02 06:44

Flavor·PCRE2 (PHP)

This matches some pretty complicated SMILES structures, works well for my app.

Submitted by Justin Hyland

1

Advanced chemical formula parser

Created·2026-06-30 06:42

Flavor·PCRE2 (PHP)

Matches a chemical structure in a form similar to CHBF5-.K+. Accounts for every scenario I could think of (hydrates, adducts, ionic bonds, positive/negative charge indicators, etc.).

Submitted by Justin Hyland

1

CAS Number parser

Created·2026-06-30 06:38

Flavor·PCRE2 (PHP)

Parses valid CAS numbers

Submitted by Justin Hyland

1

Spotify playlist info ripper

Created·2026-06-30 06:08

Flavor·PCRE2 (PHP)

Copy the HTML of the lowest common div element, insert it as the test string, then export.

Submitted by anonymous

2

Universal XML Structure Parsing Regex(PCRE2-PHP)

Created·2026-05-22 05:39

Flavor·PCRE2 (PHP)

This regular expression is designed to tokenize XML content by identifying major XML constructs through named capture groups. It detects processing instructions (PI), DTD blocks, CDATA sections, comments, self‑closing tags, opening tags, closing tags, and plain text. It is suitable for building lightweight XML lexers or preprocessing XML before deeper parsing.

Submitted by Flithor

2

Universal XML Structure Parsing Regex(C#)

Created·2026-05-20 09:51

Updated·2026-05-27 11:13

Flavor·.NET 7.0 (C#)

This regular expression is designed to tokenize XML content by identifying major XML constructs through named capture groups. It detects processing instructions (PI), DTD blocks, CDATA sections, comments, self‑closing tags, opening tags, closing tags, and plain text. It is suitable for building lightweight XML lexers or preprocessing XML before deeper parsing.

Submitted by Flithor

1

Gooning67

Created·2026-04-29 07:12

Flavor·PCRE2 (PHP)

Good CHEESE

Submitted by Goonlord69

1

3maldrehtsichderhaseimkreis

Created·2026-04-29 06:11

Flavor·PCRE2 (PHP)

Whoever wakes the lion must also dance with it

Submitted by anonymous

1

Halacae Text

Created·2026-04-01 00:09

Flavor·PCRE2 (PHP)

Regular expression to match a valid line of text in Halacae, a constructed language by R74n.

Submitted by R74n

1

Halacae Word

Created·2026-04-01 00:08

Flavor·PCRE2 (PHP)

Regular expression to match valid words in Halacae, a conlang by R74n.

Submitted by R74n

1

R74n URLs

Created·2026-04-01 00:05

Flavor·PCRE2 (PHP)

Regular expression to match valid URLs on R74n websites.

Submitted by R74n

1

TA_P1

Created·2026-03-26 09:45

Flavor·PCRE2 (PHP)

Code

Submitted by anonymous

1

plfjahr42

Created·2026-03-25 07:21

Flavor·PCRE2 (PHP)

plf HEEELLLPPP

Submitted by Lecksie Hartman

1

plfJahr4

Created·2026-03-25 07:14

Flavor·PCRE2 (PHP)

PLF heelp

Submitted by Lecksie Hartman

1

Murat67

Created·2026-03-24 22:53

Flavor·PCRE2 (PHP)

Isch des Tüff

Submitted by webler4000

1

wald493#2

Created·2026-03-24 20:24

Flavor·PCRE2 (PHP)

Whoever wakes the lion must also dance with it!

Submitted by Webler3000

1

Capture group finder

Created·2026-03-23 19:06

Flavor·PCRE2 (PHP)

A regex to inspect other regex and match all capture groups

Submitted by SP4CEBAR

1

translation batch name structure

Created·2026-03-16 13:15

Flavor·PCRE2 (PHP)

internal structure of a batch name

Submitted by msoutopico

2

Almost universal anime filename matcher

Created·2026-03-07 18:37

Flavor·Python

Matches anime filenames such as [Group] Name [Episode][Audiometa]others.ext. Supports the NCOP, NCED, OP, ED, SP, SPnn, nn, nn.n, and nn.n_vn episode forms and the mp4, mkv, srt, and ass extensions, but you could add more. The episode must be written inside [] brackets. This regex cannot cover all cases and obviously has some issues, but it works fine in most cases. It is also a small regex exercise for me.

Submitted by NullCompute0754

2

GHAS Custom Secret Scanning Regex for Password/Secret/API Key (Excludes ${...} and process.env)

Created·2026-03-06 15:52

Flavor·PCRE2 (PHP)

This is a GitHub Advanced Security (GHAS) Secret Scanning custom pattern I created to detect likely hardcoded credentials while reducing common false positives in code.

Goal: detect assignments for these key names:
- password
- secret
- apikey / api_key / api-key

Pattern regex: (?i)\b(password|secret|api[-]?key)\b\s[:=]\s(?!\s\$\{)(?!\sprocess\.env\b)(?:['"])?[A-Za-z0-9!@#$%^&()+=-]{5,}(?:['"])?

What it should catch (examples):
- password: "ahsjdfahsjfhdjsahj"
- secret = 'kjfskahfsdhfj'
- apikey: ABCDE12345!@# (unquoted)

What it tries NOT to catch (common false positives):
- password: ${password_somename} (template/variable placeholder)
- secret: ${VAULT_SECRET}
- password: process.env.DB_PASSWORD (env var reference, not a hardcoded secret)

This is intended as a practical baseline; it won't be perfect for every language/config style. If you have suggestions to improve the detection accuracy (reduce false positives/false negatives) for GHAS custom patterns, please share.
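To try the catch / don't-catch examples locally, here is a minimal Python sketch. It uses a reconstructed variant of the listed pattern: the `*` quantifiers after `\s` and the `_` in `api[_-]?key` are assumptions (they appear to have been lost from the text as displayed), and `SECRET_RE` and `find_hardcoded` are hypothetical names. Python's `re` is also not the engine GHAS uses, so treat this as a rough check only.

```python
import re

# Reconstructed variant of the listed pattern (assumptions: the `*`
# quantifiers and the `_` in `api[_-]?key`).
SECRET_RE = re.compile(
    r"\b(password|secret|api[_-]?key)\b"           # key name
    r"\s*[:=]\s*"                                  # assignment operator
    r"(?!\s*\$\{)"                                 # not a ${...} placeholder
    r"(?!\s*process\.env\b)"                       # not an env var reference
    r"['\"]?[A-Za-z0-9!@#$%^&()+=-]{5,}['\"]?",    # optionally quoted value
    re.IGNORECASE,
)


def find_hardcoded(lines):
    """Return the lines that look like hardcoded credential assignments."""
    return [line for line in lines if SECRET_RE.search(line)]


if __name__ == "__main__":
    samples = [
        'password: "ahsjdfahsjfhdjsahj"',
        "secret = 'kjfskahfsdhfj'",
        "apikey: ABCDE12345!@#",
        "password: ${password_somename}",
        "secret: ${VAULT_SECRET}",
        "password: process.env.DB_PASSWORD",
    ]
    for line in find_hardcoded(samples):
        print("flagged:", line)
```

The two negative lookaheads repeat `\s*` so that they still reject the placeholder forms when the engine backtracks the whitespace after the `:` or `=`.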

Submitted by GearoidMaguire

1

Flatten 1 line CSS

Created·2026-03-01 16:22

Updated·2026-03-01 17:35

Flavor·PCRE2 (PHP)

Finds CSS selectors that only contain one line and, using the substitution, flattens everything to be on the same line.

Submitted by anonymous

1

IP address

Created·2026-03-01 02:22

Flavor·PCRE2 (PHP)

Matches and validates an IP address. 192.168.0.1 matches; 999.999.9.9 does not.
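A range-checked IPv4 matcher can be sketched as follows. This is not the pattern from this entry, just one common way to restrict each octet to 0-255; the names `OCTET`, `IPV4_RE`, and `is_ipv4` are hypothetical.

```python
import re

# One octet in the range 0-255, without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Four octets separated by dots.
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_ipv4(text):
    """Return True if text is a dotted-quad IPv4 address."""
    return IPV4_RE.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_ipv4("192.168.0.1"))   # True
    print(is_ipv4("999.999.9.9"))   # False
```

When a regex is not required, Python's standard `ipaddress.ip_address` performs the same validation and also covers IPv6.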

Submitted by anonymous

1

Validate an IP

Created·2026-02-25 11:06

Updated·2026-02-25 11:11

Flavor·PCRE2 (PHP)

A 52-character regex that validates an IP address.

Submitted by Karthik

1

number selector, with commas & decimals

Created·2026-02-12 05:04

Flavor·PCRE2 (PHP)

Selects numbers with commas and decimals, like 1,234.56.

Submitted by Bicorn

1

Smart outer parentheses selector with backslash escaping

Created·2026-02-10 03:26

Updated·2026-02-12 01:11

Flavor·PCRE2 (PHP)

Grabs the outer parentheses and their contents, taking nested inner parentheses into account.
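Python's built-in `re` module has no recursive patterns, so the sketch below swaps in a small hand-written scanner to show the same behaviour: collect each top-level parenthesised span, count nesting depth, and treat a backslash as escaping the next character. The function name `outer_groups` is hypothetical, and the exact escaping rules of the original pattern are an assumption.

```python
def outer_groups(text):
    """Return each top-level (...) span, honoring backslash escapes."""
    groups = []
    depth = 0
    start = None
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2              # skip the backslash and the escaped character
            continue
        if ch == "(":
            if depth == 0:
                start = i       # remember where the outer group opens
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
        i += 1
    return groups


if __name__ == "__main__":
    print(outer_groups("a (b (c) d) e (f)"))
    print(outer_groups(r"x (a \) b) y"))
```

Unbalanced input simply yields no span for the unclosed group; a stricter version could raise an error instead.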

Submitted by bicorn

1

nexus/sonartype composer cleanup of unfinished package versions

Created·2026-02-05 13:05

Flavor·PCRE2 (PHP)

Matches composer packages with -alpha, -beta, and -rc version suffixes.

Submitted by Anna Damm

1

OmegaT Project Name (Pattern)

Created·2026-02-03 16:49

Flavor·PCRE2 (PHP)

Regex showing how to match the five parts in an OmegaT project name (named as per the internal convention at cApStAn), including labels for each captured group (each part in the name).

Submitted by msoutopico

2

Validate color with grb group

Created·2026-02-03 03:22

Updated·2026-02-04 03:36

Flavor·Golang

Match:
- #aabbcc
- #abc
- (255,255,255)
- (255,255,255,255)
- 255,255,255

No match:
- #gggggg
- #ggg
- 256,256,256
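The match / no-match cases above can be reproduced with a short Python sketch. It is not the Golang pattern from this entry; `CHANNEL`, `RGB`, `COLOR_RE`, `is_color`, and the group names `hex`, `rgb_paren`, and `rgb` are hypothetical. It accepts 3- or 6-digit hex colours and 3 or 4 comma-separated channels in the range 0-255, with or without surrounding parentheses.

```python
import re

# One colour channel in the range 0-255.
CHANNEL = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Three or four comma-separated channels (RGB or RGBA-style).
RGB = rf"{CHANNEL}(?:,{CHANNEL}){{2,3}}"

COLOR_RE = re.compile(
    rf"#(?P<hex>[0-9a-fA-F]{{6}}|[0-9a-fA-F]{{3}})"   # #aabbcc or #abc
    rf"|\((?P<rgb_paren>{RGB})\)"                      # (r,g,b) or (r,g,b,a)
    rf"|(?P<rgb>{RGB})"                                # r,g,b without parentheses
)


def is_color(text):
    """Return True if text is a hex colour or an RGB(A) channel list."""
    return COLOR_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["#aabbcc", "#abc", "(255,255,255)", "256,256,256", "#ggg"]:
        print(sample, is_color(sample))
```

Using separate alternatives for the parenthesised and bare forms keeps the parentheses balanced, which a pair of optional `\(?` and `\)?` would not guarantee.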

Submitted by Doyoung

1

unnecessary closing HTML tags

Created·2026-01-27 19:15

Flavor·PCRE2 (PHP)

Selects the ` and ` tags from a string. You can probably do this with something easier, but this works

Submitted by Twineee

1

Door_Double

Created·2026-01-27 07:54

Updated·2026-01-28 10:27

Flavor·Python

Regex for Door_Double

Submitted by anonymous

1

Validate Thai initial consonants, final consonants, vowels, and tone marks

Created·2026-01-22 01:36

Updated·2026-01-23 12:42

Flavor·JavaScript

Checks the initial consonant (required), checks the final consonant for vowels that require one, and checks the placement of Thai vowels and tone marks. Note: validating final consonants in Thai is difficult because Thai is written continuously, with no clear word boundaries, so readers must rely on the meaning of the words to decide how to segment the text. For example, "ตากลม" can be read as "ตาก-ลม" or as "ตา-กลม". A regex can therefore help only to a degree and will sometimes be wrong, but it can still serve as a supplementary checking tool that covers roughly 80% of the possible cases. I hope this addition is of at least some use.

Submitted by อธิปัตย์ ล้อวงศ์งาม

2

Secure Email Validation (OWASP-Aligned & DNS-Strict) (RFC 5322)

Created·2026-01-21 16:32

Updated·2026-01-21 16:36

Flavor·JavaScript

This regular expression provides a balance between RFC compliance and security best practices. It is designed to prevent injection vectors in legacy systems by using a restricted "safe" character subset recommended by the OWASP Validation Regex Repository.

Pattern: ^A-Za-z0-9(?:\.A-Za-z0-9_+&-]+)*@(?:[A-Za-z0-9{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$

Key Features:
- Forced Alphanumeric Start: Prevents leading hyphens to avoid command injection vulnerabilities.
- Security Subset: Restricts special characters to _+&*- to prevent exotic character injections (e.g., pipes or backticks).
- No Quoted Strings: Forbids quoted strings to eliminate dangerous payloads containing spaces or backslashes.
- DNS Compliance: Enforces label lengths (1–63 characters) and prevents labels from starting or ending with hyphens.
- Whole String Anchoring: Uses ^ and $ to ensure the entire input is validated.
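The key features listed above can be illustrated with a Python sketch. The pattern as displayed on the page is incomplete, so the one below is a reconstruction from the feature list and should be read as an assumption, not the author's exact regex; `LOCAL`, `LABEL`, `EMAIL_RE`, and `is_safe_email` are hypothetical names.

```python
import re

# Local part: alphanumeric first character, then the restricted subset
# _+&*- with dot-separated segments (assumed reconstruction).
LOCAL = r"[A-Za-z0-9][A-Za-z0-9_+&*-]*(?:\.[A-Za-z0-9_+&*-]+)*"

# DNS label: 1-63 characters, no leading or trailing hyphen.
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"

EMAIL_RE = re.compile(rf"{LOCAL}@(?:{LABEL}\.)+[A-Za-z]{{2,63}}")


def is_safe_email(text):
    """Return True if the whole string matches the restricted email shape."""
    # fullmatch plays the role of the ^ and $ anchors.
    return EMAIL_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["user.name+tag@example.co.uk", "-user@example.com",
                   '"a b"@example.com', "user@-bad.com"]:
        print(sample, is_safe_email(sample))
```

A shape check like this says nothing about whether the mailbox exists; it only narrows the character set that reaches downstream systems.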

Submitted by Gor Sargsyan

1

Laravel Log (MonoLog) Print

Created·2026-01-21 14:51

Updated·2026-01-21 15:00

Flavor·PCRE2 (PHP)

Extracts the following fields from a Laravel log entry:
- Timestamp (date)
- Log level (supports an optional prefix, e.g. local.DEBUG)
- Log message
- Exception | Context

If an exception is detected, the file reference (file path + line number) and the optional stack trace are captured.

Submitted by Jerren

1

Poland PESEL

Created·2026-01-21 11:23

Flavor·PCRE2 (PHP)

A base regex validator for Polish PESEL numbers. Covers birth dates up to the year 2099. PESEL numbers are in YYMMDDXXXXX format; however, the month is offset according to the century of birth. People born in 1900-1999 have no offset, 2000-2099 adds 20 to the month, 2100-2199 adds 40, and 2200-2299 adds 60. For example, someone born on December 16, 2016 would have the PESEL 163216XXXXX.
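The month offset rule can be made concrete with a short Python sketch that decodes the birth date from the first six digits. It is not the regex from this entry: `PESEL_RE` and `pesel_birth_date` are hypothetical, only the 1900-2099 range is covered, and the checksum digit is not verified.

```python
import re
from datetime import date

PESEL_RE = re.compile(
    r"(?P<yy>[0-9]{2})"
    r"(?P<mm>0[1-9]|1[0-2]|2[1-9]|3[0-2])"   # 01-12 (1900s) or 21-32 (2000s)
    r"(?P<dd>0[1-9]|[12][0-9]|3[01])"
    r"[0-9]{5}"                              # remaining digits, not checked here
)


def pesel_birth_date(pesel):
    """Return the encoded birth date, or None if the shape or date is invalid."""
    m = PESEL_RE.fullmatch(pesel)
    if m is None:
        return None
    yy, mm, dd = int(m["yy"]), int(m["mm"]), int(m["dd"])
    century = 1900 if mm <= 12 else 2000     # a month above 20 means 2000-2099
    month = mm if mm <= 12 else mm - 20
    try:
        return date(century + yy, month, dd)
    except ValueError:                       # e.g. February 30
        return None


if __name__ == "__main__":
    print(pesel_birth_date("16321600000"))   # 2016-12-16
```

The regex alone cannot reject dates such as February 30, which is why the sketch leaves the calendar check to `datetime.date`.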

Submitted by @Sldmk

1

PLF4 form with component

Created·2026-01-21 07:52

Flavor·PCRE2 (PHP)

PLF4

Submitted by anonymous

1

PLF3

Created·2026-01-21 07:38

Flavor·PCRE2 (PHP)

PLF3

Submitted by anonym

1

PLF2

Created·2026-01-21 07:22

Flavor·PCRE2 (PHP)

PLF2

Submitted by anonym

1

PLF1

Created·2026-01-21 07:20

Flavor·PCRE2 (PHP)

plf1

Submitted by anonym

1

Wasser

Created·2026-01-20 20:24

Flavor·PCRE2 (PHP)

Hey, this is really cool, right?

Submitted by anonymous

1

Python -- Match container image reference parts

Created·2026-01-20 02:14

Updated·2026-01-20 02:28

Flavor·Python

This is a Python RE for matching different parts and combinations of various docker/container image references, as compatible with docker pull etc.

Parts / Capture Groups:
- image: whole image
- repo: everything but the tag/digest
- registry: host+port
- host
- port
- name: path within registry
- reference: tag or digest (with : or @; I haven't bothered to figure out how to strip that yet)
- tag
- digest

If it matches, it should match image, repo, and name at the very least.
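The capture groups listed above could be laid out roughly as in the following sketch. It is not the pattern from this entry: `IMAGE_RE` and `parse_image` are hypothetical, and the rules for what counts as a registry host (it contains a dot, is `localhost`, or is followed by a port) are simplifying assumptions that do not cover every reference Docker accepts.

```python
import re

IMAGE_RE = re.compile(r"""
    (?P<image>
      (?P<repo>
        (?:
          (?P<registry>
            (?P<host>localhost
                   |[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+
                   |[A-Za-z0-9-]+(?=:[0-9]))
            (?::(?P<port>[0-9]+))?
          )/
        )?
        (?P<name>[a-z0-9]+(?:[._-]+[a-z0-9]+)*
                 (?:/[a-z0-9]+(?:[._-]+[a-z0-9]+)*)*)
      )
      (?P<reference>
          :(?P<tag>[A-Za-z0-9_][A-Za-z0-9_.-]{0,127})
        | @(?P<digest>[A-Za-z][A-Za-z0-9]*:[0-9a-fA-F]{32,})
      )?
    )
""", re.VERBOSE)


def parse_image(ref):
    """Return a dict of the named parts, or None if ref does not match."""
    m = IMAGE_RE.fullmatch(ref)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_image("registry.example.com:5000/team/app:1.2"))
    print(parse_image("ubuntu:22.04"))
```

As in the entry, `reference` keeps its leading `:` or `@`, while `tag` and `digest` hold the value without it.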

Submitted by neal-ian

1

Check valid URI Scheme according to RFC2396

Created·2026-01-17 07:52

Flavor·Golang

Simple prefix matcher for validating URI Schemes according to [RFC2396, Section 3.1](http://www.faqs.org/rfcs/rfc2396.html#3.5:~:text=well%20as%20%22http%22%29.-,scheme%20%20%20%20%20%20%20%20%3D%20alpha%20*%28%20alpha%20%7C%20digit%20%7C%20%22%2B%22%20%7C%20%22%2D%22%20%7C%20%22.%22%20%29,-Relative%20URI%20references) Go Playground: https://go.dev/play/p/vtYEugsNAfo

Submitted by Gwyneth Llewelyn

1

URL, URI format validation

Created·2026-01-16 16:16

Updated·2026-01-16 23:25

Flavor·JavaScript

URL, URI format validation

Submitted by Atipat Lorwongam

2

Regex Email Format

Created·2026-01-15 17:39

Updated·2026-01-16 23:26

Flavor·JavaScript

Used to check Email Format

Submitted by Atipat Lorwongam

1

Validate Thai tone marks and vowels (supports elongated อาาาา)

Created·2026-01-15 17:20

Updated·2026-01-23 07:30

Flavor·JavaScript

Checks the use of tone marks and vowels in Thai. Supports a repeated vowel า (as in อาาาาา) for elongated sounds. Does not check initial or final consonants.

Submitted by อธิปัตย์ ล้อวงศ์งาม

1

Validate Thai tone marks and vowels

Created·2026-01-15 17:19

Updated·2026-01-23 07:29

Flavor·JavaScript

Checks tone marks and vowels in Thai. Does not check initial or final consonants.

Submitted by อธิปัตย์ ล้อวงศ์งาม
