Regular Expressions 101

C# Regex Extract/Match Nested HTML Elements/Tags

Regular Expression
.NET 7.0 (C#)

@"<div id=""target""((?'nested'<div)|</div(?'-nested')|[\w\W]*?)*</div>"
Flags: gm

Description

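This C# regex matches an HTML element together with everything nested inside it, including child elements of the same tag. It relies on .NET balancing groups to keep track of nested opening and closing tags.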

Example input:

<!DOCTYPE html>
<html>
	<head>
		<title>someTitle</title>
	</head>
	<body>
		<div class="outer">
			<h1>SomeHeader</h1>
			<div id="target">
				<h2>anotherHeader</h2>
				<div id="child">
					<div id="grand-child">
						<h4>HelloWorld</h4>
					</div>
				</div>
			</div>
		</div>
	</body>
</html>

Output:

<div id="target">
	<h2>anotherHeader</h2>
	<div id="child">
		<div id="grand-child">
			<h4>HelloWorld</h4>
		</div>
	</div>
</div>
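
As a rough sketch of how the pattern might be run from C#, the program below applies it to a shortened copy of the example input and prints the match. The variable names (html, pattern, match) and the shortened input are assumptions made for illustration, not part of the original submission. The matched text keeps the indentation of the input, so it is not dedented the way the output above is displayed.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened copy of the example input (an assumption made for brevity).
string html = @"<body>
    <div class=""outer"">
        <h1>SomeHeader</h1>
        <div id=""target"">
            <h2>anotherHeader</h2>
            <div id=""child"">
                <div id=""grand-child"">
                    <h4>HelloWorld</h4>
                </div>
            </div>
        </div>
    </div>
</body>";

// The pattern from this entry. (?'nested'<div) pushes a capture for every
// nested opening tag; </div(?'-nested') pops one for every nested closing tag
// and fails when nothing is left to pop, which leaves the final </div> to
// close the target element itself.
string pattern = @"<div id=""target""((?'nested'<div)|</div(?'-nested')|[\w\W]*?)*</div>";

// RegexOptions.Multiline mirrors the "m" flag listed with the pattern; it has
// no effect here because the pattern contains no ^ or $ anchors.
Match match = Regex.Match(html, pattern, RegexOptions.Multiline);

Console.WriteLine(match.Success ? match.Value : "no match");
```

Regex.Match returns only the first match; Regex.Matches would be the closer equivalent of the "g" flag if several target elements were expected.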

Generic form:

string pattern = $@"{ startP }((?'nested'{ openP })|{ closeP }(?'-nested')|[\w\W]*?)*{ closeP }>";

* 'startP' (must include the opening tag), example: <div id="target"
* 'openP', example: <div
* 'closeP', example: </div
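
The generic form can be wrapped in a small helper, sketched here for a different tag. BuildNestedPattern is a hypothetical name, and the ul markup is made-up sample data; neither comes from the original entry. The sketch appends ">" after the final closeP so that the match ends with a complete closing tag, as in the concrete pattern at the top of this entry.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original entry): builds the pattern
// from the three fragments described above. The fragments are inserted as-is,
// so they are treated as regex syntax, not as literal text.
static string BuildNestedPattern(string startP, string openP, string closeP) =>
    $@"{startP}((?'nested'{openP})|{closeP}(?'-nested')|[\w\W]*?)*{closeP}>";

// Made-up sample markup: a nested list followed by an unrelated sibling list.
string html =
    "<ul id=\"menu\"><li>one<ul><li>two</li></ul></li></ul><ul><li>other</li></ul>";

string pattern = BuildNestedPattern("<ul id=\"menu\"", "<ul", "</ul");
Match match = Regex.Match(html, pattern);

// Prints the menu list including its nested list, without the sibling list.
Console.WriteLine(match.Value);
```

If a fragment is built from arbitrary text instead of a hand-written pattern, passing it through Regex.Escape first would be a reasonable precaution.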

References:

* In Depth with RegEx Matching Nested Constructions
* In Depth with .NET RegEx Balanced Grouping
* Regular expression matches closed HTML tags (nesting is supported)

Submitted by w4po - 2 years ago (Last modified 8 months ago)