Regular Expressions 101

HTML Regex

Regular Expression
Python

r"<(?P<tag>[a-zA-Z]+)(?P<data>.*?)(?!.+\")>(?P<body>.*)</(?P=tag)>"
Flags: gm

Description

HTML Pattern

Apply this pattern recursively to match HTML content. For example, it matches <sometag my-data="1234567890" style=""> <nestedhtml>Hello World</nestedhtml> </sometag>, and the match's groupdict() contains the named groups tag (the element name), data (the attribute text) and body (everything between the opening and closing tags). You can then apply the same pattern to body to parse nested HTML tags.
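
The recursive use described above could look roughly like the following. This is a minimal sketch, not the submitter's own code: the function name parse, the "children" key and the variable names are assumptions made for illustration. re.finditer stands in for the "g" flag and re.MULTILINE for the "m" flag (which has no effect here, since the pattern contains no ^ or $).

```python
import re

# The pattern as submitted. In a raw string, \" keeps the backslash, and the
# regex engine reads \" as a literal double quote.
HTML_PATTERN = re.compile(
    r"<(?P<tag>[a-zA-Z]+)(?P<data>.*?)(?!.+\")>(?P<body>.*)</(?P=tag)>",
    re.MULTILINE,
)


def parse(html):
    """Return one dict per element matched in `html`, nested via 'children'.

    `parse` and the 'children' key are hypothetical names for this sketch.
    """
    nodes = []
    for match in HTML_PATTERN.finditer(html):
        groups = match.groupdict()
        nodes.append({
            "tag": groups["tag"],
            "data": groups["data"].strip(),
            "body": groups["body"],
            # Re-apply the same pattern to the body to pick up nested tags.
            "children": parse(groups["body"]),
        })
    return nodes


sample = (
    '<sometag my-data="1234567890" style="">'
    ' <nestedhtml>Hello World</nestedhtml> </sometag>'
)
tree = parse(sample)

outer = tree[0]
inner = outer["children"][0]
print(outer["tag"], "|", outer["data"])
print(inner["tag"], "|", inner["body"])
```

Two limits follow from the pattern itself. Without a dot-all flag, "." does not cross newlines, so an element has to sit on a single line to match. The negative lookahead also scans to the end of the line, so a double quote later on the same line (in the body, for instance) can shift where data ends or prevent a match. For anything beyond simple, well-formed snippets, a real HTML parser is likely the safer choice.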

Submitted by GrandMoff100 - 2 years ago