
Community Library Entry


Regular Expression
Created: 2023-06-15 18:56
Flavor: Python

r"<!DOCTYPE html>|</?\s*[a-z-][^>]*\s*>|(\&(?:[\w\d]+|#\d+|#x[a-f\d]+);|<!--[\s\S\n]*?-->)"
Flags: g
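Python's re module has no "g" flag. A minimal sketch of the equivalent global matching, assuming the pattern is used unchanged, iterates with re.finditer(). The sample string is illustrative only. Because the entity and comment alternatives sit inside capture group 1, re.findall() would return only that group (an empty string for doctype and tag matches), so the sketch reads the whole match from group(0).

```python
import re

# The community pattern, copied verbatim. Python's re has no "g" flag; the
# equivalent of global matching is iterating over finditer().
HTML_TOKEN = re.compile(
    r"<!DOCTYPE html>|</?\s*[a-z-][^>]*\s*>"
    r"|(\&(?:[\w\d]+|#\d+|#x[a-f\d]+);|<!--[\s\S\n]*?-->)"
)

# Illustrative input (an assumption, not part of the entry).
sample = '<!DOCTYPE html><p class="x">Fish &amp; chips<!-- note --></p>'

# group(0) is the whole match. re.findall() would instead return only
# capture group 1, which is empty for doctype and tag matches.
tokens = [m.group(0) for m in HTML_TOKEN.finditer(sample)]
print(tokens)
# ['<!DOCTYPE html>', '<p class="x">', '&amp;', '<!-- note -->', '</p>']
```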

Description

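This may appear to violate the premise of the famous Stack Overflow answer about parsing HTML with regular expressions. However, the pattern does not parse HTML; it only matches individual tokens (the doctype, tags, character references, and comments) for heuristic identification.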
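The heuristic identification described above could be wrapped in a small helper. This is a sketch under stated assumptions: looks_like_html and its min_matches threshold are hypothetical names, not part of the submitted entry, and the pattern is compiled without re.IGNORECASE, exactly as submitted.

```python
import re

# The community pattern, compiled as-is. re.IGNORECASE is not set, so
# uppercase tags such as <P> are not matched.
HTML_TOKEN = re.compile(
    r"<!DOCTYPE html>|</?\s*[a-z-][^>]*\s*>"
    r"|(\&(?:[\w\d]+|#\d+|#x[a-f\d]+);|<!--[\s\S\n]*?-->)"
)


def looks_like_html(text: str, min_matches: int = 1) -> bool:
    """Hypothetical helper: guess whether text is HTML by counting token matches.

    This only identifies HTML-like tokens; it does not parse or validate markup.
    """
    return sum(1 for _ in HTML_TOKEN.finditer(text)) >= min_matches


print(looks_like_html("Fish &amp; chips"))        # True
print(looks_like_html("Just a plain sentence."))  # False
print(looks_like_html("x < y and y > z"))         # True (false positive)
```

Because this is a heuristic, it can misfire: "x < y and y > z" counts as a tag because \s* allows whitespace after "<". Raising min_matches trades such false positives against missing short fragments.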

Technically, any text is HTML if it is served to a browser in a way that leads the browser to interpret it as HTML, e.g. with a text/html Content-Type header.

Submitted by Alice Bevan-McGregor