This would appear to violate the premise of this famous Stack Overflow answer; however, this is not parsing as such, only matching or heuristic identification.
Technically, any text is HTML if it is served to a browser in a way that makes the browser interpret it as such, e.g. with a text/html
Content-Type.
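
To make the distinction between matching and parsing concrete, here is a minimal sketch of a heuristic check. The function name looks_like_html and the pattern are hypothetical and written only for illustration; they are not this project's actual implementation. The check only asks whether the text contains something shaped like a doctype, a comment, or a tag, and makes no attempt to build or validate a document tree.

```python
import re

# Hypothetical pattern (an assumption for illustration): a doctype,
# an HTML comment opener, or something shaped like an opening/closing tag.
_TAG_LIKE = re.compile(
    r"<!doctype\s+html|<!--|</?[a-zA-Z][a-zA-Z0-9-]*(\s[^<>]*)?/?>",
    re.IGNORECASE,
)


def looks_like_html(text: str) -> bool:
    """Return True if the text contains tag-like markup.

    This is matching, not parsing: nesting, attribute syntax and
    well-formedness are never checked.
    """
    return _TAG_LIKE.search(text) is not None


if __name__ == "__main__":
    print(looks_like_html("<p>Hello</p>"))     # tag-shaped markup is present
    print(looks_like_html("a < b and c > d"))  # bare comparison operators
```

Because this is a heuristic, it can misfire: plain text that happens to contain something like <y> is reported as HTML, and such text would indeed be rendered as markup if a browser received it with a text/html Content-Type.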