Capture Paragraph Text in HTML


Regular Expression

(?#This tag is the start of the paragraph)<p>(?# This lookahead makes sure paragraph isn't an empty line. Needs it.)(?![\n|<])(?#This group captures the paragraph.)(.{0,500}?)(?#This negative lookahead/custom character class prevents <tags>)(?![^<][\s]|[\w]+[^>])(?#This tag is the end of the paragraph)</p>(?# This lookahead makes sure the paragraph ends at the end of the line.)(?=\n|<div|<p>)
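Below is a minimal sketch of running the pattern in Python. It assumes the page's HTML is already available as a string and that each paragraph sits on its own line. The sample markup is made up for illustration. Python's re module accepts the (?#...) comment groups, so the pattern is used exactly as posted.

```python
import re

# The pattern exactly as posted above. Python's re module accepts the
# inline (?#...) comment groups, so no changes are needed.
PARAGRAPH_RE = re.compile(
    r"(?#This tag is the start of the paragraph)<p>"
    r"(?# This lookahead makes sure paragraph isn't an empty line. Needs it.)(?![\n|<])"
    r"(?#This group captures the paragraph.)(.{0,500}?)"
    r"(?#This negative lookahead/custom character class prevents <tags>)(?![^<][\s]|[\w]+[^>])"
    r"(?#This tag is the end of the paragraph)</p>"
    r"(?# This lookahead makes sure the paragraph ends at the end of the line.)(?=\n|<div|<p>)"
)

# Hypothetical sample markup: one <p> per line, plus an empty paragraph.
html = (
    "<div>\n"
    "<p>First paragraph.</p>\n"
    "<p>Second paragraph.</p>\n"
    "<p></p>\n"
    "</div>\n"
)

# findall returns the contents of capture group 1 for each match.
paragraphs = PARAGRAPH_RE.findall(html)
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

The empty <p></p> is skipped because the first negative lookahead rejects a "<" immediately after the opening tag.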


TL;DR: It can be used to capture all the paragraphs from some web novel sites. Update: after testing it on other sites, I've found it lacking.

This is intended to match a <p> tag and its closing tag, and to capture the text only when no other HTML tags are nested inside the <p> tag.
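If the goal is strictly to reject paragraphs that contain nested tags, one option is to replace . with a negated character class. This is a swapped-in alternative, not the pattern above. It assumes each paragraph sits on a single line and keeps only the trailing lookahead from the original. The name STRICT_PARAGRAPH_RE and the sample input are illustrative.

```python
import re

# Alternative (not the original pattern): [^<\n] cannot match "<" or a
# newline, so a nested tag makes the match fail instead of being
# consumed as paragraph text. The {1,500} bound also rejects empty paragraphs.
STRICT_PARAGRAPH_RE = re.compile(r"<p>([^<\n]{1,500})</p>(?=\n|<div|<p>)")

# Hypothetical sample: the second paragraph contains a nested <b> tag.
html = "<p>Plain text.</p>\n<p>Has <b>bold</b> text.</p>\n"

print(STRICT_PARAGRAPH_RE.findall(html))  # ['Plain text.']
```

The trade-off is that any "<" inside the paragraph text, even an escaped-looking one, stops the match.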

The quantifier {0,500} can be adjusted to change the maximum paragraph length that gets captured.
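As a sketch of adjusting the quantifier, the hypothetical helper build_pattern below rebuilds the pattern (with its comments removed) using a caller-chosen upper bound in place of 500. Paragraphs longer than the bound are simply not matched.

```python
import re


def build_pattern(max_chars: int) -> re.Pattern:
    """Hypothetical helper: the pattern with {0,500} replaced by {0,max_chars}."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    return re.compile(
        r"<p>(?![\n|<])(.{0," + str(max_chars) + r"}?)"
        r"(?![^<][\s]|[\w]+[^>])</p>(?=\n|<div|<p>)"
    )


html = "<p>Short.</p>\n<p>This one is too long.</p>\n"

# With a 10-character cap, the 21-character paragraph is skipped.
print(build_pattern(10).findall(html))  # ['Short.']
print(build_pattern(50).findall(html))  # ['Short.', 'This one is too long.']
```

String concatenation is used instead of an f-string so the regex braces do not need escaping.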

Condensed regex:

<p>(?![\n|<])(.{0,500}?)(?![^<][\s]|[\w]+[^>])</p>(?=\n|<div|<p>)
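The condensed form is the commented pattern with every (?#...) group deleted. Here is a small Python sketch of that step, assuming (as holds for this pattern) that no comment contains a closing parenthesis. The names COMMENTED and CONDENSED are illustrative.

```python
import re

# The commented pattern as posted above.
COMMENTED = (
    r"(?#This tag is the start of the paragraph)<p>"
    r"(?# This lookahead makes sure paragraph isn't an empty line. Needs it.)(?![\n|<])"
    r"(?#This group captures the paragraph.)(.{0,500}?)"
    r"(?#This negative lookahead/custom character class prevents <tags>)(?![^<][\s]|[\w]+[^>])"
    r"(?#This tag is the end of the paragraph)</p>"
    r"(?# This lookahead makes sure the paragraph ends at the end of the line.)(?=\n|<div|<p>)"
)

# Delete every (?#...) group. This assumes no comment contains ")".
CONDENSED = re.sub(r"\(\?#[^)]*\)", "", COMMENTED)

print(CONDENSED)
# <p>(?![\n|<])(.{0,500}?)(?![^<][\s]|[\w]+[^>])</p>(?=\n|<div|<p>)
```

Because comment groups never affect matching, both forms return the same captures on any input.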

Submitted by anonymous - 7 months ago (Last modified 7 months ago)