Regular Expressions 101

Community Patterns

Nth occurrence of word - pull single table out of HTML with a bunch of tables


Regular Expression
PCRE (PHP <7.3)

/^(?:(?:(?!table).)*table){19}(.+?(?=table))/gsi

Description

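The HTML (a purchase order) generated by a third party was riddled with errors, so importing it directly was impossible. Decided to pull the one table needed out of the HTML attachment, and then wrap it as "new" HTML for import into DOMDocument. The pattern skips the first 19 occurrences of the word "table" and captures everything up to, but not including, the next occurrence.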

PHP code:

preg_match('/^(?:(?:(?!table).)*table){19}(.+?(?=table))/is', $rawHTML, $matches);
$procHTML = '<html><head><title>Foo</title></head><body><table' . $matches[1] . 'table></body></html>';
// Fix the few errors in the HTML
$doc = new DOMDocument();
$doc->loadHTML($procHTML);
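The same idea can be sketched for an arbitrary table number. The helper name `extractNthTable`, the sample markup, and the `2N - 1` rule are illustrative assumptions, not part of the original submission. The rule assumes every table contributes exactly two occurrences of the word "table" (its opening and closing tag), so it breaks if "table" also appears in text, attributes, or nested tables before the target.

```php
<?php
// Hypothetical helper: returns the Nth <table> element found in $html,
// or null if the pattern does not match.
function extractNthTable(string $html, int $n): ?string
{
    // Assumption: each table adds two occurrences of "table" (open and close
    // tag), so the opening tag of table N is occurrence number 2N - 1.
    $skip = 2 * $n - 1;
    $pattern = '/^(?:(?:(?!table).)*table){' . $skip . '}(.+?(?=table))/is';

    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }

    // The skipped part consumed "<table" and the capture stops just before
    // the "table" of the closing tag, so both pieces are added back here.
    return '<table' . $matches[1] . 'table>';
}

// Made-up sample input with three small tables.
$sample = '<p>intro</p>'
    . '<table id="a"><tr><td>1</td></tr></table>'
    . '<table id="b"><tr><td>2</td></tr></table>'
    . '<table id="c"><tr><td>3</td></tr></table>';

$second = extractNthTable($sample, 2);
echo $second, PHP_EOL;
```

Under that assumption, `{19}` in the submitted pattern corresponds to the opening tag of the 10th table. Checking the return value before using `$matches[1]` avoids building wrapper HTML around an empty match when the document has fewer tables than expected.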

Submitted by Christopher Cilley - 8 years ago