import re
regex = re.compile(r"(?:(?!<li>).)*<li>.*?<a href=\"([^\"]+)\".*?alt=\"On Contact: ([^\"]+)\".*?<p[^>]*>((?:(?!</p>).)+?)</p>.*?<p[^>]*>([^<]+)<(?:(?!<li>).)*", flags=re.MULTILINE | re.DOTALL)
test_str = (" <li><p style=\"margin-bottom: 0in\"><a href=\"https://www.rt.com/shows/on-contact/550756-america-long-war-race/\">On\n"
" Contact: Race and America's long war </a>\n"
" </p>\n"
" <p style=\"margin-bottom: 0in\"><a href=\"https://www.rt.com/shows/on-contact/550756-america-long-war-race/\">\n"
" <font color=\"#000080\">\n"
" <img src=\"rt.com-on_contact-220405-no_blurb_html_1dff87941f1c724a.jpg\" name=\"Image1\" alt=\"On Contact: Race and America's long war\" align=\"bottom\" width=\"280\" height=\"157\" border=\"1\"/>\n"
" </font>\n"
"</a>\n"
"</p>\n"
" <p style=\"margin-bottom: 0in\">On the show, Chris Hedges discusses\n"
" America's inner and outer wars and its nexus with capitalism and\n"
" empire with Professor of Social and Cultural Analysis and History at\n"
" New York University Nikhil Pal Singh. The internal violence in the\n"
" United... \n"
" </p>\n"
" <p style=\"margin-bottom: 0in\">Feb 27, 2022 10:36</p>\n"
" <li><p style=\"margin-bottom: 0in\"><a href=\"https://www.rt.com/shows/on-contact/550319-george-washington-genocidal-colonist/\">\n"
" <font color=\"#000080\">\n"
" <img src=\"rt.com-on_contact-220405-no_blurb_html_198feb67032166ff.png\" name=\"Image3\" alt=\"On Contact: George Washington and the legacy of white supremacy\" align=\"bottom\" width=\"280\" height=\"157\" border=\"1\"/>\n"
" </font>\n"
"</a>\n"
"</p>\n"
" <p style=\"margin-bottom: 0in\"><strong><a href=\"https://www.rt.com/shows/on-contact/550319-george-washington-genocidal-colonist/\">On\n"
" Contact: George Washington and the legacy of white supremacy </a></strong>\n"
" </p>\n"
" <p style=\"margin-bottom: 0in\">On the show, Chris Hedges discusses\n"
" George Washington, the fallible human being and one of the principal\n"
" architects of the United States, with author Nathaniel Philbrick. As\n"
" America fractures into ideologically hostile camps, it colors how\n"
" we... \n"
" </p>\n"
" <p style=\"margin-bottom: 0in\">Feb 25, 2022 09:09 \n"
" </p>\n"
" <li>[...]")
# Python's re.sub references groups as \N (or \g<N>), not $N as in PCRE-style templates.
subst = "\\4\\t\\2\\t\\1\\t\\3\\n"
result = regex.sub(subst, test_str)
if result:
    print(result)