import re
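# Group 1: scheme, host, and "/documents/"; group 2: the PDF file name without ".pdf".
# Requiring "/documents/en/" in the path leaves PDF links without that path untouched.
regex = re.compile(r"\b(https?://[^\s/]+/documents/)en/([^\s/]+)\.pdf\b")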
test_str = ("The problem is, if the page contains a PDF linked to another page (not my own website), example https://othersite.example.com/whatever.pdf, it becomes https://othersite.example.com/whatever_SPANISH.pdf which isn't valid on other people's sites. I want to ignore offsite links and only change URLs on my site.\n\n"
"So what I would like to do is look for the string: https://example.com/documents/en/whateverfilename.pdf and pull that file name out and change it to https://example.com/documents/es/whateverfilename_SPANISH.pdf (Switching the en to es and also appending the _SPANISH to the end of the PDF filename.\n\n")
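# Python's re module references groups as \g<1> (or \1), not $1.
# The "/" after "es" restores the separator consumed by "en/" in the pattern.
subst = r"\g<1>es/\g<2>_SPANISH.pdf"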
result = regex.sub(subst, test_str)
if result:
    print(result)