import re
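# Find bare hostnames ending in .com/.net/.info, skipping google.com and system.net.
# The dot in "system\.net" is escaped so it matches a literal "." rather than any character.
regex = re.compile(r"(www\.)?\b(?!google\.com|system\.net)[-a-zA-Z0-9@:%._\+~#=]{2,256}\.(com|net|info)\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)", flags=re.MULTILINE | re.IGNORECASE)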
test_str = ("Regex to find URLs unless they include a list of strings (Visual Studio search)\n"
"I have this the following expression which is working well for my needs:\n\n"
"(www\\.)?[-a-zA-Z0-9@:%._\\+~#=]{2,256}\\.(com|net|info)\\b([-a-zA-Z0-9@:%_\\+.~#?&//=]*) \n"
"It does a pretty good job of finding URLs which are hardcoded in a large codebase. Obviously, it has some false positives for things like Using System.net, but some of the valid URLs don't include http(s):// unfortunately.\n\n"
"So now I want to be able to exclude certain things, including system.net (false positives) and google.com (common urls in comments).\n\n"
"How can I do this?\n\n"
"Ideally the list would just include google and system. Something like (!google|!system).\n\n"
"https://thisisaurl.com/")
matches = regex.finditer(test_str)
for match_num, match in enumerate(matches, start=1):
    # Report the full match and its span in the test string.
    print(f"Match {match_num} was found at {match.start()}-{match.end()}: {match.group()}")
    # Report each capture group; a group that did not participate prints None with span -1--1.
    for group_num, group in enumerate(match.groups(), start=1):
        print(f"Group {group_num} found at {match.start(group_num)}-{match.end(group_num)}: {group}")
Please keep in mind that these code samples are automatically generated and are not guaranteed to work. If you find any syntax errors, feel free to submit a bug report. For a full regex reference for Python, please visit: https://docs.python.org/3/library/re.html
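The question embedded in the test string asks for a short exclusion list rather than a hand-edited lookahead. One possible way to do that is sketched below; it is not the generated sample's method. The helper name `build_url_pattern` and the sample text are hypothetical. The sketch also makes two assumptions that differ from the pattern above: it replaces the leading `\b` with a negative lookbehind, so a match cannot start in the middle of a token (for example at `.google.com` inside `www.google.com`), and it treats an optional `www.` prefix as part of an excluded name.

```python
import re


def build_url_pattern(excluded):
    """Compile a URL-finding pattern that skips the given host names.

    `excluded` is a list of literal names such as "google.com";
    each is escaped so "." only matches a literal dot.
    """
    alternatives = "|".join(re.escape(name) for name in excluded)
    return re.compile(
        # Only start where the previous character is not part of a host token.
        r"(?<![-a-zA-Z0-9@:%._+~#=])"
        # Reject the position if an excluded name (optionally with www.) starts here.
        r"(?!(?:www\.)?(?:" + alternatives + r")\b)"
        # Host part, same character class and TLD list as the pattern above.
        r"[-a-zA-Z0-9@:%._+~#=]{2,256}\.(?:com|net|info)\b"
        # Optional path/query part.
        r"[-a-zA-Z0-9@:%_+.~#?&/=]*",
        flags=re.IGNORECASE,
    )


pattern = build_url_pattern(["google.com", "system.net"])
sample = (
    "using System.Net; see www.google.com or "
    "https://thisisaurl.com/ and example.info/page"
)
found = [m.group() for m in pattern.finditer(sample)]
print(found)
```

As with the original expression, the scheme (`https://`) is not part of the match, because `/` is not in the host character class. Adding another name to the list passed to `build_url_pattern` extends the exclusions without touching the pattern itself.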