import java.util.regex.Matcher;
import java.util.regex.Pattern;
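public class Example {
    public static void main(String[] args) {
        // Matches markdown links whose visible text looks like a domain ending in a listed TLD.
        // Group 1 captures the tail of the displayed domain that ends in a listed TLD.
        final String regex = "\\[(?:http://|https://)*(?:\\w+\\.)*(\\w+(?:\\.(?:com|org|net|edu|gov|info|biz|io|co|app|uk|de|jp|ca|dev|gg))+)]\\((?:http://|https://)(?:\\w+\\.)+\\w+(?:/\\w+)*\\)";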
final String string = "Normal links don't get caught:\n"
+ "[do not catch this](https://example.com)\n"
+ "orthis.com\n\n"
+ "Neither do links with full stops in the message:\n"
+ "[messages. with. full stops](https://example.com)\n\n"
+ "even if they forget a space\n"
+ "[whoops.nospace](https://example.com)\n\n"
+ "because we catch based on tld:\n"
+ "[catchthis.com](https://malicious.link)\n"
+ "[catchthis.org](https://malicious.link)\n"
+ "[catchthis.net](https://malicious.link)\n"
+ "[catchthis.edu](https://malicious.link)\n"
+ "[catchthis.gov](https://malicious.link)\n"
+ "[catchthis.info](https://malicious.link)\n"
+ "[catchthis.biz](https://malicious.link)\n"
+ "[catchthis.io](https://malicious.link)\n"
+ "[catchthis.co](https://malicious.link)\n"
+ "[catchthis.uk](https://malicious.link)\n"
+ "[catchthis.de](https://malicious.link)\n"
+ "[catchthis.jp](https://malicious.link)\n\n"
+ "[www.catchthis.com](https://malicious.link)\n"
+ "[https://catchthis.com](https://malicious.link)\n"
+ "[http://catchthis.com](http://malicious.link)\n\n"
+ "any combination of the above also gets matched for multiple tld urls:\n"
+ "[link.co.jp.org.net](https://malicious.link)\n\n"
+ "This works well because we can block a malicious link to any TLD with any number of subdomains, while keeping a controlled list of TLDs that the fake URL in the link text can end with. Since most non-standard TLDs look sketchy anyway, the list doesn't need to be long:\n\n"
+ "[link.com](http://any.malicious.li.nk/anything/at/all)\n\n"
+ "Any number of subdomains also get caught:\n"
+ "[auth.google.com](https://malicious.website.com)\n"
+ "[any.number.at.all.com](https://malicious.link)\n\n\n"
+ "This method of having a set tld list means almost zero false positives, with the drawback of people having to recognise sketchy urls themselves:\n\n"
+ "[linkwitha.sketchytld](https://malicious.link) // not caught\n\n"
+ "If you want a wider net with a higher chance of false positives, replace the TLD list with the word-character matcher (\\w+):\n\n"
+ "\\[(?:\\w+\\.)*(\\w+(?:\\.(?:\\w+))+)]\\((?:http://|https://)(?:\\w+\\.)+\\w+(?:/\\w+)*\\)\n\n"
+ "Or a much shorter one that doesn't catch http:// links but is short enough for Discord (Discord already blocks \"fake\" links with https in the title, but not ones without it):\n\n"
+ "\\[(\\w+\\.?)*]\\((https?://)(\\w+\\.?)*\\)\n\n"
+ "A longer method with subdomain denylisting is also short enough for Discord, since it compiles to a shorter regex (add more subdomains after auth to catch more):\n\n"
+ "\\[(?:(?:www|auth|login)\\.)*(\\w+(?:\\.(?:com|org|net|edu|gov|info|biz|io|co|app|uk|de|jp|ca|dev|gg))+)]\\((?:http://|https://)(?:\\w+\\.)+\\w+(?:/\\w+)*\\)";
        // MULTILINE only affects ^ and $, which this pattern does not use; the flag is harmless here.
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        // Print every match followed by its capture groups.
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
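
The test string describes a trade-off between a fixed TLD list (almost no false positives, but unknown TLDs slip through) and a wildcard variant (wider net, more false positives). The following is a minimal sketch of that comparison, not part of the generated sample: the class name FakeLinkCheck, the constants TLD_LIST and ANY_TLD, and the helper caught are hypothetical names chosen for illustration, and the two patterns are taken from the sample with the repeated "co" and "app" alternatives written once.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for comparing the two pattern variants.
class FakeLinkCheck {
    // Shared tail: the real link target, e.g. "(https://host.tld/optional/path)".
    private static final String TARGET =
            "\\((?:http://|https://)(?:\\w+\\.)+\\w+(?:/\\w+)*\\)";

    // Strict variant: the link text must end in one of a fixed set of TLDs.
    static final Pattern TLD_LIST = Pattern.compile(
            "\\[(?:http://|https://)*(?:\\w+\\.)*"
            + "(\\w+(?:\\.(?:com|org|net|edu|gov|info|biz|io|co|app|uk|de|jp|ca|dev|gg))+)]"
            + TARGET);

    // Wider variant: any dot-separated run of word characters counts as a domain.
    static final Pattern ANY_TLD = Pattern.compile(
            "\\[(?:\\w+\\.)*(\\w+(?:\\.(?:\\w+))+)]" + TARGET);

    // Returns true if the pattern finds a fake-looking link anywhere in the message.
    static boolean caught(Pattern pattern, String message) {
        return pattern.matcher(message).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "[catchthis.com](https://malicious.link)",
            "[linkwitha.sketchytld](https://malicious.link)",
            "[whoops.nospace](https://example.com)",
            "[do not catch this](https://example.com)"
        };
        for (String sample : samples) {
            System.out.println(sample
                    + " -> tldList=" + caught(TLD_LIST, sample)
                    + ", anyTld=" + caught(ANY_TLD, sample));
        }
    }
}
```

With these inputs, the wildcard variant also flags "[whoops.nospace]", which is the kind of false positive the fixed TLD list avoids, while only the wildcard variant flags the ".sketchytld" link.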