Regular Expressions 101

RFC-1034- and RFC-2181-compliant FQDN extractor

Regular Expression
PCRE (PHP <7.3)

/\|\K(?=.{1,253}\|)(?!.*--.*)(?P<fqdn>(?:(?!-)(?![0-9])[a-zA-Z0-9-]{1,63}(?<!-)\.){1,}(?:(?!-)[a-zA-Z0-9-]{1,63}(?<!-)))\|/gm

Description

The specification for this regex is based upon the extracts from RFC 1034 and RFC 2181 below.

It is also assumed that you are trying to extract the FQDN from a pipe-delimited string.

If this is not the case, adjust the regex by either removing the pipe characters (the opening one, the one inside the length lookahead, and the closing one) or replacing them with a different delimiter, e.g. a comma.
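As a rough illustration of swapping the delimiter, the sketch below ports the pattern to Python's re module. Python is not the flavor this entry targets, and re has no \K, so the opening delimiter stays in the overall match and the FQDN is read from the fqdn named group instead. build_fqdn_regex and extract_fqdns are hypothetical helper names, and finditer stands in for the g flag.

```python
import re


def build_fqdn_regex(delimiter="|"):
    """Compile the FQDN pattern for a given delimiter (hypothetical helper)."""
    d = re.escape(delimiter)
    # The pattern has no ^ or $ anchors, so the m flag is not needed here.
    return re.compile(
        rf"""
        {d}                    # opening delimiter, consumed because re has no match-reset escape
        (?=.{{1,253}}{d})      # a closing delimiter must follow within 253 characters
        (?!.*--.*)             # no double hyphen anywhere in the rest of the line
        (?P<fqdn>
            # one or more leading labels, each followed by a dot
            (?:(?!-)(?![0-9])[a-zA-Z0-9-]{{1,63}}(?<!-)\.){{1,}}
            # final label, with no trailing dot
            (?:(?!-)[a-zA-Z0-9-]{{1,63}}(?<!-))
        )
        {d}                    # closing delimiter
        """,
        re.VERBOSE,
    )


def extract_fqdns(text, delimiter="|"):
    """Return every FQDN found between two delimiters in text."""
    pattern = build_fqdn_regex(delimiter)
    return [m.group("fqdn") for m in pattern.finditer(text)]


log = "2024-01-01|www.example.com|200\n2024-01-02|-bad.example.com|404"
print(extract_fqdns(log))
print(extract_fqdns("a,mail.example.org,b", delimiter=","))
```

Because the closing delimiter is consumed by the match, two FQDN fields that sit directly next to each other share a delimiter and only the first one is returned. The same holds for the original pattern.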

RFC 1034

3.5. Preferred name syntax

The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

RFC 2181

11. Name syntax

The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. The length of any one label is limited to between 1 and 63 octets. A full domain name is limited to 255 octets (including the separators).
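The 255-octet limit above and the 253-character limit used in the regex describe the same bound. The extract does not spell this out, but in DNS wire format each label is preceded by one length octet and the name ends with the zero-length root label, so a dotted name without a trailing dot takes two octets more on the wire than it has characters. A minimal sketch of that arithmetic, assuming ASCII-only labels (wire_length is a hypothetical helper name):

```python
def wire_length(name):
    """Octets a dotted name occupies in DNS wire format (ASCII labels assumed)."""
    if name.endswith("."):
        name = name[:-1]  # ignore an explicit root dot
    labels = name.split(".")
    # One length octet plus the label bytes for each label,
    # then one final octet for the zero-length root label.
    return sum(1 + len(label) for label in labels) + 1


# Three 63-character labels and one 61-character label: 253 characters in total.
longest = ".".join(["a" * 63] * 3 + ["a" * 61])
print(len(longest), wire_length(longest))  # 253 255
```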

Summary:

  • Valid characters [a-zA-Z0-9-]

  • FQDN length = 1-253

  • FQDN must not end with a dot.

  • Label length = 1-63

  • Labels must start with a letter (the regex above enforces this for every label except the last, which may start with a digit).

  • Labels must not end with a hyphen.

  • Labels must not contain a double hyphen.
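The summary rules can also be checked one at a time without a regex, which makes it easier to see which rule a given name breaks. The sketch below is only an illustration: check_fqdn is a hypothetical helper name and the messages are made up for this example. Unlike the regex, it applies the letter rule to the final label as well and does not require the name to contain a dot.

```python
import string

LETTERS = set(string.ascii_letters)
ALLOWED = LETTERS | set(string.digits) | {"-"}


def check_fqdn(name):
    """Return the list of summary rules that name violates (empty if none)."""
    problems = []
    if not 1 <= len(name) <= 253:
        problems.append("FQDN length must be 1-253")
    if name.endswith("."):
        problems.append("FQDN must not end with a dot")
        name = name[:-1]  # keep checking the labels before the trailing dot
    for label in name.split("."):
        if not 1 <= len(label) <= 63:
            problems.append(f"label length must be 1-63: {label!r}")
            if not label:
                continue  # nothing else to check on an empty label
        if set(label) - ALLOWED:
            problems.append(f"invalid character in label: {label!r}")
        if label[0] not in LETTERS:
            problems.append(f"label must start with a letter: {label!r}")
        if label.endswith("-"):
            problems.append(f"label must not end with a hyphen: {label!r}")
        if "--" in label:
            problems.append(f"label must not contain a double hyphen: {label!r}")
    return problems


for candidate in ["www.example.com", "1abc.example.com", "foo-.com", "foo--bar.com"]:
    print(candidate, check_fqdn(candidate))
```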

Submitted by 0jag - 7 years ago