import Foundation
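// Entity bodies that are NOT preceded by "&":
//   (?<!&)                   the previous character is not "&"
//   \b(?:nbsp|quot|divide)   one of these named entities, starting at a word boundary
//   #[0-9a-f]+               or "#" followed by (hex) digits, e.g. #82173333
//   ;                        the closing semicolon
let pattern = ##"(?<!&)(?:\b(?:nbsp|quot|divide)|#[0-9a-f]+);"##
// try! is acceptable here because the pattern is a constant known to compile.
// .anchorsMatchLines has no effect, since the pattern uses no ^ or $.
let regex = try! NSRegularExpression(pattern: pattern, options: [.anchorsMatchLines, .caseInsensitive])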
let testString = ##"""
I have a huge HTML with several special chars, in the forms or "�.
Faulty HEX: #82173333;
Some of them are wrong, because they lack the initial &.
I would like to search for such wrong special chars. I know that I can search all the right special chars by means of the following regex:
\&(?:[a-z]+|#x?\d+);\
But I need a regex to find the wrong ones (without the initial &). Can you help me? Thanks in advance.
Edit:
As suggested, here is an example. My HTML contains the following statement:
<![CDATA[<nolink>blablabla blablabla</nolink>]]>nbsp;
where there are 2 special HTML characters:
divide;
÷
quot;
I'm interested in finding the second item, because it is wrong (lacking the initial &).
So the output of the requested regex should be: quot;
"""##
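// NSRange is measured in UTF-16 code units, so use utf16.count for the length.
let stringRange = NSRange(location: 0, length: testString.utf16.count)
let matches = regex.matches(in: testString, range: stringRange)
var result: [[String]] = []
for match in matches {
    var groups: [String] = []
    // Start at range 0 (the whole match): the pattern has no capturing groups,
    // so numberOfRanges is 1 and a loop starting at 1 would collect nothing.
    for rangeIndex in 0 ..< match.numberOfRanges {
        let nsRange = match.range(at: rangeIndex)
        // Skip optional groups that did not take part in this match.
        guard nsRange.location != NSNotFound else { continue }
        let string = (testString as NSString).substring(with: nsRange)
        groups.append(string)
    }
    if !groups.isEmpty {
        result.append(groups)
    }
}
// Prints [["#82173333;"], ["nbsp;"], ["divide;"], ["quot;"], ["quot;"]]
print(result)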
Keep in mind that this code sample was generated automatically and is not guaranteed to work. For a full regex reference for Swift 5.2, see: https://developer.apple.com/documentation/foundation/nsregularexpression
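The generated pattern only recognizes three hard-coded names (`nbsp`, `quot`, `divide`) plus `#` followed by hex digits, so it misses other entity names and the `#x` hexadecimal form that the question's own regex (`#x?\d+`) allows. The following is a minimal sketch, assuming the caller supplies the list of entity names to look for. The helper `findUnescapedEntities(in:names:)` is hypothetical, not part of Foundation: it builds the alternation from that list, accepts decimal (`#233;`) and `x`-prefixed hexadecimal (`#x00E9;`) bodies, and replaces the `(?<!&)` plus `\b` combination with a single lookbehind `(?<![&\w])`.

```swift
import Foundation

/// Hypothetical helper: returns every entity-like token that lacks its leading "&".
/// `names` is a caller-supplied list of entity names (an assumption of this
/// sketch, not a complete HTML entity table).
func findUnescapedEntities(in text: String, names: [String]) -> [String] {
    // Escape each name so any regex metacharacters are matched literally.
    var alternatives = names
        .filter { !$0.isEmpty }
        .map { NSRegularExpression.escapedPattern(for: $0) }
    alternatives.append("#[xX][0-9a-fA-F]+") // hexadecimal reference body, e.g. #x00E9
    alternatives.append("#[0-9]+")           // decimal reference body, e.g. #233

    // (?<![&\w]) rejects tokens preceded by "&" (already correct) or by a word
    // character (so "quot;" inside "xquot;" is not reported).
    let pattern = #"(?<![&\w])(?:"# + alternatives.joined(separator: "|") + ");"

    // The names are escaped, so the pattern should always compile; fail soft anyway.
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }

    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Convert the UTF-16 based NSRange back to a String.Index range.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

let sample = """
<![CDATA[<nolink>blablabla</nolink>]]>quot;
&divide; and &quot; are fine, #233; and #x00E9; are not.
"""
print(findUnescapedEntities(in: sample, names: ["nbsp", "quot", "divide"]))
// Prints ["quot;", "#233;", "#x00E9;"]
```

Matching is case-sensitive in this sketch because HTML named character references are case-sensitive (`&Eacute;` and `&eacute;` are different characters); the generated pattern's `.caseInsensitive` option would also report spellings such as `Nbsp;`. Converting each `NSRange` with `Range(_:in:)` avoids bridging to `NSString` for every match.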