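using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // Matches an HTML entity that lacks its leading '&': either a named entity
        // (2-8 letters plus up to 2 digits, e.g. "quot;") or a '#' followed by hex digits.
        // The (?<!&) lookbehind rejects well-formed entities such as "&quot;".
        // Note: any 2-8 letter word directly followed by ';' also matches, so
        // false positives are possible in ordinary prose.
        string pattern = @"(?<!&)(?:\b(?:[a-z]{2,8}\d{0,2})|#[0-9a-f]+);";
        string input = @"I have a huge HTML with several special chars, in the forms or ""�.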
Faulty HEX: #82173333;
Some of them are wrong, because they lack the initial &.
I would like to search for such wrong special chars. I know that I can search all the right special chars by means of the following regex:
\&(?:[a-z]+|#x?\d+);\
But I'd need a regex useful to search the wrong ones (without the initial &). Can you help me? Thanks in advance
Edit:
As suggested, I post an example. My HTML contains the following statement:
<![CDATA[<nolink>blablabla blablabla</nolink>]]>nbsp;
where we have 2 special HTML characters:
divide;
÷
quot;
I'm interested in finding the second item, because it is wrong (lacking the initial &).
So the output of the requested regex should be: quot;";
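        // Multiline only affects ^ and $ (unused here); IgnoreCase lets [a-z] and [0-9a-f] match uppercase too.
        RegexOptions options = RegexOptions.Multiline | RegexOptions.IgnoreCase;

        // Print every suspected faulty entity together with its position in the input.
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }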
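
        // Possible follow-up (an illustrative sketch, not part of the generated sample):
        // assuming every match really is an entity missing its '&', Regex.Replace with a
        // MatchEvaluator can prepend the missing character. The variable name "repaired"
        // is hypothetical. Ordinary words directly followed by ';' would also be rewritten,
        // so review the matches printed above before trusting the result.
        string repaired = Regex.Replace(input, pattern, m => "&" + m.Value, options);
        Console.WriteLine(repaired);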
    }
}
// Please keep in mind that these code samples are automatically generated and are not guaranteed to work.
// If you find any syntax errors, feel free to submit a bug report.
// For a full regex reference for C#, please visit: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex(v=vs.110).aspx