C# - Regex vs String.Contains
Hola. I'm failing to write a method that tests for words within a plain text or HTML document. I'm reasonably literate with regex, but newer to C# (coming from way more Java).
Just 'cause,
string html = source.ToLower();
string plainText = Regex.Replace(html, @"<(.|\n)*?>", " "); // remove tags
plainText = Regex.Replace(plainText, @"\s+", " "); // remove excess white space
And then,
string tag = "c++";
bool foundAsRegex = Regex.IsMatch(plainText, @"\b" + Regex.Escape(tag) + @"\b");
bool foundAsContains = plainText.Contains(tag);
For a case where "c++" should be found, foundAsRegex is sometimes true and sometimes false. My google-fu is weak, so I didn't get much back on "what the hell". Any ideas or pointers are welcome!
Edit:
I'm searching for matches on skills in resumes. For example, the distinct value "c++".
Edit:
A real excerpt is given below:
"...administration- c, c++, perl, shell programming..."
The problem is that \b matches between a word character and a non-word character. Given the expression \bc\+\+\b, you have a problem: "+" is a non-word character. So when searching for the pattern in "xxx c++, xxx", you're not going to find anything. There's no "word break" after the "+" character.
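A minimal sketch of this behaviour, using the excerpt from the question. The names `WordBoundaryDemo` and `MatchesWithWordBoundaries` are made up for illustration; only `Regex.IsMatch` and `Regex.Escape` come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Hypothetical helper: wraps the escaped tag in \b...\b, as in the question.
    public static bool MatchesWithWordBoundaries(string text, string tag)
    {
        return Regex.IsMatch(text, @"\b" + Regex.Escape(tag) + @"\b");
    }

    public static void Main()
    {
        string text = "...administration- c, c++, perl, shell programming...";

        // "c++" is followed by ",": '+' and ',' are both non-word characters,
        // so there is no boundary for the trailing \b to match.
        Console.WriteLine(MatchesWithWordBoundaries(text, "c++"));   // False

        // "perl" starts and ends with word characters, so \b works as expected.
        Console.WriteLine(MatchesWithWordBoundaries(text, "perl"));  // True

        // The trailing \b only matches when a word character follows the '+'.
        Console.WriteLine(MatchesWithWordBoundaries("c++x", "c++")); // True
    }
}
```

The last case would explain why the regex result can look inconsistent: whether the final \b matches depends entirely on the character that happens to follow the tag.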
If you're looking for non-word characters, you'll have to change the logic. I'm not sure what the best thing would be. I suppose you can use \W, but that's not going to match at the beginning or end of a line, so you'll need (^|\W) and (\W|$) ... which is ugly. And slow, although perhaps still fast enough depending on your needs.
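The suggestion above could be sketched roughly as follows. `SkillMatcher` and `ContainsSkill` are hypothetical names, and the input is assumed to be the already lower-cased, tag-stripped text from the question. `RegexOptions.Multiline` is an added assumption so that ^ and $ also match at line breaks rather than only at the ends of the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SkillMatcher
{
    // Hypothetical helper: true if `tag` occurs in `text` delimited on both
    // sides by a non-word character or by the start/end of a line.
    public static bool ContainsSkill(string text, string tag)
    {
        string pattern = @"(^|\W)" + Regex.Escape(tag) + @"(\W|$)";
        return Regex.IsMatch(text, pattern, RegexOptions.Multiline);
    }

    public static void Main()
    {
        string text = "...administration- c, c++, perl, shell programming...";

        Console.WriteLine(ContainsSkill(text, "c++"));  // True
        Console.WriteLine(ContainsSkill(text, "she"));  // False: only part of "shell"
        Console.WriteLine(ContainsSkill("c++", "c++")); // True: matched via ^ and $
    }
}
```

One caveat with this sketch: because "+" is itself a non-word character, a search for "c" would also succeed on text that only contains "c++". Telling those two skills apart would need extra logic beyond what is suggested here.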