C# - Regex vs String.Contains


Hi. I'm failing to write a method that tests for words within a plain-text or HTML document. I'm reasonably literate with regex, but newer to C# (coming from way more Java).

Just 'cause,

    using System.Text.RegularExpressions;

    string html = source.ToLower();
    string plainText = Regex.Replace(html, @"<(.|\n)*?>", " "); // remove tags
    plainText = Regex.Replace(plainText, @"\s+", " ");           // remove excess white space

And then,

    string tag = "c++";
    bool foundAsRegex = Regex.IsMatch(plainText, @"\b" + Regex.Escape(tag) + @"\b");
    bool foundAsContains = plainText.Contains(tag);
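A self-contained sketch that runs the two snippets above end to end. The class and method names (SkillSearchRepro, ToPlainText, FoundAsRegex, FoundAsContains) are hypothetical wrappers, and source is replaced by a literal string modeled on the resume text being searched:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SkillSearchRepro
{
    // Same normalization as the question: lower-case, strip tags, collapse whitespace.
    public static string ToPlainText(string source)
    {
        string html = source.ToLower();
        string plainText = Regex.Replace(html, @"<(.|\n)*?>", " "); // remove tags
        return Regex.Replace(plainText, @"\s+", " ");               // remove excess white space
    }

    // The question's regex check: \b + escaped tag + \b (for "c++" this is \bc\+\+\b).
    public static bool FoundAsRegex(string plainText, string tag) =>
        Regex.IsMatch(plainText, @"\b" + Regex.Escape(tag) + @"\b");

    // The question's plain substring check.
    public static bool FoundAsContains(string plainText, string tag) =>
        plainText.Contains(tag);

    public static void Main()
    {
        string plainText = ToPlainText("...administration- C, C++, Perl, shell programming...");
        Console.WriteLine(FoundAsRegex(plainText, "c++"));    // False
        Console.WriteLine(FoundAsContains(plainText, "c++")); // True
    }
}
```

Note that the tag stripping replaces each tag with a space, so "<b>C++</b>," becomes "c++ ," after normalization.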

For the case of "c++", which should be found, foundAsRegex is false while foundAsContains is true. My Google-fu is weak, and I didn't turn up anything on "what the hell" is going on. Any ideas or pointers are welcome!

Edit:

I'm searching for matches on skills in resumes; for example, the distinct value "c++".

Edit:

A real excerpt is given below:

"...administration- c, c++, perl, shell programming..."

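The problem is that \b matches only between a word character and a non-word character (or the start or end of the string next to a word character). Given the expression \bc\+\+\b, you have a problem, because "+" is a non-word character. When searching for this pattern in "xxx c++, xxx", you're not going to find anything: there's no "word break" after the "+" character, since both "+" and the following "," are non-word characters.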
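A small sketch of where the trailing \b does and does not match for the escaped "c++" pattern. The class name WordBoundaryDemo is hypothetical; the pattern is built exactly as in the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // The question's pattern for tag "c++": \bc\+\+\b
    static readonly string Pattern = @"\b" + Regex.Escape("c++") + @"\b";

    public static bool Matches(string input) => Regex.IsMatch(input, Pattern);

    public static void Main()
    {
        // "+" followed by "," or by end of string: no word character on either side, so no \b.
        Console.WriteLine(Matches("xxx c++, xxx")); // False
        Console.WriteLine(Matches("c++"));          // False

        // "+" followed by a word character IS a boundary, so this (unwanted) case matches.
        Console.WriteLine(Matches("c++x"));         // True

        // Dropping the trailing \b finds the skill, but would also match inside "c++x".
        Console.WriteLine(Regex.IsMatch("xxx c++, xxx", @"\bc\+\+")); // True
    }
}
```

In other words, the trailing \b in this pattern demands a word character right after "++", which is the opposite of what a skill search wants.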

If you're looking for terms that contain non-word characters, you'll have to change your logic. I'm not sure what the best approach would be. I suppose you can use \W, but it's not going to match at the beginning or end of a line, so you'll need (^|\W) and (\W|$) ... which is ugly. And slow, although perhaps still fast enough depending on your needs.
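A minimal sketch of the (^|\W) ... (\W|$) idea, next to an alternative that is not part of the answer: .NET lookarounds, (?<!\w) and (?!\w), which assert "no word character immediately before/after" without consuming anything. The class and method names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SkillMatcher
{
    // The answer's suggestion: start of input or a non-word character before,
    // a non-word character or end of input after. (Without RegexOptions.Multiline,
    // ^ and $ anchor to the whole string rather than to each line.)
    public static bool ContainsSkillAlternation(string text, string skill) =>
        Regex.IsMatch(text, @"(^|\W)" + Regex.Escape(skill) + @"(\W|$)");

    // Alternative using lookbehind/lookahead: same idea, but the delimiters
    // are checked without being consumed.
    public static bool ContainsSkillLookaround(string text, string skill) =>
        Regex.IsMatch(text, @"(?<!\w)" + Regex.Escape(skill) + @"(?!\w)");

    public static void Main()
    {
        string excerpt = "...administration- c, c++, perl, shell programming...";
        Console.WriteLine(ContainsSkillAlternation(excerpt, "c++")); // True
        Console.WriteLine(ContainsSkillLookaround(excerpt, "c++"));  // True
        Console.WriteLine(ContainsSkillLookaround("c++x", "c++"));   // False
    }
}
```

Both versions treat "+" as a delimiter, so searching for "c" will still match the "c" inside "c++". The lookaround form also matters if you later switch to Regex.Matches: the alternation consumes the delimiter, so in "c,c" it finds only one "c", while the lookaround version finds both.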

