Regex.Match in C# performance problem


Hi, I use Regex.Match in C# to parse a text file line by line. I find that it takes much longer (about 2-4 seconds) when a line can't match the pattern, and much less (under 1 second) when it does match. Can anyone tell me how I can improve the performance?

This is the regex I'm using:

^.*?\t.*?\t(?<npk>\d+)\t(?<bol>\w+)\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t\s*(?<netvalue>[\d\.,]+)\t.*?\t.*?\t(?<item>\d{6})\t(?<salesdoc>\d+)\t(?<acgidate>[\d\.]{10})\t.*?\t.*?\t.*?\t.*?\t.*?\t(?<delivery>\d+)\t\s*(?<billquantity>\d+)\t.*?\t(?<material>[\w\-]+)\tiv$ 

Performance problems that only show up when a regex can't match are often due to catastrophic backtracking. This happens when a regex allows a lot of possible combinations to match the subject text, all of which have to be tried by the regex engine before it may declare failure.

In your case, the reason for the failure is obvious:

First, what you're doing is something that shouldn't really be done with a regex, but rather with a CSV parser (or a TSV parser in your case).
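As a rough sketch of the non-regex route, the line can simply be split on tabs and the fields picked out by position. The class name `TsvLineParser`, the column indexes, and the expected field count of 29 are my assumptions, inferred by counting the tabs in the regex from the question; check them against real data. Unlike the regex, this sketch does not validate field contents (for example, that `npk` is all digits).

```csharp
using System;
using System.Collections.Generic;

public static class TsvLineParser
{
    // Zero-based column positions, inferred by counting tabs in the question's
    // regex. These are assumptions, not taken from any file specification.
    private static readonly Dictionary<string, int> Columns = new Dictionary<string, int>
    {
        { "npk", 2 }, { "bol", 3 }, { "netvalue", 13 }, { "item", 16 },
        { "salesdoc", 17 }, { "acgidate", 18 }, { "delivery", 24 },
        { "billquantity", 25 }, { "material", 27 }
    };

    private const int ExpectedFieldCount = 29;
    private const string Terminator = "iv"; // literal last field, as written in the regex

    // Returns the named fields, or null when the line does not have the expected shape.
    public static Dictionary<string, string> Parse(string line)
    {
        string[] fields = line.Split('\t');
        if (fields.Length != ExpectedFieldCount || fields[ExpectedFieldCount - 1] != Terminator)
            return null;

        var result = new Dictionary<string, string>();
        foreach (var column in Columns)
        {
            // Trim() stands in for the \s* the regex allows before some fields;
            // applying it to every field is a simplification.
            result[column.Key] = fields[column.Value].Trim();
        }
        return result;
    }
}
```

A single `Split` does a fixed amount of work per line whether or not the line has the expected shape, so rejected lines cost no more than accepted ones.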

If you're stuck with regex, though, you still need to change something. Your problem is that the delimiter \t can also be matched by the dot (.), so unless the entire string matches, the regex engine has to try oodles of permutations, as outlined above.

A big step forward, therefore, would be to change every .*? into [^\t]* where applicable, and to condense the repeats using the {m,n} operator:

^(?:[^\t]*\t){2}(?<npk>\d+)\t(?<bol>\w+)(?:\t[^\t]*){9}\t\s*(?<netvalue>[\d\.,]+)(?:\t[^\t]*){2}\t(?<item>\d{6})\t(?<salesdoc>\d+)\t(?<acgidate>[\d\.]{10})(?:\t[^\t]*){5}\t(?<delivery>\d+)\t\s*(?<billquantity>\d+)\t[^\t]*\t(?<material>[\w\-]+)\tiv$ 

I hope I haven't miscounted :)
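For what it's worth, here is a minimal sketch of how the rewritten pattern might be used from C#. The class and method names (`LineMatcher`, `TryGetNetValue`) are made up for illustration. Building the `Regex` once instead of per line, `RegexOptions.Compiled`, and the one-second match timeout are my additions rather than anything from the question; the timeout is only a safety net so that a pathological line fails fast instead of stalling the loop.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineMatcher
{
    // The pattern from the answer, split across lines for readability.
    // Built once and reused for every line of the file.
    private static readonly Regex LinePattern = new Regex(
        @"^(?:[^\t]*\t){2}(?<npk>\d+)\t(?<bol>\w+)(?:\t[^\t]*){9}\t\s*(?<netvalue>[\d\.,]+)" +
        @"(?:\t[^\t]*){2}\t(?<item>\d{6})\t(?<salesdoc>\d+)\t(?<acgidate>[\d\.]{10})" +
        @"(?:\t[^\t]*){5}\t(?<delivery>\d+)\t\s*(?<billquantity>\d+)\t[^\t]*\t(?<material>[\w\-]+)\tiv$",
        RegexOptions.Compiled,
        TimeSpan.FromSeconds(1)); // assumed limit; pick whatever suits your data

    // Returns true and the captured net value when the line matches.
    public static bool TryGetNetValue(string line, out string netValue)
    {
        netValue = null;
        try
        {
            Match match = LinePattern.Match(line);
            if (!match.Success)
                return false;
            netValue = match.Groups["netvalue"].Value;
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            // Treat a timed-out line as a non-match.
            return false;
        }
    }
}
```

The other named groups can be read the same way through `match.Groups["npk"]`, `match.Groups["material"]`, and so on.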


Just for illustration:

Matching the text (the fields are separated by tabs)

1   2   3   4   5   6   7   8   9   0 

with this excerpt from the regex above

.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t\s*(?<netvalue>[\d\.,]+) 

takes the regex engine 39 steps.

When you feed it this text, though:

1   2   3   4   5   6   7   8   9   x 

it takes the regex engine 4602 steps to figure out that it can't match.

If you use

(?:[^\t]*\t){9}\s*(?<netvalue>[\d\.,]+) 

instead, the engine needs 30 steps for a successful match and 39 for a failed attempt.
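If you want to see the difference on your own machine, something like the following sketch could help. .NET does not expose step counts as far as I know, so this times repeated matches with `Stopwatch` instead. The class name `BacktrackingDemo` and the iteration count are arbitrary choices of mine, and the absolute timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // The two excerpts compared in the answer.
    public const string LazyDots = @".*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t\s*(?<netvalue>[\d\.,]+)";
    public const string NoTabs = @"(?:[^\t]*\t){9}\s*(?<netvalue>[\d\.,]+)";

    // The two sample texts, with real tab characters between the fields.
    public const string Matching = "1\t2\t3\t4\t5\t6\t7\t8\t9\t0";
    public const string Failing = "1\t2\t3\t4\t5\t6\t7\t8\t9\tx";

    // Runs the same match repeatedly and returns the total elapsed time.
    public static TimeSpan Time(string pattern, string input, int iterations)
    {
        var regex = new Regex(pattern);
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            regex.IsMatch(input);
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints whether each pattern matches each text and how long the repeated matching took.
    public static void Run()
    {
        foreach (var pattern in new[] { LazyDots, NoTabs })
            foreach (var input in new[] { Matching, Failing })
                Console.WriteLine("{0,-10} matched={1,-5} elapsed={2}",
                    pattern == LazyDots ? "lazy dots" : "no tabs",
                    Regex.IsMatch(input, pattern),
                    Time(pattern, input, 100000));
    }
}
```

Both patterns accept the first text and reject the second; only the amount of work needed to reach the rejection should differ.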

