regex - Matching a specific Unicode char in a Haskell regexp
This is a Mac/OS X related problem!
I have the following 3-character-long Haskell string:
"a\160b"
I want to match and replace the middle character.
I tried several approaches, such as:
ghci> :m +Text.Regex
ghci> subRegex (mkRegex "\160") "a\160b" "x"
"*** Exception: user error (Text.Regex.Posix.String died: (ReturnCode 17,"illegal byte sequence"))
ghci> subRegex (mkRegex "\\160") "a\160b" "x"
"a\160b"
Neither yielded the desired result.
How do I have to modify the regexp or the environment to replace '\160' with 'x'?
The problem seems to have its root in the locale/encoding of the input.
bash> locale
LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=
I modified my .bashrc to export the following env-vars:
bash> locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
But this did not change the behavior at all.
I was able to reproduce the problem by setting my locale to 'en_US.UTF-8'. (I am using Mac OS X.)
bash> export LANG=en_US.UTF-8
bash> ghci
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
Prelude> :m +Text.Regex
Prelude Text.Regex> subRegex (mkRegex "\160") "a\160b" "x"
"*** Exception: user error (Text.Regex.Posix.String died: (ReturnCode 17,"illegal byte sequence"))
Setting the locale to 'C' should fix the problem:
bash> export LANG=C
bash> ghci
GHCi, version 6.12.1: http://www.haskell.org/ghc/  :? for help
Prelude> :m +Text.Regex
Prelude Text.Regex> subRegex (mkRegex "\160") "a\160b" "x"
"axb"
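If changing the locale is not an option, replacing a single known character does not strictly need a regex. Here is a minimal sketch using only the Prelude. It never hands the string to the C regex library, so it should not depend on the locale. The name replaceChar is a hypothetical helper, not part of Text.Regex.

```haskell
-- Replace every occurrence of one character with another.
-- replaceChar is a hypothetical helper, not a library function.
replaceChar :: Char -> Char -> String -> String
replaceChar from to = map (\c -> if c == from then to else c)

main :: IO ()
main = putStrLn (replaceChar '\160' 'x' "a\160b")  -- prints axb
```

This only covers the one-character case from the question. Anything that needs real pattern matching would still have to go through a regex library.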
Unfortunately, I don't have an explanation for why the locale is causing this problem.
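One plausible explanation, offered as an assumption rather than a confirmed diagnosis: if the regex binding passes each Char to the C library as a single byte, '\160' arrives as the lone byte 0xA0. In UTF-8, U+00A0 is encoded as the two bytes 0xC2 0xA0, and 0xA0 on its own is only a continuation byte. A regex library running under a UTF-8 locale could therefore reject it as an illegal byte sequence, while the 'C' locale treats every byte as a character. The sketch below only illustrates the byte-level difference. The helpers utf8Bytes, truncatedByte and isContinuationByte are hypothetical, and the one-byte-per-Char marshalling is assumed, not verified against the Text.Regex source.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- UTF-8 encoding of a code point below U+0800 (enough for '\160').
-- Hypothetical helper for illustration only.
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80  = [fromIntegral n]
  | n < 0x800 = [ fromIntegral (0xC0 .|. (n `shiftR` 6))
                , fromIntegral (0x80 .|. (n .&. 0x3F)) ]
  | otherwise = error "utf8Bytes: code point outside the range covered by this sketch"
  where n = ord c

-- What a one-byte-per-Char marshalling would send (assumed behaviour).
truncatedByte :: Char -> Word8
truncatedByte = fromIntegral . ord

-- A byte of the form 10xxxxxx can only continue a UTF-8 sequence.
isContinuationByte :: Word8 -> Bool
isContinuationByte b = b .&. 0xC0 == 0x80

main :: IO ()
main = do
  print (utf8Bytes '\160')                           -- [194,160]
  print (truncatedByte '\160')                       -- 160
  print (isContinuationByte (truncatedByte '\160'))  -- True
```

If this assumption holds, it would also fit the observation above that the error disappears under the 'C' locale.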