regex - Do I need to change anything in my regular expression?
I want to convert a fixed-length text file to XML, and I use a regex to do it. Position 36 of each line of the text file holds the user's initial, which requires 1 alphanumeric character but is sometimes blank. I used the regex pattern [a-zA-Z\s]{1}, which works well: it matches either 1 letter or a blank. When I validate the output against the schema, though, the schema says the value doesn't match \p{L}{1}, which means it can only be a letter. What should I do about the regex? Or do I have to change either the text file pattern or the schema? Here is the code example:
    ' Requires: Imports System.IO, System.Text.RegularExpressions, System.Xml
    Dim linepattern2 As New Regex("^(?<type_code>\d{3})(?<snm>[a-zA-Z0-9\s.\']{20})(?<gvn_nm>[a-zA-Z0-9\s.\']{12})(?<init>[\p{L} ]{1})(?<sin>\d{9})(?<rcpnt_bn>[a-zA-Z0-9\s.\']{15})(?<l1_nm>[a-zA-Z0-9\s.\']{30})(?<l2_nm>[a-zA-Z0-9\s.\']{30})")

    Dim settings As New XmlWriterSettings()
    settings.Indent = True

    Using writer As XmlWriter = XmlWriter.Create(xmloutput, settings)
        writer.WriteStartDocument()
        writer.WriteStartElement("submission")
        writer.WriteAttributeString("xmlns", "xsi", Nothing, "http://www.w3.org/2001/XMLSchema-instance")
        writer.WriteAttributeString("xsi", "noNamespaceSchemaLocation", Nothing, "c:\schema\layout-topologie.xsd")
        writer.WriteStartElement("return")
        writer.WriteStartElement("t4a")

        Using reader As New StreamReader(textinput)
            While Not reader.EndOfStream
                ' Each input line is one fixed-length record
                Dim line As String = reader.ReadLine()
                Dim match2 As Match = linepattern2.Match(line)
                If match2.Success Then
                    writer.WriteStartElement("t4aslip")
                    writer.WriteStartElement("rcpnt_nm")
                    writer.WriteElementString("snm", match2.Groups("snm").Value)
                    writer.WriteElementString("gvn_nm", match2.Groups("gvn_nm").Value)
                    writer.WriteElementString("init", match2.Groups("init").Value)
                    writer.WriteEndElement() ' closes rcpnt_nm
                    writer.WriteElementString("sin", match2.Groups("sin").Value)
                    writer.WriteElementString("rcpnt_bn", match2.Groups("rcpnt_bn").Value)
                    writer.WriteEndElement() ' closes t4aslip, so slips are not nested inside each other
                End If
            End While
        End Using

        writer.WriteEndElement() ' closes t4a
        writer.WriteEndElement() ' closes return
        writer.WriteEndElement() ' closes submission
        writer.WriteEndDocument()
    End Using
Here is part of the text file:
100aaserude russell alan 663345678000000000000000
The schema validation error is:
'init': value ' ' does not match regular expression facet '\p{L}{1}'
Thanks in advance!
I think this is the regex you want: [\p{L} ]

\p{L} matches any letter, not just the ASCII letters ([a-zA-Z]). That includes accented Latin letters like Ä and ñ, plus "letters" from other scripts and writing systems like Greek, Cyrillic, Arabic, Chinese... every letter known to Unicode.

Since your text format is fixed-length, I assume a missing initial is represented by a space, not by an empty string as one might expect. I used a literal space, but you can switch to \s if you want to allow a tab, linefeed, or other whitespace characters.

The {1} in your regexes serves no purpose. If you want to make sure only one character is allowed, you would usually add anchors, like this: ^[\p{L} ]$. That isn't necessary in XML Schema, where regexes are implicitly anchored at both ends.
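
To make the difference concrete, here is a minimal VB.NET sketch comparing the pattern the schema facet currently uses with the suggested replacement. The module name and the sample values are made up for illustration. The anchors are written out because .NET, unlike XML Schema, does not anchor patterns implicitly.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module InitialPatternDemo
    Sub Main()
        ' Explicit anchors: .NET does not anchor patterns the way XML Schema does.
        Dim letterOnly As New Regex("^\p{L}$")            ' equivalent of the current schema facet
        Dim letterOrSpace As New Regex("^[\p{L} ]$")      ' suggested replacement
        Dim withQuantifier As New Regex("^[\p{L} ]{1}$")  ' same pattern with the redundant {1}

        ' Hypothetical values for the init field, not taken from the real file.
        Dim samples As String() = {"A", "ñ", " ", "7", "AB"}

        For Each value As String In samples
            Console.WriteLine("'{0}': letter only = {1}, letter or space = {2}, with quantifier = {3}",
                              value,
                              letterOnly.IsMatch(value),
                              letterOrSpace.IsMatch(value),
                              withQuantifier.IsMatch(value))
        Next
    End Sub
End Module
```

With these samples, "A" and "ñ" should match all three patterns, the single space should match only the two patterns that allow a space, and "7" and "AB" should match none. The last two columns should always agree, which is the sense in which {1} is redundant.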