Help writing flexible splits in Perl
A couple of weeks ago I posted a question about trouble I was having parsing an irregularly formatted data file. Here's a sample of the data:
01-021412 15/02/2007 207,000.00 14,839.00 18 -6 2 6 6 5 16 6 4 4 3 -28 -59 -88 -119 -149 -191 -215 -246 atraso promedio ---> 2.88
I need a program to extract 01-021412 and 18, count and sum the values in the subsequent series, store the atraso promedio, and repeat the operation on 40,000 entries. I received a helpful response and was able to write this code:
    use strict;
    use warnings;

    # Create the output file and write the CSV header (payment columns 1..72).
    open(OUT, ">outfull.csv") or die "Cannot open outfull.csv: $!";
    print OUT "loanid,npayments,atrasopromedio,atrasoalt,", join(',', 1..72), "\n";

    open(MYINPUTFILE, "<datos historico aspire2.txt") or die "Cannot open input file: $!";

    my @payments;
    my $numberofpayments;
    my $loannumber;

    while (<MYINPUTFILE>) {
        if (/\b\d{2}-\d{6}\b/) {
            # Loan header line: loan ID, three skipped fields, payment count, first payments.
            ($loannumber, undef, undef, undef, $numberofpayments, @payments) = split;
        }
        elsif (m/---> *(\d*.\d*)/) {
            # Summary line, e.g. "atraso promedio ---> 2.88": the fourth field is the average.
            my (undef, undef, undef, $atrasopromedio) = split;
            my $n = scalar @payments;
            print "$numberofpayments,$n,$loannumber\n";
            if ($n == $numberofpayments) {
                my $total = 0;
                $total += $_ for @payments;
                my $atrasoalt = $total / $n;
                print OUT "$loannumber,$numberofpayments,$atrasopromedio,$atrasoalt,", join(',', @payments), "\n";
            }
        }
        else {
            # Continuation line: more payments for the current loan.
            push(@payments, split);
        }
    }
This works fine, except for the fact that about 50 percent of the entries include a '*', as follows:
* 01-051948 06/03/2009 424,350.00 17,315.00 48 0 6 -2 0 21 10 9 13 10 9 7 13 3 4 12 -3 14 8 6 atraso promedio ---> 3.02
The asterisk causes the program to fail because it interrupts the split pattern, causing incorrect variable assignments. Until now I've dealt with this by removing the asterisks from the input data file, but I realized that in doing so the program omits these loans altogether. Is there an economical way to modify the script so that it handles entries both with and without asterisks?
As an aside, if an entry does include an asterisk, I would like to record that fact in the output data.
Many thanks in advance, Aaron
Use an intermediate array:
    my $has_asterisk;
    # ...
    if (/\b\d{2}-\d{6}\b/) {
        my @fields = split;
        # A leading '*' arrives as its own first field: note it, then drop it.
        $has_asterisk = $fields[0] eq '*';
        shift @fields if $has_asterisk;
        ($loannumber, undef, undef, undef, $numberofpayments, @payments) = @fields;
    }
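For what it's worth, here is a small standalone sketch of the same idea that also carries the asterisk flag through to the output, as asked in the aside. The helper name parse_loan_header, the hash keys, and the 0/1 flag column are my own choices for illustration and are not from the original script. The two sample lines are shortened versions of the ones posted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): parse one loan
# header line, tolerating an optional leading '*' field.
sub parse_loan_header {
    my ($line) = @_;
    my @fields = split ' ', $line;

    # If the first field is a lone '*', remember that and drop it so the
    # remaining fields line up the same way for both kinds of entry.
    my $has_asterisk = (@fields && $fields[0] eq '*') ? 1 : 0;
    shift @fields if $has_asterisk;

    my ($loannumber, undef, undef, undef, $numberofpayments, @payments) = @fields;
    return {
        loannumber       => $loannumber,
        numberofpayments => $numberofpayments,
        payments         => \@payments,
        has_asterisk     => $has_asterisk,
    };
}

# Shortened sample lines, one without and one with the asterisk.
for my $line (
    '01-021412 15/02/2007 207,000.00 14,839.00 18 -6 2 6',
    '* 01-051948 06/03/2009 424,350.00 17,315.00 48 0 6 -2',
) {
    my $rec = parse_loan_header($line);
    print join(',',
        @{$rec}{qw(loannumber numberofpayments has_asterisk)},
        @{ $rec->{payments} }), "\n";
}
```

With those two lines this prints 01-021412,18,0,-6,2,6 and 01-051948,48,1,0,6,-2. In the full script you would presumably add a matching column to the CSV header and print the flag alongside the loan number; where exactly it goes in the output is up to you.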