mysql - HTML::TableExtract: how to run the right argument [see live example]

I have a question regarding my parser. Is there a chance to catch separators within the separate table... The parser script already runs nicely. Note: I want to store the data in a MySQL database, so it would be great to have separators (commas, tabs, or something else). Tab-separated or comma-separated values are handy formats to work with.

(Here is the data from the following site: http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20 )

lfd. nr. | schulnummer | schulname | straße | plz ort | telefon fax | schulart | webseite
1 | 0401 | mädchenrealschule marienburg, abenberg, der diözese eichstätt | marienburg 1 | 91183 abenberg | 09178/509210 | realschulen | mrs-marienburg.homepage.t-online.de
2 | 6581 | volksschule abenberg (grundschule) | güssübelstr. 2 | 91183 abenberg | 09178/215 09178/905060 | volksschulen | home.t-online.de/home/vs-abenberg
6 | 3074 | private berufsschule zur sonderpäd. förderung, förderschwerpunkt lernen, abensberg | regensburger straße 60 | 93326 abensberg | 09443/709191 09443/709193 | berufsschulen zur sonderpädog. förderung | www.berufsschule-abensberg.de

Well, I need to have the lines divided into at least 3 columns. Take this record, for example:

name: volksschule abenberg (grundschule)
street: güssübelstr. 2
postal code, town: 91183 abenberg
telephone, fax: 09178/215 09178/905060
type of school: volksschulen
website: home.t-online.de/home/vs-abenberg

Or even better: have postal code and town divided into 2 separate columns. Question: is this possible?
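Splitting the combined field looks feasible, because a German postal code always has five digits. The following is only a minimal sketch, assuming the field always has the form "91183 abenberg"; the helper name split_plz_ort is hypothetical and not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a combined "postal-code town" field into two values.
# Assumes the field starts with a five-digit German postal code.
sub split_plz_ort {
    my ($field) = @_;
    if ( $field =~ /^\s*(\d{5})\s+(.+?)\s*$/ ) {
        return ( $1, $2 );
    }
    # Fall back to an empty postal code if the pattern does not match.
    return ( '', $field );
}

my ( $plz, $ort ) = split_plz_ort('91183 abenberg');
print "$plz\t$ort\n";
```

The fallback branch keeps the original text in the town column, so a record with an unexpected layout is not silently lost.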

By the way, look at these records (here I show only the numbers and names of the schools):

1 0401 mädchenrealschule marienburg, abenberg,
6 3074 private berufsschule zur sonderpäd. förderung, förderschwerpunkt lernen, abensberg

Those have commas inside the name. Does that make it difficult to create a parser that produces CSV format?

Any idea how to do this in Perl? If it is possible, that would be great! Many, many thanks for any hint regarding this little issue, which, besides being great, is fascinating!
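Commas inside a name need not break CSV output, provided the fields are quoted by a CSV writer instead of being joined by hand. A minimal sketch, assuming the Text::CSV module from CPAN is installed; the sample record is taken from the data above and shortened to four fields:

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows non-ASCII characters such as umlauts in the fields.
my $csv = Text::CSV->new({ binary => 1 }) or die Text::CSV->error_diag;

my @record = (
    '1',
    '0401',
    'mädchenrealschule marienburg, abenberg, der diözese eichstätt',
    'marienburg 1',
);

# combine() quotes any field that contains the separator character.
$csv->combine(@record) or die "combine failed";
my $line = $csv->string;
print "$line\n";

# Parsing the line again gives back the original four fields.
$csv->parse($line) or die "parse failed";
my @fields = $csv->fields;
print scalar(@fields), " fields\n";
```

Because the name is wrapped in quotes, the commas inside it are not treated as column separators when the line is read back.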

zero

By the way, if you want, I can add the code. No problem, here it is:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TableExtract;
    use LWP::Simple;
    use Cwd;
    use POSIX qw(strftime);

    my $te             = HTML::TableExtract->new;
    my $total_records  = 0;
    my $suchbegriffe   = "e";     # search term
    my $treffer        = 50;      # hits per page
    my $range          = 0;       # offset of the first hit on the page
    my $url_to_process = "http://192.68.214.70/km/asps/schulsuche.asp?q=";
    my $processdir     = "processing";
    my $counter        = 50;
    my $displaydate    = "";
    my $percent        = 0;
    my $outfh;                    # output file handle, opened below

    workdir();
    chdir $processdir or die "Cannot change to $processdir: $!";
    processurl();
    print "\nPress <Enter> to continue\n";
    <>;
    $displaydate = strftime('%Y%m%d%H%M%S', localtime);
    my $outname = "webdata_for_${suchbegriffe}_$displaydate.txt";
    open $outfh, '>', $outname or die "Cannot open $outname: $!";
    processdata();
    close $outfh;
    print "Finished processing $total_records records...\n";
    print "Processed data saved to $ENV{HOME}/$processdir/$outname\n";
    unlink 'processing.html';
    exit;

    # Fetch the first result page and read the total number of hits from it.
    sub processurl {
        my $url = "$url_to_process$suchbegriffe&a=$treffer&s=$range";
        print "\nProcessing $url\n";
        # getstore returns an HTTP status code, so test it with is_success
        is_success( getstore($url, 'tempfile.html') ) or die 'Unable to get page';
        open my $fh, '<', 'tempfile.html' or die "Cannot open tempfile.html: $!";
        while ( <$fh> ) {
            if ( /^.*?(Treffer <b>)(\d+)( - )(\d+)(<\/b> \w+ \w+ <b>)(\d+).*/ ) {
                $total_records = $6;
                print "Total records to process: $total_records\n";
            }
        }
        close $fh;
        unlink 'tempfile.html';
    }

    # Fetch every result page and write its table rows to the output file.
    sub processdata {
        $| = 1;    # unbuffered output for the progress line
        while ( $range <= $total_records ) {
            my $url = "$url_to_process$suchbegriffe&a=$treffer&s=$range";
            is_success( getstore($url, 'processing.html') ) or die 'Unable to get page';
            $te->parse_file('processing.html');
            my ($table) = $te->tables;
            foreach my $row ( $table->rows ) {
                cleanup(@$row);
                no warnings 'uninitialized';    # empty cells come back as undef
                print $outfh "@$row\n";
            }
            print "Processed records $range to $counter";
            print "\r";
            $counter = $counter + 50;
            $range   = $range + 50;
            $te      = HTML::TableExtract->new;
        }
    }

    # Collapse runs of whitespace in each cell (modifies the cells in place).
    sub cleanup {
        for ( @_ ) {
            s/\s+/ /g if defined;
        }
    }

    # Use the home directory to process the data.
    sub workdir {
        chdir or die "$!";
        if ( ! -d $processdir ) {
            mkdir( "$ENV{HOME}/$processdir", 0755 )
                or die "Cannot make directory $processdir: $!";
        }
    }

    #!/usr/bin/perl
    use warnings;
    use strict;
    use LWP::Simple;
    use HTML::TableExtract;
    use Text::CSV;

    my $html = get 'http://192.68.214.70/km/asps/schulsuche.asp?q=a&a=20';
    defined $html or die 'Unable to get page';
    $html =~ tr/\r//d;     # strip carriage returns
    $html =~ s/&nbsp;/ /g; # expand spaces

    my $te = HTML::TableExtract->new;
    $te->parse($html);

    # columns as they appear in the HTML table
    my @cols = qw(
        rownum
        number
        name
        phone
        type
        website
    );

    # columns wanted in the CSV output
    my @fields = qw(
        rownum
        number
        name
        street
        postal
        town
        phone
        fax
        type
        website
    );

    my $csv = Text::CSV->new({ binary => 1 });

    foreach my $ts ($te->tables) {
        foreach my $row ($ts->rows) {

            # empty cells come back as undef; turn them into empty strings
            $_ //= '' for @$row;

            # trim leading/trailing whitespace from the base fields
            s/^\s+//, s/\s+$// for @$row;

            # load the fields into the hash using a "hash slice"
            my %h;
            @h{@cols} = @$row;

            # derive some fields from the base fields, again using a hash slice
            @h{qw/name street postal town/} = split /\n+/, $h{name};
            @h{qw/phone fax/} = split /\n+/, $h{phone};

            # a derived field is undef when its cell had fewer lines than expected
            $_ //= '' for @h{qw/name street postal town phone fax/};

            # trim leading/trailing whitespace from the derived fields
            s/^\s+//, s/\s+$// for @h{qw/name street postal town/};

            $csv->combine(@h{@fields});
            print $csv->string, "\n";
        }
    }
