regex - Querying a website with Perl LWP::Simple to Process Online Prices
In my free time, I've been trying to improve my Perl abilities by working on a script that uses LWP::Simple to poll one specific website's product pages and check the prices of products (I'm somewhat of a Perl noob). The script also keeps a simple backlog of the last price seen for each item (since the prices change frequently).
I was wondering if there is a way to further automate the script so that I don't have to explicitly add each page's URL to the initial hash (i.e. keep an array of key terms and run a search query on Amazon to find the page or price). Is there any way to do this that doesn't involve me copying Amazon's search URL and parsing in my keywords? (I'm aware that processing HTML with regex is bad form; I used it since I only need one small piece of data.)
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    my %oldprice;
    my %nameurl = (
        "archer season 1" => "http://rads.stackoverflow.com/amzn/click/b00475b0g2",
        "code complete" => "http://rads.stackoverflow.com/amzn/click/0735619670",
        "intermediate perl" => "http://rads.stackoverflow.com/amzn/click/0596102062",
        "inglorious basterds (2-disc)" => "http://rads.stackoverflow.com/amzn/click/b002t9h2lk",
    );

    # Load the last price seen for each item, if a backlog exists.
    if (-e "backlog.txt") {
        open(my $in, '<', "backlog.txt") or die "Cannot read backlog.txt: $!";
        while (<$in>) {
            chomp;
            my @temp = split(/:\s/);
            $oldprice{$temp[0]} = $temp[1];
        }
        close($in);
    }

    print "\nchecking daily amazon prices:\n";
    open(my $out, '>', "backlog.txt") or die "Cannot write backlog.txt: $!";
    foreach my $key (sort keys %nameurl) {
        # Fetch the product page and capture the first dollar amount on it.
        my $content = get($nameurl{$key}) or die "Could not fetch $nameurl{$key}";
        $content =~ m{\s*\$(\d+\.\d+)} or die "No price found for $key";
        if (exists $oldprice{$key} && $oldprice{$key} != $1) {
            print "$key: \$$1 (was $oldprice{$key})\n";
        }
        else {
            print "\n$key: $1\n";
        }
        print $out "$key: $1\n";
    }
    close($out);
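The backlog handling in the script (one `name: price` line per item, compared against the freshly scraped price) can be exercised without touching the network. The following is only a minimal sketch: `parse_backlog` and `price_changed` are hypothetical helper names, not part of the original script, and the sample prices are made up. Unlike `split(/:\s/)`, it splits on the last colon so that a product name containing `: ` is kept whole.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "name: price" lines into a name => price hash.
# The greedy (.*) makes the match break at the LAST ": " on the line.
sub parse_backlog {
    my @lines = @_;
    my %prices;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /^(.*):\s+(\d+(?:\.\d+)?)$/;   # skip malformed lines
        $prices{$1} = $2;
    }
    return %prices;
}

# Hypothetical helper: true when the item has a recorded price that differs.
sub price_changed {
    my ($old, $name, $new) = @_;
    return exists $old->{$name} && $old->{$name} != $new;
}

# Made-up sample data, for illustration only.
my %old = parse_backlog("code complete: 29.99\n", "archer: season 1: 14.49\n");
print "code complete changed\n" if price_changed(\%old, "code complete", "27.50");
```

Keeping the parsing in a small function like this would also make it easier to change the backlog format later without touching the fetch loop.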
I made a simple script to demonstrate Amazon search automation. The search URL for all departments was changed to take the escaped search term. The rest of the code is simple parsing with HTML::TreeBuilder. The structure of the HTML in question can be easily examined with the dump method (see the commented-out line).
    use strict;
    use warnings;

    use LWP::Simple;
    use URI::Escape;
    use HTML::TreeBuilder;
    use Try::Tiny;

    my $look_for = "archer season 1";

    # Fetch the search results page for the escaped search term.
    my $contents = get("http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3daps&field-keywords="
        . uri_escape($look_for)) or die "Could not fetch search results";

    my $html = HTML::TreeBuilder->new_from_content($contents);
    for my $item ($html->look_down(id => qr/result_\d+/)) {
        # $item->dump;    # find out structure of html
        # try {} yields undef when an element is missing from this result.
        my $title = try { $item->look_down(class => 'producttitle')->as_trimmed_text };
        my $price = try { $item->look_down(class => 'newprice')->find('span')->as_text };
        next unless defined $title && defined $price;

        print "$title\n$price\n\n";
    }
    $html->delete;
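To cover the "array of key terms" part of the question, the URL construction can be pulled into a function and applied to each term. This is a rough sketch rather than a tested scraper: `search_url` and `$SEARCH_BASE` are hypothetical names introduced here, and the prefix is simply the search URL used in the script above, which Amazon may change at any time.

```perl
use strict;
use warnings;
use URI::Escape;

# Search URL prefix taken from the script above (an assumption that it still works).
my $SEARCH_BASE =
    "http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3daps&field-keywords=";

# Hypothetical helper: build the search URL for one key term.
sub search_url {
    my ($term) = @_;
    return $SEARCH_BASE . uri_escape($term);   # spaces become %20
}

# Key terms replace the hand-maintained name => URL hash.
my @look_for = ("archer season 1", "code complete", "intermediate perl");
print search_url($_), "\n" for @look_for;
```

Each generated URL could then be passed to `get` and run through the same `look_down` loop, so adding a product only means adding a term to `@look_for`.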