python - Scrapy: controlling pub_date in an RSS spider
I'm writing an RSS spider. How can I keep track of the last crawl date?
Right now I'm thinking of this:
- Store the last pub_date I have crawled in a control file.
- When a crawl starts, check that last pub_date against the new pub_dates. If there are new items, crawl them; if not, do nothing.
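The two steps above can be sketched roughly as follows. This is only an illustration using the Python standard library, not Scrapy-specific code: the function names, the control file layout (a single ISO 8601 timestamp), and the item shape (a dict with a `pub_date` string in the usual RSS date format) are all assumptions made for the example.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from pathlib import Path


def load_last_pub_date(path):
    """Return the datetime stored in the control file, or None if there is none yet."""
    control = Path(path)
    if not control.exists():
        return None
    text = control.read_text().strip()
    return datetime.fromisoformat(text) if text else None


def save_last_pub_date(path, when):
    """Overwrite the control file with the newest pub_date seen (ISO 8601)."""
    Path(path).write_text(when.isoformat())


def select_new_items(items, last_pub_date):
    """Return (parsed_date, item) pairs published after last_pub_date.

    Assumes every item carries a time zone in its pub_date string, so the
    parsed datetimes are timezone-aware and can be compared with each other.
    """
    parsed = [(parsedate_to_datetime(item["pub_date"]), item) for item in items]
    if last_pub_date is None:
        # First run: nothing recorded yet, so every item counts as new.
        return parsed
    return [(when, item) for when, item in parsed if when > last_pub_date]
```

In a spider, something like `select_new_items` would presumably be called on the parsed feed entries before following their links, and `save_last_pub_date` once the new items have been handled.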
How else could I solve this?
I store the data in a database (including the last crawl date and the post dates), so I take the dates I need from the database.
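A minimal sketch of that idea, assuming SQLite and a hypothetical `posts` table; the answer does not say which database or schema is used. Dates are assumed to be stored as ISO 8601 strings in UTC with one consistent format, so that comparing them as text gives chronological order.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) posts table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "link TEXT PRIMARY KEY, pub_date TEXT NOT NULL)"
    )
    return conn


def last_pub_date(conn):
    """Return the newest stored pub_date, or None if the table is empty."""
    row = conn.execute("SELECT MAX(pub_date) FROM posts").fetchone()
    return row[0]


def store_new_posts(conn, posts):
    """Insert only posts newer than the newest stored one; return those posts."""
    last = last_pub_date(conn)
    fresh = [p for p in posts if last is None or p["pub_date"] > last]
    conn.executemany(
        "INSERT OR IGNORE INTO posts (link, pub_date) VALUES (?, ?)",
        [(p["link"], p["pub_date"]) for p in fresh],
    )
    conn.commit()
    return fresh
```

With this layout no separate control file is needed, since the newest `pub_date` already in the table plays that role.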