python - Scrapy: RSS control pub_date


I'm writing an RSS spider. How do I keep track of the last crawl date?

Right now I'm thinking of this:

  • Put the last pub_date I have crawled into a control file.
  • Then, when a crawl starts, check that last pub_date against the new pub_dates. If there are new items, start crawling; if not, do nothing.
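
The two steps above can be sketched roughly as follows. This is a minimal sketch in plain Python rather than a Scrapy spider: the helper names (`load_last_pub_date`, `save_last_pub_date`, `filter_new_items`) and the control file layout are made up for illustration, and it assumes every pub_date is an RFC 822 string with an explicit timezone, as RSS feeds normally provide.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from pathlib import Path


def load_last_pub_date(path):
    """Return the datetime stored in the control file, or None if there is no file yet."""
    control = Path(path)
    if not control.exists():
        return None
    return datetime.fromisoformat(control.read_text().strip())


def save_last_pub_date(path, pub_date):
    """Overwrite the control file with the newest pub_date, in ISO 8601 format."""
    Path(path).write_text(pub_date.isoformat())


def filter_new_items(items, last_seen):
    """Keep only (link, pub_date) pairs that are newer than last_seen.

    items is a list of (link, pub_date_string) tuples taken from the feed.
    Assumption: all pub_date strings carry a timezone, so the parsed
    datetimes are timezone-aware and can be compared with each other.
    """
    new_items = []
    for link, pub_date_string in items:
        pub_date = parsedate_to_datetime(pub_date_string)
        if last_seen is None or pub_date > last_seen:
            new_items.append((link, pub_date))
    return new_items
```

In a spider, the feed callback would call `filter_new_items` on the parsed entries, yield requests only for the returned links, and then save the largest pub_date it saw. If the list comes back empty, nothing is crawled.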

How else could I solve this?

I would store the data in a database (including the last crawl date and the post dates) and read the dates I need from the database.
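
One way this could look, as a minimal sketch using the standard-library `sqlite3` module. The `posts` table, its columns, and the function names are assumptions for illustration, not anything from the question. It also assumes pub_dates are stored as ISO 8601 strings in UTC, so that comparing them as text matches chronological order, and it tracks only post dates, not a separate last crawl date.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) posts table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "link TEXT PRIMARY KEY, pub_date TEXT NOT NULL)"
    )


def last_pub_date(conn):
    """Return the newest stored pub_date, or None when the table is empty."""
    row = conn.execute("SELECT MAX(pub_date) FROM posts").fetchone()
    return row[0]


def store_new_posts(conn, posts):
    """Insert only posts newer than the newest stored pub_date; return them.

    posts is a list of (link, pub_date) tuples with ISO 8601 UTC strings.
    """
    latest = last_pub_date(conn)
    fresh = [(link, pub) for link, pub in posts if latest is None or pub > latest]
    conn.executemany(
        "INSERT OR IGNORE INTO posts (link, pub_date) VALUES (?, ?)", fresh
    )
    conn.commit()
    return fresh
```

With this layout no control file is needed: the newest date is always available from the table itself, and a crawl that finds nothing newer inserts nothing.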

