python - Scrapy: RSS control pub_date


I'm writing an RSS spider. How do I keep track of the last crawl date?

Right now I'm thinking of this:

  • Put the last pub_date I have crawled in a control file.
  • Then, when a crawl starts, check that last pub_date against the new pub_dates. If there are new items, start crawling; if not, do nothing.
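The control-file idea above could be sketched roughly as follows. This is a minimal, standard-library-only sketch rather than Scrapy code: the file name `last_pub_date.txt`, the helper names, and the `pub_date` key on each item are all assumptions made for illustration. It also assumes the feed uses RFC 822 style dates, as RSS 2.0 `pubDate` normally does.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from pathlib import Path

# Hypothetical control file; any writable path would do.
CONTROL_FILE = Path("last_pub_date.txt")


def load_last_pub_date(path=CONTROL_FILE):
    """Return the stored pub_date, or None if nothing has been crawled yet."""
    if not path.exists():
        return None
    text = path.read_text().strip()
    return datetime.fromisoformat(text) if text else None


def save_last_pub_date(pub_date, path=CONTROL_FILE):
    """Persist the newest pub_date seen, in ISO 8601 form."""
    path.write_text(pub_date.isoformat())


def parse_pub_date(raw):
    """Parse an RFC 822 style pubDate string into an aware UTC datetime."""
    parsed = parsedate_to_datetime(raw)
    if parsed.tzinfo is None:
        # Assumption: dates without a usable zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def select_new_items(items, last_pub_date):
    """Keep only the items published after the stored date."""
    if last_pub_date is None:
        return list(items)
    return [
        item for item in items
        if parse_pub_date(item["pub_date"]) > last_pub_date
    ]
```

In a spider, the callback that parses the feed could call `load_last_pub_date()`, follow only what `select_new_items()` returns, and call `save_last_pub_date()` with the newest date once the crawl finishes. If nothing is returned, the run simply ends without requesting any item pages.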

How else could I solve this?

I store the data in a database (including the last crawl date and the post dates), and read the dates I need from the database.
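One way the database approach might look, sketched with the standard-library `sqlite3` module. The table name `posts`, its columns, and the helper names are hypothetical. The sketch assumes every `pub_date` is stored as an ISO 8601 string in UTC, so that plain string comparison and `MAX()` order the dates correctly.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) posts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "link TEXT PRIMARY KEY, pub_date TEXT NOT NULL)"
    )
    return conn


def last_pub_date(conn):
    """Return the newest stored pub_date, or None if the table is empty."""
    row = conn.execute("SELECT MAX(pub_date) FROM posts").fetchone()
    return row[0]


def store_new_posts(conn, posts):
    """Insert only the posts newer than the newest stored one; return them."""
    last = last_pub_date(conn)
    new_posts = [p for p in posts if last is None or p["pub_date"] > last]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO posts (link, pub_date) VALUES (?, ?)",
            [(p["link"], p["pub_date"]) for p in new_posts],
        )
    return new_posts
```

With this layout no separate control file is needed, because the newest stored post date doubles as the marker for the last crawl.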

