Can I make the spider stop and start on a dime?
I have a specific 36-hour window each week during which I'm allowed to spider a remote catalog site of 300k+ pages. I had been using wget in recursive mode, but I have no good way to stop it and then restart the next week where it left off. I also have my own format in which I'd like the data stored in a MySQL table.
Can PhpDig be bent to meet my needs? Or am I better off writing my own spider using curl and a very large database of URLs to crawl, driven by a bash/perl/php frontend script?
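
If I do roll my own, here's a rough sketch of what I'm picturing: the MySQL table itself is the crawl queue, so stopping and restarting is just killing and relaunching the script. The table name, columns, and credentials below are placeholders, not a finished implementation:

```php
<?php
// Resumable crawler sketch: all state lives in MySQL, so the process can be
// killed at any point and restarted next week with nothing lost.
// Hypothetical schema:
//   CREATE TABLE crawl_queue (
//     url VARCHAR(255) PRIMARY KEY,
//     status ENUM('pending','done') NOT NULL DEFAULT 'pending'
//   );

$db = new mysqli('localhost', 'user', 'pass', 'crawler');
$deadline = time() + 36 * 3600;   // stop when the weekly window closes

while (time() < $deadline) {
    // Grab the next unvisited URL; the queue survives restarts.
    $row = $db->query(
        "SELECT url FROM crawl_queue WHERE status = 'pending' LIMIT 1"
    )->fetch_assoc();
    if (!$row) break;             // queue drained -- crawl complete

    $ch = curl_init($row['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $html = curl_exec($ch);
    curl_close($ch);

    if ($html !== false) {
        // Enqueue any links found on the page (naive regex for brevity).
        preg_match_all('/href="(http[^"]+)"/i', $html, $m);
        $ins = $db->prepare("INSERT IGNORE INTO crawl_queue (url) VALUES (?)");
        foreach ($m[1] as $link) {
            $ins->bind_param('s', $link);
            $ins->execute();
        }
        // ...parse $html into my own format and INSERT into the data table here.
    }

    // Mark the URL visited so a restart picks up exactly where we left off.
    $done = $db->prepare("UPDATE crawl_queue SET status = 'done' WHERE url = ?");
    $done->bind_param('s', $row['url']);
    $done->execute();

    sleep(1);                     // be polite to the remote site
}
```

The point is that every URL's state is in the database rather than in the process, so the deadline check can end the loop mid-crawl and the next run picks up from exactly the same spot. Is this a reasonable approach, or does an existing tool already do this?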