06-30-2004, 10:37 AM | #1 |
Green Mole
Join Date: Jun 2004
Posts: 11
Automatic spider
hi there,
i want to automate adding links to the search engine. has anyone played with this idea too?

my setup: i create a table in which every new link gets stored (a lot of links); i call this table my linkspool. then i have a cron job every 3 minutes which checks whether a new job (link) is sitting in the spool table. if a new job is in there, the script locks the link and spiders it. after the spidering, the script deletes the link from the spool. finished!

but i have two problems:

first: if a spidering run lasts over 3 minutes, the next cron run takes the next link from the spool and starts a new spider. that's okay, but i check in the script how many spiders are running, and if it is more than 5 the script should exit and wait until a thread is free. this doesn't really work. how can i check with php how many php spider processes are open?

second: with the cron, the spider machine runs and runs and runs. but if a spider job gets locked up for any reason, it blocks a thread. how can i kill, via php, a spider php pid that is older than 20 minutes, and how do i kick that link out of the se db?

sorry for my bad english
jdc
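p.s. here is a stripped-down sketch of the worker i described, so you can see what i mean (the linkspool table and its columns are placeholders, and spider_url() just stands in for the real phpdig spider call):

<?php
// spider.php -- started by cron every 3 minutes (simplified sketch)
$db = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('phpdig', $db);

// grab one unlocked link and lock it, so parallel runs skip it
mysql_query('LOCK TABLES linkspool WRITE');
$res = mysql_query('SELECT id, url FROM linkspool WHERE locked = 0 LIMIT 1');
$row = mysql_fetch_assoc($res);
if ($row) {
    mysql_query('UPDATE linkspool SET locked = 1 WHERE id = ' . $row['id']);
}
mysql_query('UNLOCK TABLES');

if ($row) {
    spider_url($row['url']); // the long-running spidering itself
    mysql_query('DELETE FROM linkspool WHERE id = ' . $row['id']);
}
?>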
06-30-2004, 11:48 AM | #2 |
Purple Mole
Join Date: Dec 2003
Posts: 106
Hi jdc -
If you have a main script (the one that looks at the linkspool and launches the spider processes), keeping track of the number of spiders is easy: increment a counter every time a spider is launched, and when the counter reaches 5, put the script to sleep for a while and then check again.

To kill the process, check out this thread: http://www.phpdig.net/showthread.php...&highlight=PID

And instead of a cron job, you could launch the spiders from PHP itself with exec() or system().
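Something along these lines might do it (an untested sketch; it re-counts the live spider.php processes with ps rather than keeping a bare counter variable, since spiders that finish free up their slots on their own):

<?php
// dispatcher sketch: never more than 5 spiders at once
define('MAX_SPIDERS', 5);

function running_spiders() {
    // the [s] keeps grep from matching its own command line
    exec("ps -ef | grep '[s]pider.php' | wc -l", $out);
    return (int) $out[0];
}

while (true) {
    while (running_spiders() >= MAX_SPIDERS) {
        sleep(30); // wait until a slot is free
    }
    // redirecting output lets exec() return immediately,
    // so the spider runs in the background
    exec('php -f spider.php > /dev/null 2>&1 &');
    sleep(180); // your 3-minute rhythm
}
?>

You'd run this once in the background instead of the cron entry.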
__________________
Foundmyself.com artist community, art galleries
07-01-2004, 01:08 AM | #3 |
Green Mole
Join Date: Jun 2004
Posts: 11
okay, that's cool,
but when cron kills the spider, the link it was spidering is still locked in the db. i need a search-and-destroy step: after killing the spider, how can i hand a parameter (e.g. the site_id) to another script that deletes all db entries for that link?
thx
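edit: what i have in mind is roughly this (just a sketch; cleanup.php and the table names are invented, not phpdig's real schema):

<?php
// cleanup.php -- call as: php -f cleanup.php <pid> <site_id>
$pid    = (int) $argv[1];
$siteId = (int) $argv[2];

// kill the stuck spider first...
exec("kill -9 $pid");

// ...then wipe every db trace of the link it was working on
$db = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('phpdig', $db);
mysql_query("DELETE FROM linkspool WHERE site_id = $siteId");
mysql_query("DELETE FROM spider_results WHERE site_id = $siteId"); // invented table
?>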
07-01-2004, 01:17 AM | #4 |
Green Mole
Join Date: Jun 2004
Posts: 11
hmmm... after thinking about it, that cron is not really good for my problem:

10 * * * * ps -ef | grep 'php -f spider.php' | awk '{print $2}' | xargs kill -9

i start a new spider every 3 minutes and each one should be killed after 10 minutes, so there is always more than one spider running. this cron kills all my spiders at once, and that's no good. can i kill, via shell, only the php spiders that have been running for more than 10 minutes?
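maybe something like this, parsing the elapsed-time column from ps (untested sketch; ps prints etime as [[dd-]hh:]mm:ss, and the -r flag of xargs is the gnu extension that skips kill when nothing matched)?

# kill only spider.php processes that have run longer than 10 minutes;
# the [s] in the grep pattern keeps grep from matching itself
ps -eo pid,etime,args | grep '[s]pider.php' | awk '
{
    n = split($2, t, ":")
    if (n == 3) {                       # [dd-]hh:mm:ss
        if (split(t[1], d, "-") == 2)
            secs = d[1]*86400 + d[2]*3600
        else
            secs = t[1]*3600
        secs += t[2]*60 + t[3]
    } else {                            # mm:ss
        secs = t[1]*60 + t[2]
    }
    if (secs > 600) print $1
}' | xargs -r kill -9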