Here is a tutorial I wrote to help you automatically unlock the site and restart the spider process when it dies.
This tutorial assumes a *nix working environment with one spider process.
You will need to Google and/or modify the code as appropriate for your OS/setup.
First, create a file containing '.' at /full/path/to/file.txt and set its permissions to 777 (chmod 777).
Next in spider.php find:
PHP Code:
if (USE_RENICE_COMMAND == 1) {
print @exec('renice 18 '.getmypid()).$br;
}
And afterwards add:
PHP Code:
$my_loc = "/full/path/to/file.txt";
$my_file = fopen($my_loc,"w+");
fputs($my_file, getmypid());
fclose($my_file);
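If you want the write to fail loudly instead of silently, here is a minimal sketch of the same step as a small function. The name write_pid_file is made up for illustration and is not part of spider.php, and the temporary file only stands in for /full/path/to/file.txt.

```php
<?php
// Hypothetical helper (not part of spider.php): record the current
// process ID in the PID file.
// Returns true on success, false if the file could not be written.
function write_pid_file(string $path): bool
{
    // LOCK_EX holds an exclusive lock on the file while writing
    return file_put_contents($path, (string) getmypid(), LOCK_EX) !== false;
}

// Example usage; a temporary file stands in for /full/path/to/file.txt
$pid_file = tempnam(sys_get_temp_dir(), 'spider_pid_');
if (!write_pid_file($pid_file)) {
    error_log("Could not write PID file: $pid_file");
}
```

It does the same job as the fopen/fputs/fclose lines above, so use whichever you prefer.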
Then set up the following script as a cron job and have it run at a regular interval.
PHP Code:
<?php
$my_loc = "/full/path/to/file.txt";
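// PID recorded by spider.php (or '.' if the spider has never run)
$my_pid1 = trim(file_get_contents($my_loc));
// "-o pid=" makes ps print only the bare PID with no header line,
// so the result is empty when no such process exists
$my_pid2 = trim(exec('ps -p '.escapeshellarg($my_pid1).' -o pid= 2>/dev/null'));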
if ($my_pid1 != $my_pid2) {
/*
- Spider is either dead or index is completed
- Query the tempspider table or query the sites table
- Find num rows in tempspider or locked val in sites
- If num rows or locked equal zero index is completed
- Once completed there is nothing more to be done
- Otherwise the spider is dead so unlock the site
- Then restart the spidering process via cron
- You can do the code for this part ;)
*/
}
?>
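To get you started on the part I left out, here is a rough sketch of just the decision logic from the comment block. The function name spider_next_action and its return values are my own invention for illustration. The actual queries against tempspider or sites, the unlock, and the restart command depend on your setup, so they are left as placeholders.

```php
<?php
// Hypothetical helper: decide what the cron script should do next.
// $pidAlive    - true if the PID recorded by spider.php is still running
// $pendingRows - row count from tempspider (or the locked value from sites),
//                fetched with whatever query fits your schema
function spider_next_action(bool $pidAlive, int $pendingRows): string
{
    if ($pidAlive) {
        return 'running';   // spider is still working, leave it alone
    }
    if ($pendingRows === 0) {
        return 'completed'; // nothing pending, so the index finished
    }
    return 'restart';       // process is gone with work pending
}

// Example wiring; the arguments here are placeholder values
$action = spider_next_action(false, 42);
if ($action === 'restart') {
    // unlock the site here, then relaunch spider.php
}
```

Keeping the decision separate from the database and shell calls makes it easy to check the logic by hand before you let cron loose on it.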
Asking me why the spider dies is like asking me why there are dropped packets.
Maybe the MySQL connection hung, a server timed out somewhere, and so forth.
Something somewhere burped...