|
12-02-2004, 02:59 PM | #1 |
Green Mole
Join Date: Jun 2004
Posts: 22
|
Spider.php is killed at the command line
Hi - I have done a pretty thorough search of the forums and can't find anything that relates to my problem.
I have a site, http://www.globalwaterintel.com, running phpdig (1.8.3). So far it has been great; thanks to all those who had a hand in creating it and to those who monitor these forums. The site has approximately 1,200 pages and is growing at about 50 pages/month. I have set up a page with links to every page on the site (http://www.globalwaterintel.com/list.php) and point the spider at that page.
When I try to run the spider from the command line, it runs for a bit over a minute and then the process is killed. It doesn't even get through the part where it prints the +++++++ 's. The site is on shared hosting, so I am working on the assumption that the script is being terminated for hogging too many resources (memory or CPU), although the host has yet to confirm this. I am able to index via the web interface, but it is slow and I would really like to automate the indexing via cron.
If it does turn out that the script is being killed because of resource issues, is there any way I could get around it by introducing some kind of sleep() to pause indexing and free up resources? The other idea is to split the pages that are indexed into smaller chunks of, say, 200 pages and index them separately. Any ideas greatly appreciated! |
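The chunking idea in the post above can be sketched roughly as follows. This is only an illustration in Python, not part of phpdig: `chunk_urls` is a hypothetical helper, the `example.com` URLs are placeholders, and the batch size of 200 is simply the figure suggested in the post.

```python
def chunk_urls(urls, size=200):
    """Split a list of URLs into consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# Placeholder URLs standing in for the ~1,200 pages linked from the list page.
all_urls = ["http://example.com/page/%d" % n for n in range(1200)]
batches = chunk_urls(all_urls)
```

Each batch could then be handed to a separate spider run, for example from separate cron entries, so that no single process runs for long.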
12-02-2004, 07:06 PM | #2 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
I've had the same problem myself and haven't been able to get any answers either here in the forum or from my provider.
|
12-05-2004, 02:09 PM | #3 |
Green Mole
Join Date: Jun 2004
Posts: 22
|
OK - here is what the 3rd level support at my host says:
The script was being killed because it was using up too much CPU time. The maximum amount of CPU time a process can use is 20%. This script was regularly using 80-90% of the CPU cycles on this machine, which is unacceptable in a shared hosting environment. One alternative may be to run the script with a different niceness value. This can be done using:
nice --adjust=19 /usr/bin/php4 -f spider.php http://www.globalwaterintel.com/list.php
Adjust can be any value between 0 (normal priority) and 19 (as nice as possible). If you just place lots of sleeps in the code, then what may happen is that the program uses no CPU time, then uses a large amount for a short burst. If the process monitor happens to see it during a short burst of high activity, then it may still kill it.
The thing that I don't understand is why the browser version runs OK. Surely it would use more resources than running it from the shell, as it has to output HTML, which I assume is buffered. |
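A rough sketch of how the host's suggestion might be scripted, assuming Python is available on the server. The names `build_command` and `run_spider` are hypothetical, the PHP path and `spider.php` arguments are taken from the command quoted above, and the 60-second pause is an arbitrary placeholder. It uses the portable `nice -n 19` form. Note that `nice` only lowers scheduling priority; it does not cap CPU usage, so the host's process monitor may still step in.

```python
import subprocess
import time


def build_command(url, niceness=19, php="/usr/bin/php4", script="spider.php"):
    """Build the argument list for one low-priority spider run."""
    if not 0 <= niceness <= 19:
        raise ValueError("niceness must be between 0 and 19")
    return ["nice", "-n", str(niceness), php, "-f", script, url]


def run_spider(urls, pause_seconds=60, runner=subprocess.run, sleep=time.sleep):
    """Run the spider once per URL, pausing between runs.

    `runner` and `sleep` are injectable so the loop can be tried out
    without actually starting PHP. Returns the exit code of each run.
    """
    codes = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(pause_seconds)  # pause between runs, not inside a run
        result = runner(build_command(url))
        codes.append(result.returncode)
    return codes
```

Calling `run_spider(["http://www.globalwaterintel.com/list.php"])` would issue a single niced run. Passing several smaller list pages would spread the work over several shorter processes with idle gaps in between.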