|
08-10-2004, 12:28 PM | #1 |
Green Mole
Join Date: Aug 2004
Posts: 4
|
specifically slow spidering at fgets()
I've read the other posts re: slow spidering behavior and found nothing matching my situation. Please help!
After inserting traces and such into the code, I've found a consistent delay of 10 - 15 seconds for each page being indexed, which occurs across a specific function call:
Code:
FILE: robot_functions.php
FUNCTION: phpdigGetUrl()
STATEMENT: $answer = fgets($fp,8192);
Code:
OS: Win 2000
HTTPD: Apache 2.0.49 (Win32)
PHP: 5.0.0
MYSQL: 4.1.3b-beta
PHPDIG: 1.8.3
Thanks much! |
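For anyone wanting to reproduce this kind of measurement, a trace around a single fgets() call might look like the sketch below. It is only an illustration: `timedFgets()` is a hypothetical helper, not part of PhpDig, and it assumes a PHP version where `microtime(true)` is available (5.0.0 or later).

```php
<?php
// Hypothetical tracing helper, not part of PhpDig:
// runs one fgets() call and reports how long it blocked.
function timedFgets($fp, $length = 8192)
{
    $start = microtime(true);            // wall-clock time in seconds, as a float
    $line = fgets($fp, $length);         // the call being measured
    $elapsed = microtime(true) - $start; // seconds spent inside fgets()
    return array($line, $elapsed);
}
```

Logging the second element for every call shows which read is the one that stalls.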
08-10-2004, 12:32 PM | #2 |
Green Mole
Join Date: Aug 2004
Posts: 4
|
PS - One more helpful(?) bit of info: while PhpDig spidering is going on, I've watched my CPU activity which is mostly nothing, with occasional spikes (every 10 - 15 seconds, BTW). To me, this points to a timeout issue - but I don't know where / what layer to consider. (Also, I've reduced all PhpDig sleeps to 1 or 2 seconds and this is NOT the problem at all). Thanks again!
|
08-10-2004, 06:37 PM | #4 |
Green Mole
Join Date: Aug 2004
Posts: 4
|
Vinyl J -
Good idea (and it made me solve some incidental installation problems), yet no go (i.e. same problem and with harder-to-read output <lol>). Anyway, as I mentioned above, the wget mirroring program doesn't have any trouble like this - it's quite zippy! That points away from the httpd software / configuration. It has all the smell of a communication timeout issue, but how do I investigate beyond the sticking fgets() ? |
08-15-2004, 03:05 PM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Can't say I've experienced fgets problems. Perhaps something here might help?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
08-16-2004, 01:17 PM | #6 |
Green Mole
Join Date: Aug 2004
Posts: 4
|
Well, I've found the exact problem: the code doesn't respect the Content-Length header (or, when the response is chunked, the chunk sizes). Thus, it always attempts to read past the end of the response. I suppose on some configurations that doesn't make a difference, but on mine it surely does! I've fully solved the problem in the test script and partially moved that solution into my own PhpDig code. If anyone cares to know more, get in touch...
Cheers! |
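The fix described in post #6 can be sketched roughly as follows. This is only an illustration, not slintz's actual mod and not PhpDig's code: `readExactly()` and `readHttpBody()` are hypothetical names, and the sketch assumes the response headers have already been parsed into an array keyed by lower-cased header name. The idea is to stop reading once the announced number of bytes has arrived. On a keep-alive connection, one more fgets() would otherwise block until the server closes the idle socket or the read times out, which would be consistent with the 10 - 15 second stalls reported in this thread.

```php
<?php
// Hypothetical helper: read exactly $length bytes (or fewer if the stream ends).
function readExactly($fp, $length)
{
    $data = '';
    while (strlen($data) < $length && !feof($fp)) {
        $chunk = fread($fp, $length - strlen($data));
        if ($chunk === false || $chunk === '') {
            break; // read error, or nothing more to read
        }
        $data .= $chunk;
    }
    return $data;
}

// Hypothetical helper: read an HTTP/1.1 response body without over-reading.
// $fp must be positioned just after the blank line that ends the headers.
// $headers maps lower-cased header names to their values (an assumption).
function readHttpBody($fp, $headers)
{
    $encoding = isset($headers['transfer-encoding'])
        ? strtolower($headers['transfer-encoding'])
        : '';

    if (strpos($encoding, 'chunked') !== false) {
        $body = '';
        while (($line = fgets($fp, 8192)) !== false) {
            // Chunk-size line: hex size, optionally followed by ";extensions".
            $size = (int) hexdec(trim(strtok($line, ';')));
            if ($size === 0) {
                // Last chunk: consume optional trailers up to the blank line.
                while (($trailer = fgets($fp, 8192)) !== false && trim($trailer) !== '') {
                    // trailers are ignored in this sketch
                }
                break;
            }
            $body .= readExactly($fp, $size);
            fgets($fp, 8192); // discard the CRLF that ends the chunk data
        }
        return $body;
    }

    if (isset($headers['content-length'])) {
        return readExactly($fp, (int) $headers['content-length']);
    }

    // No length information: the server signals the end by closing the connection.
    $body = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $body .= $chunk;
    }
    return $body;
}
```

In both length-aware branches the function returns as soon as the body is complete, so anything the server sends afterwards stays unread on the socket instead of being waited for.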
08-17-2004, 03:02 PM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Will you post your mod in the Mod Submissions forum? |
08-18-2004, 02:24 AM | #8 |
Green Mole
Join Date: Jul 2004
Posts: 8
|
just to throw in my two cents worth...
i'm already communicating with slintz, but this isn't a problem specific to him... the exact same thing happens to me when i try to spider my site: i always get 10-15 seconds (sometimes up to 20) of delay per page. here is my server info:
OS: Solaris 5.8
PHP: 4.3.8
Apache: 2.0.50
MySQL: 4.0.13
PhpDig: 1.8.3
yes, i realize that some of those are older versions, but i have no control over that... i just write the webpages |