01-22-2005, 01:00 PM, #1
Green Mole
Join Date: Jan 2005
Posts: 3
Yet another indexing question
I have one server running Red Hat 9.0 with Apache, MySQL, and 5 virtual web sites. I can index 4 of the sites successfully. The last site indexes only 3-4 pages, then quits with no error or completion message. I suspect the failure is caused by the HTML page content: possibly an HTML coding error, an obsolete style, etc.
My question is: are there any known coding styles, tags, comments, etc. in HTML that will cause the spider to terminate abnormally? The pages the spider fails on display and behave correctly in MSIE, Netscape 7.1, and Firefox 1.0.
01-22-2005, 01:21 PM, #2
Head Mole
Join Date: May 2003
Posts: 2,539
Given that your 'last site' works across browsers, I doubt it's an HTML issue. Without knowing more about this last site, all I can suggest is to select the 'no' radio button, set 'search depth' to a large value, set 'links per' to zero, and give it a whirl. Depending on how this last site is laid out, you might also try setting LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true, both in the config file.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
01-22-2005, 01:42 PM, #3
Green Mole
Join Date: Jan 2005
Posts: 3
Thanks for the feedback
Thanks for your interest in the question. I have read a number of other posts looking for clues to the problem, and I have tried all the options you mention, including the changes to config.php.
Does line length in the HTML files have any effect on the spider? A buffer overflow, perhaps?
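One quick way to test the line-length (or page-size) theory is to measure each failing page before re-spidering. The sketch below is a hypothetical diagnostic in Python, not part of PhpDig; `page_stats` and `fetch` are illustrative names, it assumes the pages decode as UTF-8 (undecodable bytes are replaced), and it only reports numbers: it does not reproduce how the spider itself buffers or parses a page.

```python
import urllib.request


def page_stats(html: str) -> dict:
    """Report byte size, line count, and the longest line of an HTML page.

    Hypothetical helper for diagnosis only; it says nothing about how
    PhpDig itself reads pages.
    """
    lines = html.splitlines() or [""]
    longest = max(lines, key=len)  # first line with the maximum length
    return {
        "bytes": len(html.encode("utf-8")),
        "lines": len(lines),
        "longest_line": len(longest),
        "longest_line_no": lines.index(longest) + 1,  # 1-based line number
    }


def fetch(url: str) -> str:
    # Download a page; assumes a UTF-8-compatible charset.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    import sys

    # Usage: python page_stats.py URL [URL ...]
    for url in sys.argv[1:]:
        print(url, page_stats(fetch(url)))
```

Running it against the pages that index and the one where the spider stops makes it easy to see whether the failing page stands out in size or line length.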
01-22-2005, 02:41 PM, #4
Head Mole
Join Date: May 2003
Posts: 2,539
How many MB is the max-sized page? What's the link to the site?
01-23-2005, 08:16 AM, #5
Green Mole
Join Date: Jan 2005
Posts: 3
Web address
None of the pages is particularly large; none is over 50 KB. Below is the output of the indexing process. This happens every time.
Spidering in progress... [Stop spider]
SITE : http://tulare.homelinux.net/
Exclude paths :
- @NONE@
1:http://tulare.homelinux.net/index.html (time : 00:00:05)
+ + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://tulare.homelinux.net/Chance_Phelps.html (time : 00:00:24)
3:http://tulare.homelinux.net/underway.html (time : 00:00:29)
The status line at the bottom of the browser screen shows "Done". Thanks for the interest.
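Since the log ends right after underway.html, one thing worth checking is which same-site links the last pages actually expose as plain HTML. The sketch below is a hypothetical diagnostic using Python's standard `html.parser` and `urllib.parse`; `LinkCollector` and `same_host_links` are illustrative names, and it assumes (without confirming how PhpDig works) that the spider follows ordinary `<a href>` and `<area href>` links. Links written by JavaScript would not show up here.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> and <area> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "area"):
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_host_links(html: str, page_url: str) -> list[str]:
    """Return unique absolute http(s) links on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    result: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(page_url, href).split("#", 1)[0]
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in result:
                result.append(absolute)
    return result


if __name__ == "__main__":
    import sys

    # Usage: python links.py URL [URL ...]
    for url in sys.argv[1:]:
        with urllib.request.urlopen(url, timeout=30) as resp:
            page = resp.read().decode("utf-8", errors="replace")
        for link in same_host_links(page, url):
            print(link)
```

If underway.html and Chance_Phelps.html list same-site pages here that never appear in the spider log, the problem is more likely in how those links are followed than in the page markup itself.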
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
MySQL Question | jackpod | How-to Forum | 1 | 09-21-2006 09:30 PM
question about the installation | west | Script Installation | 1 | 02-01-2005 11:52 AM
can I do this, idiot question | 2catstango | How-to Forum | 0 | 10-18-2004 06:59 PM
Question of the Day | Charter | The Mole Hole | 1 | 03-11-2004 09:50 PM
indexing question | mudpit | How-to Forum | 5 | 01-28-2004 11:44 AM