|
12-15-2004, 06:53 AM | #1 |
Green Mole
Join Date: Dec 2003
Posts: 5
|
Too few pages indexed, Umlaut problem
Hi there,
just upgraded from 1.6.0 to 1.8.5. The site contains about 800 pages, but now all of a sudden only 250 are indexed. I made sure the max level of depth and links is set to 20 in both the index admin panel and the config file, but to no avail. I am making heavy use of phpdiginclude and exclude comments in the "middle" of the code, but this hasn't changed. What might be the problem?
Secondly, when at the beginning of a title or description string, umlauts (e.g. Ä or &Auml; as HTML entity) are displayed in lower case even if they're upper case. Any clue?
Thanks, Bernd |
12-15-2004, 07:27 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
1) Set search depth to a large number, links per to zero, and LIMIT_TO_DIRECTORY to false.
2) In config.php find: PHP Code:
PHP Code:
PHP Code:
PHP Code:
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
12-15-2004, 09:38 AM | #3 |
Green Mole
Join Date: Dec 2003
Posts: 5
|
Thanks a lot! Works great!
|
12-16-2004, 09:24 AM | #4 |
Green Mole
Join Date: Mar 2004
Posts: 1
|
As of 1.8.6, more entities are shown incorrectly in the search results. So I dug around in the code and came across the following question:
Why do you use the custom $spec array instead of just reversing the function of htmlentities? E.g. replace your existing code in robot_functions.php:
Code:
// first case-sensitive and then case-insensitive
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
    $text = ereg_replace ($entity."[;]?",$char,$text);
    $title = ereg_replace ($entity."[;]?",$char,$title);
}
//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
    $text = eregi_replace ($entity."[;]?",$char,$text);
    $title = eregi_replace ($entity."[;]?",$char,$title);
}
with:
Code:
$trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
$trans = array_flip($trans);
$text = strtr($text, $trans);
$title = strtr($title, $trans);
|
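For anyone comparing the two approaches quoted above, here is a minimal sketch of how they can differ. It is not phpDig's actual code: the two-entry $spec array and both helper names are hypothetical, the keys are assumed to carry no trailing semicolon (as the "[;]?" in the original pattern suggests), and preg_replace stands in for ereg_replace/eregi_replace, which are gone from current PHP. The file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical, heavily abbreviated stand-in for the $spec array.
$spec = array('&Auml' => 'Ä', '&auml' => 'ä');

// Mirrors the quoted loops: case-sensitive pass first, case-insensitive second.
function decode_with_spec($text, $spec) {
    foreach ($spec as $entity => $char) {
        // ";?" makes the trailing semicolon optional, like "[;]?" in the original
        $text = preg_replace('/' . preg_quote($entity, '/') . ';?/', $char, $text);
    }
    foreach ($spec as $entity => $char) {
        // second pass catches odd casings such as &AUML;
        $text = preg_replace('/' . preg_quote($entity, '/') . ';?/i', $char, $text);
    }
    return $text;
}

// The proposed replacement: flip the htmlentities() table and apply strtr().
function decode_with_flipped_table($text) {
    $trans = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES));
    return strtr($text, $trans);
}

echo decode_with_spec('&Auml;pfel', $spec), "\n";        // Äpfel
echo decode_with_spec('&Aumlpfel', $spec), "\n";         // Äpfel (semicolon missing)
echo decode_with_flipped_table('&Auml;pfel'), "\n";      // Äpfel
echo decode_with_flipped_table('&Aumlpfel'), "\n";       // &Aumlpfel (left untouched)
```

Running the case-sensitive pass first is what keeps &Auml from being turned into a lower-case ä by the case-insensitive pass. The flipped table is shorter to write, but it only matches entities spelled exactly as htmlentities() would emit them.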
12-16-2004, 11:00 AM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
>> Why do you use the custom $spec array instead of just reversing the function of htmlentities?
Because in a land long, long ago and far, far away... HTML page content may not be in correct form, and & # 039; versus & # 39; (without spaces) may cause an issue. PHP Code:
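To illustrate the point about the two spellings of the apostrophe entity, here is a minimal sketch. It is not the code originally posted in this thread, and both helper names are hypothetical. It assumes a current PHP where get_html_translation_table defaults to UTF-8, and it uses html_entity_decode only as a point of comparison, not as something phpDig is known to use.

```php
<?php
// Post #4's proposal wrapped in a hypothetical helper: flip the table, then strtr().
function decode_via_table($text) {
    $trans = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES));
    return strtr($text, $trans);
}

// Comparison only (an assumption, not phpDig code): PHP's built-in decoder,
// which also understands numeric character references.
function decode_via_builtin($text) {
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

echo decode_via_table('It&#039;s'), "\n";    // It's
echo decode_via_table('It&#39;s'), "\n";     // It&#39;s (the table only lists the zero-padded form)
echo decode_via_builtin('It&#39;s'), "\n";   // It's
echo decode_via_builtin('It&#039;s'), "\n";  // It's
```

The flipped table contains only the exact strings htmlentities() would produce, so a page that writes the apostrophe in the short numeric form slips through unchanged.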