01-14-2005, 12:31 AM | #1 |
Green Mole
Join Date: Jan 2005
Location: Brittany - FRANCE
Posts: 4
|
\3 at the right of the searched keyword
Hello everyone!
First, I'd like to say that PhpDig is excellent. I don't have any configuration problems, except that on my results page I always get a "\3" to the right of the searched keyword. Has anyone had this kind of problem? What can I do to get rid of that \3? I am on Windows XP Home, running PhpDig locally with EasyPHP 1.6 (Apache).
I would also like an idea of how many pages can be crawled before MySQL gets overloaded. I have 25,000 pages crawled for a 90 MB database, and some types of search are starting to get slow (7 or 8 seconds). Am I reaching the maximum capabilities of MySQL and/or PhpDig?
Thanks in advance for all your answers. |
01-14-2005, 01:33 AM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
It looks like an encoding issue. For example, search on "lien concerné" (without quotes) and see the first two results:
Palestine ... l’aborde déjà , cela répond à une question qui a été posée sur le lien\3 av... ...: « médiatisation, » d’une part et « parti-pris » de l’autre cela concerne\3 bien « Vivre ensemble et violence »… ... ...sse se présenter à chacune des 2 parties de façon impartiale. En ce qui concerne\3 la Palestine, çà ... ...aéliens. Donc, les Etats-Unis ne sont pas les mieux placés, et en ce qui concerne\3 l’ONU, on n’arrête pas de constater que l’ONU... http://www.pingouins.com/Temoignages/Palestine/body_palestine.html 45.1 k www.infomer.fr ...bus en ce qui concerne\3 la littérature maritime le marin, 7 mars 2003 . br 14/02/03 Merveilles des fonds sous-marins - De la mer Rouge aux trois océans... ...Paris. Tél : 01 44 32 10 70. Fax : 01 40 51 73 16 . 280 pages; 25 euros br Lien\3 en relation : www.oceano.org 10/01/03 Thaiti et ses archipels - Ce ... ...Tél : 01 43 94 92 88. Fax : 01 43 94 02 45 . 160 pages; 35 euros. br br br Lien\3 en relation : www.anako.com 3/01/03 Lumières d'Oman - Ce livre est le ... ...tionale et de la Recherche, 1 rue Descartes, 75005 Paris. Prix : 40 euros. Lien\3 en relation : www.cths.fr 6/09/02 `Guide de la pêche Ã* pied` ... http://www.lemarin.fr/2-Pagemarin/PG-livrebord.html 862.8 k If you go to the http://www.pingouins.com/Temoignages/Palestine/body_palestine.html page and look at the HTML source, you will see that it is encoded as utf-8. However, if you look at the HTML source of the search results page, it is encoded as iso-8859-1. PhpDig does not support multiple or multi-byte encodings. The choosen encoding applies to all indexed documents and the admin interface. Choose one encoding per installation and stick with it. Reinstall PhpDig in a test location, and only index documents that are encoded with the same encoding as PHPDIG_ENCODING in the config file, and see if that makes the \3s go away.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
01-14-2005, 02:48 AM | #3 |
Green Mole
Join Date: Jan 2005
Location: Brittany - FRANCE
Posts: 4
|
Thanks for your quick answer!
I installed a second copy of 1.8.6 in a new directory, without changing anything in the files. As a test I crawled the PhpDig index page, which is encoded as ISO-8859-1. The config file also has: define('PHPDIG_ENCODING','iso-8859-1'); But I still get those annoying \3s, and also a \1 in the title of the found page; see the attachment for details. I don't see what else to do to fix it, so I'm waiting for some help. Thanks. |
01-14-2005, 10:41 AM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Look in phpdig_functions.php for the phpdigHighlight function and replace the two instances of "\\1<^#_>\\2</_#^>\\3" with '\1<^#_>\2</_#^>\3' and do another search. BTW, please don't use PhpDig on this site. PhpDig is free, but my bandwidth is not free. Thanks.
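For readers unfamiliar with what the \1, \2 and \3 in that replacement string are meant to do, here is a rough Python sketch of the same kind of keyword highlighting. It uses Python's `re` module, not PHP's `eregi_replace`, and the pattern is an assumption made for illustration; PhpDig's actual `$ereg` pattern is not shown in this thread. Only the marker strings `<^#_>` and `</_#^>` are taken from the post above.

```python
import re

# Marker strings copied from the replacement string quoted above.
OPEN_MARK, CLOSE_MARK = "<^#_>", "</_#^>"


def highlight(text: str, keyword: str) -> str:
    """Wrap whole-word, case-insensitive matches of `keyword` in the markers."""
    # Assumed pattern: group 1 = separator before the keyword,
    # group 2 = the keyword itself, group 3 = separator after it.
    pattern = re.compile(
        r"(^|\W)(" + re.escape(keyword) + r")(\W|$)", re.IGNORECASE
    )
    # Raw strings keep \1, \2 and \3 as backreferences for the regex engine,
    # so each one is replaced by the text its group captured.
    return pattern.sub(r"\1" + OPEN_MARK + r"\2" + CLOSE_MARK + r"\3", text)


result = highlight("en ce qui concerne la Palestine", "concerne")
print(result)
```

When every backreference is resolved, the third group simply puts back the character that followed the keyword, so no literal \3 should remain in the output.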
01-14-2005, 01:13 PM | #5 |
Green Mole
Join Date: Jan 2005
Location: Brittany - FRANCE
Posts: 4
|
You almost got it!
Here is what I have on line 154 for a results page without the \3: $string = @eregi_replace($ereg,"\\1<^#_>\\2</_#^>\\3",@eregi_replace($ereg,"\\1<^#_>\\2</_#^>",$string)); My problem is fixed. Thanks a lot. I hope this helps other users! All the best. |
01-14-2005, 02:55 PM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Yes, I've seen that "fix" before. But if the last \\3 were a global problem, \3s would show up in the online demo, and they don't. That leads me to believe something else is causing the problem, so this "fix" may or may not really be a fix.