PhpDig.net

Old 10-08-2003, 06:43 AM   #1
mkst
Green Mole
 
Join Date: Oct 2003
Posts: 11
iso-8859-7

Hello there!
I would like to know how to change the encoding to iso-8859-7 (Greek). I read the documentation but could not understand how to set the

$phpdig_string_subst['iso-8859-7'] and

$phpdig_words_chars['iso-8859-7'] values.

Any help please??
Old 10-08-2003, 01:51 PM   #2
Rolandks
Purple Mole
 
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
You must define all of the characters in this string:
$phpdig_string_subst['iso-8859-7'] ='......here is iso-8859-7 chr ...........'

see:
http://www.softlab.ntua.gr/~sivann/xgrk/iso8859-7.html

and set:
define('PHPDIG_ENCODING','iso-8859-7');

Perhaps you can find a ready-made string, written on one line, with Google?

-Roland-
Old 10-09-2003, 02:19 AM   #3
mkst
Green Mole
 
Join Date: Oct 2003
Posts: 11
Thanks for your reply Rolandks!

OK, I think I got it.
What about the:

$phpdig_words_chars['iso-8859-2'] = '[:alnum:]ðþß';

What is it used for? Will I have to change it?

Regards,
Mike
Old 10-09-2003, 05:56 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. The $phpdig_words_chars['iso-8859-2'] = '[:alnum:]ðþß'; setting is for non-accented 'lowercase' letters, for example the German ß (pronounced 'ess set', if I remember correctly). Think of it this way: anything that doesn't go in $phpdig_string_subst['iso-8859-2'] might go in $phpdig_words_chars['iso-8859-2']. Once you get your 'iso-8859-7' settings working, please post them in the Mod Submissions forum in case others want to use them. Thanks.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 11-26-2003, 09:04 AM   #5
mkst
Green Mole
 
Join Date: Oct 2003
Posts: 11
Unfortunately, I cannot get it to work.

I have used something like:

$phpdig_string_subst['iso-8859-7'] = 'Á:¢,Å:¸,Ç:¹,É:ºÚ,Ï:¼,Õ:¾,Ù:¿,Ü:á,å:Ý,ç:Þ,é:ßúÀ,ï:ü,õ:ýûà,ù:þ';

I have changed the encoding to: define ('PHPDIG_ENCODING','iso-8859-7');

I think the problem is with the $phpdig_words_chars['iso-8859-1']='[:alnum:]ðþß' string. Which letters do I put within the [: :] characters, and which letters after them?

The script searches some of the English pages that I have on the site, but does not search any Greek pages. The 'keywords' table only contains English words.

I would really appreciate some help!
P.S. I am using version 1.6.2.

Old 11-26-2003, 11:06 AM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I found the below ASCII representation of iso-8859-7 at http://www.gar.no/home/mats/8859-7.htm.
Code:
80-9F: unassigned
// note A0 is a space
A0-BF: _¡¢£¤¥¦§¨©ª«¬_®¯°±²³´µ¶·¸¹º»¼½¾¿
C0-DF: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
E0-FF: àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
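The table above shows each byte as its Latin-1 look-alike. To see which Greek letter each byte actually stands for, a small script along these lines might help. It is only a sketch: greek_table is a made-up helper name, not part of PhpDig, and it assumes PHP's iconv extension is available.

```php
<?php
// Hypothetical helper (not part of PhpDig): list ISO-8859-7 bytes in a range
// together with the real Greek character each one encodes, as UTF-8.
function greek_table(int $from = 0xB0, int $to = 0xFF): array
{
    $rows = [];
    for ($byte = $from; $byte <= $to; $byte++) {
        // A few bytes (for example 0xD2 and 0xFF) are unassigned in
        // ISO-8859-7, so iconv may reject them; those are skipped.
        $utf8 = @iconv('ISO-8859-7', 'UTF-8', chr($byte));
        if ($utf8 !== false && $utf8 !== '') {
            $rows[sprintf('%02X', $byte)] = $utf8;
        }
    }
    return $rows;
}

// Print one "hex character" pair per line; view the output as UTF-8.
foreach (greek_table() as $hex => $char) {
    echo $hex, ' ', $char, "\n";
}
```

Reading the output next to the Code: table makes it clearer that, for instance, the byte shown as Á (hex C1) is really capital Alpha.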
When making $phpdig_string_subst['iso-8859-7'], it's like making a key value set. For example, if the Latin A is like the Greek Ά (hex B6) then the $phpdig_string_subst['iso-8859-7'] variable would start like the following:
PHP Code:
$phpdig_string_subst['iso-8859-7'] = 'A:¶'
If Greek uses Á (hex C1) also like the Latin A, then $phpdig_string_subst['iso-8859-7'] would start like the following:
PHP Code:
$phpdig_string_subst['iso-8859-7'] = 'A:¶Á'
The same type of thing goes for Latin a. If Greek uses ά (hex DC) and á (hex E1) like the Latin a, then $phpdig_string_subst['iso-8859-7'] would start like the following:
PHP Code:
$phpdig_string_subst['iso-8859-7'] = 'A:¶Á,a:Üá'
The $phpdig_string_subst['iso-8859-7'] variable is for all accented or diacritic characters (basically all accented characters, plus those characters that do not copy and paste into ASCII as the characters themselves but rather as ASCII representations of the characters).

The $phpdig_words_chars['iso-8859-7'] variable is for lowercase non-accented characters (basically those lowercase non-accented characters that copy paste into ASCII as the characters themselves). An example of this would be Greek µ, so it could be added to $phpdig_words_chars['iso-8859-7'] like so:
PHP Code:
$phpdig_words_chars['iso-8859-7'] = '[:alnum:]ðþßµ'
Note that it is possible to have an ASCII representation of a character in $phpdig_string_subst['iso-8859-7'] and also have the ASCII character itself in $phpdig_words_chars['iso-8859-7'].
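To make the key value idea concrete, here is a rough sketch of how a string in that format could be turned into a lookup table and applied to text. The function name build_fold_map is invented for this illustration, and this is an assumption about the format as described above, not PhpDig's actual code. The Greek bytes are written as hex escapes so that the encoding of the script file does not matter.

```php
<?php
// Sketch only: turn a 'Latin:variants,Latin:variants' string into a byte map.
// This mirrors the format described in the post, not PhpDig's own source.
function build_fold_map(string $subst): array
{
    $map = [];
    foreach (explode(',', $subst) as $pair) {
        $parts = explode(':', $pair, 2);
        if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
            continue; // skip malformed or empty entries
        }
        [$latin, $variants] = $parts;
        // Every byte after the colon maps to the Latin letter before it.
        foreach (str_split($variants) as $byte) {
            $map[$byte] = $latin;
        }
    }
    return $map;
}

// 'A:¶Á,a:Üá' written as ISO-8859-7 bytes.
$subst = "A:\xB6\xC1,a:\xDC\xE1";
$map = build_fold_map($subst);

// "\xC1\xE1" is capital Alpha followed by small alpha.
echo strtr("\xC1\xE1", $map), "\n"; // prints "Aa"
```

Dumping the map built from a full configuration string may make it easier to spot a byte that was filed under the wrong Latin letter.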
Old 11-27-2003, 06:21 AM   #7
mkst
Green Mole
 
Join Date: Oct 2003
Posts: 11
Thanks for your reply Charter!
...but I am still confused!!

Quote:
Originally posted by Charter
For example, if the Latin A is like the Greek Ά (hex B6) then the $phpdig_string_subst['iso-8859-7'] variable would start like the following:
PHP Code:
$phpdig_string_subst['iso-8859-7'] = 'A:¶'
If Greek uses Á (hex C1) also like the Latin A, then $phpdig_string_subst['iso-8859-7'] would start like the following:
PHP Code:
$phpdig_string_subst['iso-8859-7'] = 'A:¶Á'
.......
The $phpdig_words_chars['iso-8859-7'] variable is for lowercase non-accented characters (basically those lowercase non-accented characters that copy paste into ASCII as the characters themselves). An example of this would be Greek µ.....
What exactly do you mean by 'is like'? I know that the Latin capital A looks like the Greek capital Á, but this is not the case for the lowercase letters or for some other capital letters.

And what exactly do you mean by '(basically those lowercase non-accented characters that copy paste into ASCII as the characters themselves)' ?

I have tried something like this:
PHP Code:
$phpdig_string_subst['iso-8859-7'] = 'A:¶A,a:Üá,E:Å¸,e:åÝ,H:Ç¹,h:çÞ,I:ÉºÚ,i:éßúÀ,O:Ï¼,o:ïü,Y:Õ¾Û,y:õýûà,L:Ë,l:ë,N:Í,n:í,V:Ù,v:ùþ,M:Ì,m:ì,P:Ð,p:ð,X:×,x:÷,K:Ê,k:ê,B:Â,b:â,C:Ø,c:ø,G:Ã,g:ã,D:Ä,d:ä,Z:Æ,z:æ,U:È,u:è,K:Ê,k:ê,J:Î,j:î,R:Ñ,r:ñ,S:Ó,s:óò,T:Ô,t:ô,F:Ö,f:ö'
and
PHP Code:
$phpdig_words_chars['iso-8859-7'] = '[:alnum:]ðþßìòñôèóäöãîêëæ÷øâíð'
I have also tried different variations of the above but still could not get it to work correctly.

The engine indexes the site all right, but it only recognizes and prints results for part of the keyword.
Also, the 'keywords' table contains words with Latin letters only. I guess this is all right?

Thank you for your time, Charter, and I hope I am not too much trouble.

Old 11-27-2003, 08:10 AM   #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I'll use a German word as an example of what I mean by the 'is like' phrase. The German word Gästebuch means guestbook. The ä in Gästebuch 'is like' the Latin a. Characters such as ä are stored as their Latin counterparts in the database for searching. When you copy and paste a character into a text editor, it will show up either as the character itself or as some ASCII equivalent of the character. The characters that show up as the actual character are the ones that go in $phpdig_words_chars['iso-8859-7'], but no accented characters should go in $phpdig_words_chars['iso-8859-7']. All accented or diacritic characters should go in $phpdig_string_subst['iso-8859-7'].
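The Gästebuch example can be sketched in code to show how the two settings might work together: first accented bytes are folded to Latin letters, then anything outside the word-character class is treated as a separator. The function extract_words and this two-step flow are assumptions made for illustration; PhpDig's real indexing code may differ. The text is given as ISO-8859-1 bytes.

```php
<?php
// Sketch under assumptions: fold accented bytes, then keep only runs of
// word characters. The function name and flow are hypothetical, not PhpDig's.
function extract_words(string $text, array $foldMap, string $wordChars): array
{
    $folded = strtr($text, $foldMap);
    // Byte-wise match (no /u flag): each run of bytes in the class is a word.
    preg_match_all('/[' . $wordChars . ']+/', $folded, $matches);
    return $matches[0];
}

// ISO-8859-1 bytes for "Gästebuch, Straße": ä is 0xE4 and ß is 0xDF.
$text = "G\xE4stebuch, Stra\xDFe";
$foldMap = ["\xE4" => 'a'];     // plays the role of 'a:ä' in $phpdig_string_subst
$wordChars = "[:alnum:]\xDF";   // plays the role of '[:alnum:]ß' in $phpdig_words_chars

$words = extract_words($text, $foldMap, $wordChars);
// $words now holds "Gastebuch" and "Stra\xDFe" (ß kept as a word character).
```

In this sketch the ä is stored as a plain a, while the ß survives only because it is listed as an extra word character.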
Old 11-28-2003, 06:11 AM   #9
mkst
Green Mole
 
Join Date: Oct 2003
Posts: 11
Thank you for your reply, Charter. It seems that I managed to create the right $phpdig_string_subst and $phpdig_words_chars.

However, I still have one problem with words that start with a capital letter. I can only find a word if it starts with certain capital letters; otherwise I get zero matches. The search works fine for lowercase words.

Do you have any idea why this is happening?

Regards,
Mike
Old 11-28-2003, 06:17 AM   #10
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. What are $phpdig_string_subst['iso-8859-7'] and $phpdig_words_chars['iso-8859-7'] currently set to? Which capital letters are not working? Maybe there is a mismatched key value pairing.
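One way to look for a missing or mismatched pairing is to list every byte in a range that is covered by neither setting. The helper below, uncovered_bytes, is a hypothetical diagnostic written for this thread, not something PhpDig provides; the byte range to check is left to the caller.

```php
<?php
// Hypothetical diagnostic, not part of PhpDig: report bytes in a range that
// appear neither after a colon in the substitution string nor among the
// extra word characters (the part after [:alnum:]).
function uncovered_bytes(string $subst, string $extraWordChars, int $from, int $to): array
{
    $covered = [];
    foreach (explode(',', $subst) as $pair) {
        $parts = explode(':', $pair, 2);
        if (count($parts) === 2 && $parts[1] !== '') {
            foreach (str_split($parts[1]) as $byte) {
                $covered[ord($byte)] = true;
            }
        }
    }
    if ($extraWordChars !== '') {
        foreach (str_split($extraWordChars) as $byte) {
            $covered[ord($byte)] = true;
        }
    }
    $missing = [];
    for ($code = $from; $code <= $to; $code++) {
        if (!isset($covered[$code])) {
            $missing[] = sprintf('%02X', $code);
        }
    }
    return $missing;
}

// Shortened example: only 0xC1 and 0xC2 are mapped, so 0xC3 is reported.
$missing = uncovered_bytes("A:\xC1,B:\xC2", '', 0xC1, 0xC3);
print_r($missing);
```

Running it over the capital-letter range of the code table with the real settings would show which capitals, if any, were left out.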
Old 11-28-2003, 06:25 AM   #11
mkst
Green Mole
 
Join Date: Oct 2003
Posts: 11
PHP Code:
$phpdig_string_subst['iso-8859-7'] = 'A:Á¶,a:Üá,B:Â,b:â,G:Ã,g:ã,D:Ä,d:ä,E:Å¸,e:åÝ,Z:Æ,z:æ,H:Ç¹,h:çÞ,U:È,u:è,I:ÉºÚ,i:éßúÀ,K:Ê,k:ê,L:Ë,l:ë,M:Ì,m:ì,N:Í,n:í,J:Î,j:î,O:Ï¼,o:ïü,P:Ð,p:ð,R:Ñ,r:ñ,S:Ó,s:óò,T:Ô,t:ô,Y:Õ¾Û,y:õýûà,F:Ö,f:ö,X:×,x:÷,C:Ø,c:ø,V:Ù,v:ùþ'
and
PHP Code:
$phpdig_words_chars['iso-8859-7'] = '[:alnum:]áâãäåæçèéêëìíîïðñóôõö÷øù'
I have double-checked for typos; I don't think that is the problem.
Words starting with Á, ¶, Ð, Ì have no problem.

Old 11-28-2003, 06:51 AM   #12
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Of áâãäåæçèéêëìíîïðñóôõö÷øù the only ones that should be in the $phpdig_words_chars['iso-8859-7'] variable are æçðø like so:
PHP Code:
$phpdig_words_chars['iso-8859-7'] = '[:alnum:]æçðø'
These áâãäåèéêëìíîïñóôõö÷ù are accented/diacritic characters and need to be matched up to their Latin counterparts in $phpdig_string_subst['iso-8859-7'].
Old 11-28-2003, 07:39 AM   #13
mkst
Green Mole
 
Join Date: Oct 2003
Posts: 11
Thanks, Charter, but there is no improvement.
It is now worse than before.
Old 11-28-2003, 10:37 AM   #14
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I am not very familiar with the Greek alphabet beyond mathematical usage. Below is what I came up with assuming that Latin A is like Greek Alpha, Latin a is like Greek alpha, and so forth. I make no claims of correctness.
PHP Code:
$phpdig_string_subst['iso-8859-7'] = 'A:¶Á,a:Üá,B:Â,G:Ã,g:ã,D:Ä,'
    . 'd:ä,E:¸Å,e:Ýå,Z:Æ,z:æ,I:ºÉÚ,i:Àßéú,K:Ê,k:ê,L:Ë,l:ë,M:Ì,N:Í,n:í,'
    . 'X:Î,x:î,O:¼Ï,o:ïü,P:Ð,p:ð,R:Ñ,r:ñ,S:Ó,s:òó,T:Ô,t:ô,Y:¾ÕÛ,y:àõûý';

$phpdig_words_chars['iso-8859-7'] = '[:alnum:]ßµ';
I was not sure what to do with the following characters: Eta, eta, Theta, theta, Phi, phi, Chi, chi, Psi, psi, Omega, omega.

I also made the following assumptions: Latin G is like Greek Gamma, Latin g is like Greek gamma, Latin R is like Greek Rho, Latin r is like Greek rho, Latin Y is like Greek Upsilon, Latin y is like Greek upsilon.

As I am not very familiar with the Greek language, this is the best that I can offer.
Old 12-24-2003, 02:41 AM   #15
mitsoskitsos
Green Mole
 
Join Date: Dec 2003
Posts: 7
Hi.
I am also trying to index Greek pages with encoding 8859-7 and I have some problems.
I think the origin of the problem is that Greek characters are converted to Latin and then put in the keywords table.
Why is it necessary to convert the Greek characters to Latin?
I think the engine would work much better and more accurately without this conversion.
Is there a hack that I could apply so that Greek characters are not converted to Latin?