PhpDig.net

Old 12-08-2003, 01:15 AM   #1
jvalej
Green Mole
 
Join Date: Dec 2003
Posts: 2
windows-1251 encoding

Hello all!

I would like to configure PhpDig, so that it can search pages with windows-1251 (cyrillic) encoding.

I have viewed two similar threads in this forum, on questions of encoding, located here:

ISO-8859-5
ISO-8859-7

The ISO-8859-7 thread is quite extensive, but to tell the truth, I still have little idea how to add windows-1251 encoding support to PhpDig...

I have found a couple of pages on windows-1251 on these sites:

http://www.sensi.org/~alec/locale/other/win1251.html
http://www.cs.susu.ac.ru/RS6000/tbcp1251.html

Could anyone please help me work out which characters to add, and how, to:

$phpdig_string_subst['windows-1251']

and

$phpdig_words_chars['windows-1251']


Thank you very much!!!
Old 12-08-2003, 03:48 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Windows-1251 Characters (note A0 is a space):

A0-AF _ Ў ў Ј ¤ Ґ ¦ § Ё © Є « ¬ _ ® Ї
B0-BF ° ± І і ґ µ ¶ · ё № є » ј Ѕ ѕ ї
C0-CF А Б В Г Д Е Ж З И Й К Л М Н О П
D0-DF Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
E0-EF а б в г д е ж з и й к л м н о п
F0-FF р с т у ф х ц ч ш щ ъ ы ь э ю я

Those same bytes displayed as ISO-8859-1 (Latin-1) characters, loosely called "ASCII" here (note A0 is a space):

A0-AF _ ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ _ ® ¯
B0-BF ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
C0-CF À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
D0-DF Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
E0-EF à á â ã ä å æ ç è é ê ë ì í î ï
F0-FF ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

Then complete this table (I had to 'code' it to keep spacing):
Code:
Latin  Cyrillic  Hex  ASCII
-----  --------  ---  -----
A      A         C0   À
B      Б         C1   Á
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
a      а         E0   à
b      б         E1   á
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
I'm not very familiar with Cyrillic so I'm not sure if the entries in the above table are correct Latin to Cyrillic mappings. There might be characters that don't map, and I'm not sure what to do with those characters.
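The Cyrillic, Hex and ASCII columns of the table above do not have to be typed by hand. Here is a minimal sketch that prints them for bytes C0 to FF. It assumes the mbstring extension is available, and win1251Rows() is a made-up helper name, not part of PhpDig. It does not fill in the Latin column, since that is a transliteration choice rather than something the code page defines.

```php
<?php
// Sketch: list each byte 0xC0-0xFF as hex, as the windows-1251 character,
// and as the ISO-8859-1 character a Latin-1 editor would display for it.
// Requires the mbstring extension; win1251Rows() is a hypothetical helper.
function win1251Rows(int $from = 0xC0, int $to = 0xFF): array
{
    $rows = [];
    for ($byte = $from; $byte <= $to; $byte++) {
        $raw = chr($byte); // the single byte as stored in a windows-1251 page
        $rows[] = [
            'hex'      => strtoupper(dechex($byte)),
            'cyrillic' => mb_convert_encoding($raw, 'UTF-8', 'Windows-1251'),
            'latin1'   => mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1'),
        ];
    }
    return $rows;
}

foreach (win1251Rows() as $row) {
    // Output is UTF-8 so both glyphs can be shown side by side.
    echo $row['hex'], '  ', $row['cyrillic'], '  ', $row['latin1'], "\n";
}
```

The point of the two columns is that nothing is converted: the same byte is simply drawn with a different glyph depending on which code page the viewer assumes.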
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 12-30-2003, 05:32 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Here's a new approach/workaround for use with PhpDig 1.6.5.

In the config.php file set the following:
PHP Code:
define('PHPDIG_ENCODING','windows-1251');

// give functions something trivial to do
$phpdig_string_subst['windows-1251'] = 'Q:Q,q:q';

// remove word wrapping in the below line
$phpdig_words_chars['windows-1251'] = '[:alnum:]ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ';
In addition, the robot_functions.php file contains a phpdigIndexFile function.

In the phpdigIndexFile function replace:
PHP Code:
global $common_words,$relative_script_path,$s_yes,$s_no,$br;
with the following:
PHP Code:
global $phpdig_words_chars,$common_words,$relative_script_path,$s_yes,$s_no,$br;
Also, in the phpdigIndexFile function replace:
PHP Code:
        if (strlen($key) > SMALL_WORDS_SIZE and strlen($key) <= MAX_WORDS_SIZE and !isset($common_words[$key]) and ereg('^[0-9a-zßðþ]',$key)) 
with the following:
PHP Code:
        if (strlen($key) > SMALL_WORDS_SIZE and strlen($key) <= MAX_WORDS_SIZE and !isset($common_words[$key]) and ereg('^['.$phpdig_words_chars[PHPDIG_ENCODING].']',$key)) 
Remember to remove any word wrapping in the above code, and use PhpDig 1.6.5 if you are not using it already.

Also you will need to index from scratch for the changes to take effect.
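For anyone reading this on a newer PHP: ereg() was deprecated in PHP 5.3 and removed in PHP 7, so the condition above will not run there as written. Below is a rough sketch of the same test using preg_match(). It is not PhpDig code. The helper names are made up, MAX_WORDS_SIZE of 30 is only an illustrative value, and the character list is built from byte values so it does not depend on the encoding the PHP file is saved in.

```php
<?php
// Sketch only: a preg_match() version of the word filter.
// Both constant values are assumptions chosen for the example.
define('SMALL_WORDS_SIZE', 2);
define('MAX_WORDS_SIZE', 30);

// Hypothetical helper: [:alnum:] plus bytes 0xC0-0xFF (the Cyrillic letters
// from capital A to small ya in windows-1251), built from byte values.
function win1251WordChars(): string
{
    return '[:alnum:]' . implode('', array_map('chr', range(0xC0, 0xFF)));
}

// Hypothetical helper mirroring the condition used in phpdigIndexFile.
function isIndexableWord(string $key, array $commonWords, string $wordChars): bool
{
    $length = strlen($key); // byte length; one byte per letter in windows-1251
    return $length > SMALL_WORDS_SIZE
        && $length <= MAX_WORDS_SIZE
        && !isset($commonWords[$key])
        && preg_match('/^[' . $wordChars . ']/', $key) === 1;
}

$chars = win1251WordChars();
$word  = "\xEC\xE8\xF0"; // the Russian word "mir" encoded as windows-1251

var_dump(isIndexableWord($word, [], $chars));           // Cyrillic first byte
var_dump(isIndexableWord("\xAB" . $word, [], $chars));  // starts with 0xAB
var_dump(isIndexableWord("\xED\xE0", [], $chars));      // only two bytes long
```

The pattern is deliberately used without the /u modifier, because the indexed text is single-byte windows-1251 and not UTF-8.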

Please let me know how this method works for you.
Old 01-02-2004, 08:18 AM   #4
jvalej
Green Mole
 
Join Date: Dec 2003
Posts: 2
Happy New Year!

Thank you very much for your help! The windows-1251 solution that you have posted, works!

But now I have 2 other questions:


For example, if I search for the three-letter word in the search link below, it is not highlighted in the results:

http://oasiswithin.net/search/search...rt&lim_start=0

My "config.php" contains the following settings:

define('SMALL_WORDS_SIZE',2);


And if I search for the word:

http://oasiswithin.net/search/search...rt&lim_start=0

result number 2 also contains words from the navigation menu, although that section should be excluded from the results, as I have the following settings in "config.php":

define('PHPDIG_EXCLUDE_COMMENT','<!-- *************************************');
define('PHPDIG_INCLUDE_COMMENT','************************************** -->');

The content that the search engine should consider is located between these 2 comment lines. Or maybe I have misunderstood?

I should also mention that the navigation section is not in the content file being indexed; it is pulled into that file by a PHP include call from an external .HTML file. Maybe it has something to do with this?


Thank you!
Old 01-02-2004, 09:51 AM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. For the first part, it looks like the highlighting isn't handling upper and lower case. For this, try the following:

In the config.php file, replace:
PHP Code:
// give functions something trivial to do
$phpdig_string_subst['windows-1251'] = 'Q:Q,q:q';

// remove word wrapping in the below line
$phpdig_words_chars['windows-1251'] = '[:alnum:]ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ';
with the following:
PHP Code:
// remove word wrapping in the below line
$phpdig_string_subst['windows-1251'] = 'À:à,Á:á,Â:â,Ã:ã,Ä:ä,Å:å,Æ:æ,Ç:ç,È:è,É:é,Ê:ê,Ë:ë,Ì:ì,Í:í,Î:î,Ï:ï,Ð:ð,Ñ:ñ,Ò:ò,Ó:ó,Ô:ô,Õ:õ,Ö:ö,×:÷,Ø:ø,Ù:ù,Ú:ú,Û:û,Ü:ü,Ý:ý,Þ:þ,ß:ÿ';

// remove word wrapping in the below line
$phpdig_words_chars['windows-1251'] = '[:alnum:]ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ';
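The pair list above follows a fixed byte pattern: in windows-1251 the capital letters occupy C0 to DF and each lowercase letter is the capital's byte plus 0x20. A small sketch that generates the string from byte values, which avoids copy-and-paste damage to individual characters, is below. win1251CasePairs() is a made-up helper name, the display lines assume the mbstring extension, and how PhpDig consumes the pairs internally is not shown here.

```php
<?php
// Sketch: build the 'UPPER:lower' pair list for windows-1251 from byte values.
// Capitals are bytes 0xC0-0xDF; the matching lowercase byte is capital + 0x20.
// win1251CasePairs() is a hypothetical helper, not part of PhpDig.
function win1251CasePairs(): string
{
    $pairs = [];
    for ($upper = 0xC0; $upper <= 0xDF; $upper++) {
        $pairs[] = chr($upper) . ':' . chr($upper + 0x20);
    }
    return implode(',', $pairs);
}

$subst = win1251CasePairs();

// Viewed as ISO-8859-1, the string looks like the one posted in this thread.
echo mb_convert_encoding($subst, 'UTF-8', 'ISO-8859-1'), "\n";

// Viewed as windows-1251, it is the Cyrillic alphabet as capital:lowercase.
echo mb_convert_encoding($subst, 'UTF-8', 'Windows-1251'), "\n";
```

Note that the letters at A8 and B8 (capital and small io) sit outside the C0 to FF block, so neither this sketch nor the strings above cover them. If your pages use that letter, adding those two bytes may be worth testing.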
For the second part, with the definitions given, the PhpDig exclude/include comments work as shown below. Each comment must be on its own line:
PHP Code:
define('PHPDIG_EXCLUDE_COMMENT','<!-- *************************************');
define('PHPDIG_INCLUDE_COMMENT','************************************** -->'); 
Code:
<!-- *************************************
the content
to exclude
goes here
************************************** -->
If you look at the HTML source for result two, the PhpDig exclude/include comments do not surround the navigation menu, so that change needs to be made and then a reindex done.
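To make the "own line" requirement concrete, here is a small sketch of a line-based filter. It is only an illustration of the idea and is not PhpDig's actual implementation; stripExcluded() and the sample page are made up for the example.

```php
<?php
// Sketch of a line-based filter, not PhpDig's real code: everything from a
// line equal to the exclude marker up to a line equal to the include marker
// is dropped. stripExcluded() is a hypothetical helper.
function stripExcluded(string $html, string $exclude, string $include): string
{
    $kept = [];
    $skipping = false;
    foreach (preg_split('/\r\n|\r|\n/', $html) as $line) {
        $trimmed = trim($line);
        if ($trimmed === $exclude) {
            $skipping = true;  // the marker line itself is dropped too
            continue;
        }
        if ($trimmed === $include) {
            $skipping = false;
            continue;
        }
        if (!$skipping) {
            $kept[] = $line;
        }
    }
    return implode("\n", $kept);
}

$exclude = '<!-- *************************************';
$include = '************************************** -->';

// Made-up sample page with the navigation wrapped in the two markers.
$page = "<h1>Title</h1>\n"
      . $exclude . "\n"
      . "<ul><li>navigation menu</li></ul>\n"
      . $include . "\n"
      . "<p>Indexed text</p>";

echo stripExcluded($page, $exclude, $include), "\n";
```

In this sketch a marker that shares its line with other text is not recognised, which is the failure mode to look for in the page source.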
All times are GMT -8.