01-08-2004, 12:04 PM   #11
Edomondo
Orange Mole
Join Date: Jan 2004
Location: In outer space
Posts: 37
OK, I found out how to replace the punctuation.
In the previous example, tokenizing the string on /* can be done this way:

PHP Code:
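<?php
$string = "This/*is/*an/*example/*0/*string.";
$separator = " ";

// Map every separator sequence and punctuation mark to a plain space
$replace_separator = array("/*" => $separator,
                           "." => $separator);

// strtr() with an array replaces all keys in one pass; trim() drops the
// trailing space left by the final "."
$string = trim(strtr($string, $replace_separator));

$tok = strtok($string, $separator);
while ($tok !== FALSE) { // try with while ($tok) to compare
    echo "Word=$tok<br />";
    $tok = strtok($separator);
}
?>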
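The comment in the loop points at why the strict `!== FALSE` test matters: the sample string contains the token "0", and PHP treats the string "0" as false. A minimal sketch comparing both loops on the already-cleaned string (the helper names `tokens_strict` and `tokens_loose` are hypothetical, just for illustration):

```php
<?php
// Collect strtok() tokens, stopping only when strtok() really returns FALSE.
function tokens_strict(string $string, string $separator): array
{
    $words = array();
    $tok = strtok($string, $separator);
    while ($tok !== FALSE) {
        $words[] = $tok;
        $tok = strtok($separator);
    }
    return $words;
}

// Same loop with a loose truth test: the token "0" is falsy in PHP,
// so the loop ends there and silently drops the rest of the string.
function tokens_loose(string $string, string $separator): array
{
    $words = array();
    $tok = strtok($string, $separator);
    while ($tok) {
        $words[] = $tok;
        $tok = strtok($separator);
    }
    return $words;
}

$string = "This is an example 0 string";
echo implode(",", tokens_strict($string, " ")), "\n"; // This,is,an,example,0,string
echo implode(",", tokens_loose($string, " ")), "\n";  // This,is,an,example
```

Any word that is exactly "0" would be lost from the index with the loose version, so the strict comparison is the safer choice.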
I guess I'll also have to give MAX_WORDS_SIZE the highest possible value.

Now, how can I configure $phpdig_string_subst['EUC-JP'] and $phpdig_string_chars['EUC-JP']? It's still a bit confusing to me.

Every character that makes up a multi-byte character goes in $phpdig_string_subst, right?
e.g.: ‚ÆÄÃ*l‹CÃŒ_é–Ÿ‰æÅ...
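To see what those single-byte "characters" are, here is a small sketch that takes two kanji in EUC-JP and looks at them byte by byte. The sample word "日本" is only an illustration and is not taken from PhpDig; the bytes are written as escapes so the source file's own encoding does not matter:

```php
<?php
// "日本" ("Japan") in EUC-JP: 日 = 0xC6 0xFC, 本 = 0xCB 0xDC
$euc_jp = "\xC6\xFC\xCB\xDC";

// strlen() counts bytes, not characters: two kanji look like
// 4 "characters" to any single-byte string function.
echo strlen($euc_jp), "\n"; // 4

// Show each byte in hex. Both bytes of a two-byte JIS X 0208 character
// in EUC-JP lie in the 0xA1-0xFE range, which is why such text shows up
// as a run of accented Latin letters when displayed as ISO-8859-1.
$hex = array();
foreach (str_split($euc_jp) as $byte) {
    $hex[] = sprintf("%02X", ord($byte));
}
echo implode(" ", $hex), "\n"; // C6 FC CB DC
```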

And $phpdig_string_chars['EUC-JP'] = '[:alnum:]'; seems correct, since all characters will be converted to half-width EUC-JP characters during indexing.