OK, I found out how to replace the punctuation.
Taking the previous example, the string can be tokenized on /* like this:
PHP Code:
<?php
$string = "This/*is/*an/*example/*0/*string.";
$separator = " ";
// Map every custom separator (and the final period) to a plain space.
$replace_separator = array("/*" => $separator,
                           "."  => $separator);
$string = trim(strtr($string, $replace_separator));
$tok = strtok($string, $separator);
// The strict test keeps the "0" token; while ($tok) would stop there,
// because PHP treats the string "0" as false.
while ($tok !== FALSE) {
    echo "Word=$tok<br />";
    $tok = strtok($separator);
}
?>
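To make the difference between the two loop conditions concrete, here is a minimal sketch that collects the tokens into arrays instead of echoing them. The function names tokenize_strict and tokenize_loose are hypothetical helpers written for this post; they are not part of PhpDig.

```php
<?php
// Hypothetical helpers for illustration only (not PhpDig functions).

// Replace "/*" and "." with the separator, then split with a strict test.
function tokenize_strict($string, $separator = " ")
{
    $string = trim(strtr($string, array("/*" => $separator, "." => $separator)));
    $words = array();
    $tok = strtok($string, $separator);
    while ($tok !== false) {      // strict: only stops when strtok() returns false
        $words[] = $tok;
        $tok = strtok($separator);
    }
    return $words;
}

// Same loop with the loose test, shown only for comparison.
function tokenize_loose($string, $separator = " ")
{
    $string = trim(strtr($string, array("/*" => $separator, "." => $separator)));
    $words = array();
    $tok = strtok($string, $separator);
    while ($tok) {                // loose: also stops on the token "0"
        $words[] = $tok;
        $tok = strtok($separator);
    }
    return $words;
}

$example = "This/*is/*an/*example/*0/*string.";
echo implode(",", tokenize_strict($example)), "\n"; // This,is,an,example,0,string
echo implode(",", tokenize_loose($example)), "\n";  // This,is,an,example
?>
```

With the loose test, everything from "0" onward is silently dropped, which is why the strict comparison matters when the indexed text can contain bare zeros.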
I guess I'll also have to set MAX_WORDS_SIZE to the highest possible value.
Now, how can I configure $phpdig_string_subst['EUC-JP'] and $phpdig_string_chars['EUC-JP']? It's still a bit confusing to me.
Every single-byte character that makes up a multi-byte character will go in $phpdig_string_subst, right?
e.g. : ‚ÆÄÃ*l‹CÃŒ_é–Ÿ‰æÅ...
And $phpdig_string_chars['EUC-JP'] = '[:alnum:]'; seems correct, since all characters will be converted to half-width EUC-JP characters during indexing.