01-14-2004, 05:16 AM   #18
Edomondo
Orange Mole
 
Join Date: Jan 2004
Location: In outer space
Posts: 37
I'm now facing another problem in stripping strings.
ex: ¥³¡¼¥Ê¡¼¤â¤¢¤ê¤Þ¤¹ (in EUC-JP)
is made of 9 multi-byte characters:
¥³ ¡¼ ¥Ê ¡¼ ¤â ¤¢ ¤ê ¤Þ ¤¹
But the script I wrote treats ¢¤ as a separator and replaces it with a space, even though here it is the second byte of ¤¢ followed by the first byte of ¤ê (two different multi-byte characters).
The script returns: ¥³¡¼¥Ê¡¼¤â¤ ê¤Þ¤¹, which turns the end of the string into nonsense.

The only way I can see to get rid of this bug is to walk through the string two bytes at a time, check whether each pair is a multi-byte character, and replace it with a space only if it is a separator. But wouldn't such a script be too time-consuming? Any ideas on how to achieve this?
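
To make the idea concrete, here is a rough sketch in Python (not the language of my actual script, just an illustration). It assumes the input is valid EUC-JP held as raw bytes, and the function names split_euc_jp and replace_separators are made up for this example. It reads the lead byte to decide how many bytes each character takes, so a separator is only matched on a real character boundary. It is a single pass over the string, so the cost should grow only linearly with the length.

```python
def split_euc_jp(data):
    """Split raw EUC-JP bytes into a list of per-character byte strings.

    Assumes well-formed input. Widths follow the EUC-JP layout:
    0x8F starts a 3-byte character, 0x8E or 0xA1-0xFE starts a
    2-byte character, anything else is treated as a single byte.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead == 0x8F:
            width = 3
        elif lead == 0x8E or 0xA1 <= lead <= 0xFE:
            width = 2
        else:
            width = 1
        chars.append(data[i:i + width])
        i += width
    return chars


def replace_separators(data, separators):
    """Replace whole characters found in `separators` with an ASCII space."""
    return b"".join(
        b" " if ch in separators else ch for ch in split_euc_jp(data)
    )


# The 18 bytes of the example string from this post.
text = bytes.fromhex("a5b3a1bca5caa1bca4e2a4a2a4eaa4dea4b9")
separator = b"\xa2\xa4"  # the byte pair shown as ¢¤ above

# A plain byte-level replace matches across the character boundary.
naive = text.replace(separator, b" ")

# The character-aware version leaves the string alone here.
safe = replace_separators(text, {separator})
```

With the example string, naive comes out one byte shorter and broken in the middle, while safe is identical to text because ¢¤ never appears as a whole character in it.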