PhpDig.net - View Single Post

Edomondo · 01-09-2004, 05:59 AM

Actually, the search engine I'm aiming for will support different languages and encodings. Japanese characters will be processed as if they were single-byte characters. They will only be decoded into Japanese characters at the very end, in the browser. The DB and the plain TXT files will store the raw, non-decoded characters.
So I'd prefer to use a common charset in MySQL rather than one specific to Japanese.

I've put together a list of all the possible separators (non-encoded) in Japanese.
For Shift_Jis encoding, they are:
Lead byte 0x81 followed by a trail byte in one of these ranges (hex):
0x40-0x7E
0x80-0xAC
0xB8-0xBF
0xC8-0xCE
0xDA-0xE8
0xF0-0xF7
0xFC

What would be the fastest way to achieve this?
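
One approach I'm considering, as a rough sketch only: walk the raw bytes and replace each separator with a plain space. It assumes the separators above are all two-byte sequences starting with 0x81, with trail bytes in the ranges 0x40-0x7E, 0x80-0xAC, 0xB8-0xBF, 0xC8-0xCE, 0xDA-0xE8, 0xF0-0xF7 and 0xFC. The function names are made up for this example and are not part of PhpDig.

```php
<?php
// Assumed trail-byte ranges (after lead byte 0x81) for the separators.
const SJIS_SEPARATOR_TRAILS = [
    [0x40, 0x7E], [0x80, 0xAC], [0xB8, 0xBF], [0xC8, 0xCE],
    [0xDA, 0xE8], [0xF0, 0xF7], [0xFC, 0xFC],
];

// Hypothetical helper: is this trail byte inside one of the ranges above?
function is_sjis_separator_trail(int $trail): bool
{
    foreach (SJIS_SEPARATOR_TRAILS as [$lo, $hi]) {
        if ($trail >= $lo && $trail <= $hi) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: replace every 0x81-led separator with one ASCII space.
// The string is walked byte by byte so that the trail byte of one character
// is never mistaken for the lead byte of the next.
function sjis_separators_to_spaces(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        // Shift_JIS lead bytes: 0x81-0x9F and 0xE0-0xFC (with vendor extensions).
        $isLead = ($b >= 0x81 && $b <= 0x9F) || ($b >= 0xE0 && $b <= 0xFC);
        if (!$isLead || $i + 1 >= $len) {
            $out .= $bytes[$i]; // single-byte character, or truncated input
            continue;
        }
        $trail = ord($bytes[$i + 1]);
        if ($b === 0x81 && is_sjis_separator_trail($trail)) {
            $out .= ' ';
        } else {
            $out .= $bytes[$i] . $bytes[$i + 1];
        }
        $i++; // skip the trail byte that was just consumed
    }
    return $out;
}
```

The reason for scanning instead of a plain str_replace() on the two-byte pairs: 0x81 is also a legal trail byte, so a character ending in 0x81 followed by an ASCII "@" (0x40) would wrongly match the 0x81 0x40 separator.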

$phpdig_string_subst for Shift_Jis would look like:

PHP Code:

$phpdig_string_subst['Shift_Jis'] = 'A:A,a:a,B:B,b:b,C:C,c:c,D:D,d:d,E:E,e:e,F:F,f:f,G:G,g:g,H:H,h:h,I:I,i:i,J:J,j:j,K:K,k:k,L:L,l:l,M:M,m:m,N:N,n:n,O:O,o:o,P:P,p:p,Q:Q,q:q,R:R,r:r,S:S,s:s,T:T,t:t,U:U,u:u,V:V,v:v,W:W,w:w,X:X,x:x,Y:Y,y:y,Z:Z,z:z';

Is that correct?
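
As written, that string maps every letter to itself. If the goal is to fold the full-width Latin letters to plain ASCII, the string could be generated rather than typed by hand. This is only a sketch under two assumptions: that each entry has the form "ASCII letter, colon, character to replace", and that the full-width letters are the Shift_Jis bytes 0x82 0x60-0x79 (capitals) and 0x82 0x81-0x9A (small letters). The function name is made up, and I haven't checked whether PhpDig's substitution code copes with two-byte source characters.

```php
<?php
// Hypothetical helper: build an 'ASCII:full-width' substitution string
// for A-Z and a-z, in the same A,a,B,b,... order as the string above.
function build_sjis_fullwidth_subst(): string
{
    $pairs = [];
    for ($i = 0; $i < 26; $i++) {
        $upper = chr(ord('A') + $i);
        $lower = chr(ord('a') + $i);
        // Assumed layout: full-width capitals start at 0x82 0x60,
        // full-width small letters start at 0x82 0x81.
        $pairs[] = $upper . ':' . "\x82" . chr(0x60 + $i);
        $pairs[] = $lower . ':' . "\x82" . chr(0x81 + $i);
    }
    return implode(',', $pairs);
}
```

It would be used as $phpdig_string_subst['Shift_Jis'] = build_sjis_fullwidth_subst();. The ":" and "," delimiters are safe here because neither 0x3A nor 0x2C can be a Shift_Jis trail byte.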

Building a correct $phpdig_words_chars shouldn't be a problem either. I'll post a first attempt soon for both Shift_Jis and EUC-JP.

Post #13. Last edited by Edomondo; 01-09-2004 at 06:03 AM.