PhpDig.net

09-21-2006, 03:17 PM   #1
jackpod
Green Mole
 
Join Date: Sep 2006
Posts: 4
UTF-8 Question

If my HTML pages are UTF-8, what are the consequences of using PhpDig? It seems to work OK. Am I missing something? Also, is there a plan to support UTF-8 in an upcoming release? It seems to me that this is crucial, as UTF-8 is quite common now.

Thanks, any help would be appreciated.
09-22-2006, 03:11 PM   #2
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
UTF-8 has the following properties:

UCS characters U+0000 to U+007F (ASCII) are encoded simply as bytes 0x00 to 0x7F (ASCII compatibility). This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8.
All UCS characters >U+007F are encoded as a sequence of several bytes, each of which has the most significant bit set. Therefore, no ASCII byte (0x00-0x7F) can appear as part of any other character.
The first byte of a multibyte sequence that represents a non-ASCII character is always in the range 0xC0 to 0xFD (0xC2 to 0xF4 under the current four-byte Unicode definition), and it indicates how many bytes follow for this character. All further bytes in a multibyte sequence are in the range 0x80 to 0xBF. This allows easy resynchronization and makes the encoding stateless and robust against missing bytes.
All possible 2^31 UCS codes can be encoded.
UTF-8 encoded characters may theoretically be up to six bytes long; however, 16-bit BMP characters are at most three bytes long.
The sorting order of big-endian UCS-4 byte strings is preserved.
The bytes 0xFE and 0xFF are never used in the UTF-8 encoding.
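A few of the properties listed above can be checked directly. The following is only a minimal sketch in Python 3 (not PhpDig code); the helper names `utf8_bytes` and `same_order` are made up for illustration.

```python
# Illustrative check of some UTF-8 properties (Python 3).
# utf8_bytes and same_order are hypothetical helper names, not part of any library.

def utf8_bytes(ch):
    """Return the UTF-8 encoding of a string as a list of byte values."""
    return list(ch.encode("utf-8"))

def same_order(words):
    """True if sorting by code point and sorting by UTF-8 bytes give the same order."""
    by_code_point = sorted(words)
    by_bytes = [b.decode("utf-8") for b in sorted(w.encode("utf-8") for w in words)]
    return by_code_point == by_bytes

# ASCII characters keep their single-byte values.
assert utf8_bytes("A") == [0x41]

# Non-ASCII characters become multibyte sequences; every byte has the high bit set.
e_acute = utf8_bytes("\u00e9")   # e with acute accent
euro = utf8_bytes("\u20ac")      # euro sign, a BMP character (three bytes)
assert e_acute == [0xC3, 0xA9]
assert euro == [0xE2, 0x82, 0xAC]
assert all(b & 0x80 for b in e_acute + euro)

# Bytes after the first one always fall in the range 0x80 to 0xBF.
assert all(0x80 <= b <= 0xBF for b in euro[1:])

# Byte-wise sorting preserves code point order.
assert same_order(["z", "\u00e9", "a", "\u20ac"])
```

Because no ASCII byte can appear inside a multibyte sequence, byte-oriented code that only looks for ASCII delimiters (spaces, tags, quotes) tends to keep working on UTF-8 input, which may be why a tool that is not UTF-8 aware still "seems to work".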
==============================================
In addition to all that, UTF-8 was introduced to provide an ASCII backwards-compatible multi-byte encoding. The definitions of UTF-8 in UCS and Unicode originally differed slightly: in UCS, UTF-8 sequences up to 6 bytes long were possible, to represent characters up to U-7FFFFFFF, while in Unicode only UTF-8 sequences up to 4 bytes long are defined, to represent characters up to U-0010FFFF. (The difference was in essence the same as between UCS-4 and UTF-32.)
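To make the lead-byte rule and the 6-byte versus 4-byte difference concrete, here is a rough Python 3 sketch. The function name `sequence_length` is hypothetical, and it follows the original UCS scheme described above (lead bytes up to 0xFD) without rejecting overlong or out-of-range forms.

```python
# Sketch: how the first byte announces the sequence length under the
# original (up to 6-byte) UCS definition. sequence_length is a made-up name.

def sequence_length(lead):
    """Return the total sequence length announced by a lead byte, or 0 if
    the byte cannot start a sequence (continuation byte, 0xFE, or 0xFF)."""
    if lead < 0x80:
        return 1          # plain ASCII
    if lead < 0xC0:
        return 0          # 0x80..0xBF: continuation byte, not a lead byte
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead < 0xF8:
        return 4
    if lead < 0xFC:
        return 5          # only in the original UCS definition
    if lead < 0xFE:
        return 6          # only in the original UCS definition
    return 0              # 0xFE and 0xFF are never used

# Python follows the Unicode definition: the highest code point, U+10FFFF,
# takes four bytes, and nothing above it can be created at all.
highest = chr(0x10FFFF).encode("utf-8")
assert len(highest) == 4
assert sequence_length(highest[0]) == 4
```

Since a continuation byte is recognizable on its own, a decoder that lands in the middle of a character can skip forward to the next byte for which `sequence_length` is non-zero, which is the resynchronization property mentioned in the list.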
09-22-2006, 03:20 PM   #3
jackpod
Green Mole
 
Join Date: Sep 2006
Posts: 4
I am sorry, but that info is a little over my head. Mainly I just wanted answers to my specific questions. Maybe I should revise them slightly. What are the consequences of using PhpDig with UTF-8 files? And is there a plan to support UTF-8 in an upcoming release?
09-22-2006, 06:03 PM   #4
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
The only problem you may run into is that, in some results, a few characters may be displayed incorrectly, for example as odd letters with accents above them.
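The odd accented letters are what typically appears when UTF-8 bytes are read as a single-byte encoding. The sketch below assumes the text ends up being interpreted as ISO-8859-1, which this thread does not confirm for PhpDig; `misread_as_latin1` is a made-up helper name.

```python
# Sketch of the display problem, assuming UTF-8 text is interpreted as
# ISO-8859-1 somewhere along the way (an assumption, not confirmed PhpDig behavior).

def misread_as_latin1(text):
    """Encode text as UTF-8, then decode the bytes as if they were ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")

# ASCII-only text is unaffected, because its bytes are identical in both encodings.
assert misread_as_latin1("search") == "search"

# Each non-ASCII character turns into two or more unrelated characters:
# e-acute (bytes 0xC3 0xA9) shows up as A-tilde followed by a copyright sign.
assert misread_as_latin1("caf\u00e9") == "caf\u00c3\u00a9"
print(misread_as_latin1("caf\u00e9"))
```

Pages that contain mostly ASCII text would therefore look fine, with only the occasional accented or non-Latin character coming out wrong.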
09-22-2006, 06:10 PM   #5
jackpod
Green Mole
 
Join Date: Sep 2006
Posts: 4
Thank you so much for your help.