Date:      Tue, 20 Jan 1998 10:40:47 -0500 (EST)
From:      John Fieber <jfieber@indiana.edu>
To:        Andrew Kenneth Milton <akm@mother.sneaker.net.au>
Cc:        "Louis A. Mamakos" <louie@TransSys.COM>, daniel_sobral@voga.com.br, tlambert@primenet.com, hackers@FreeBSD.ORG
Subject:   Re: Wide characters on tcp connections
Message-ID:  <Pine.BSF.3.96.980120101241.26398Z-100000@fallout.campusview.indiana.edu>
In-Reply-To: <199801200415.PAA17887@mother.sneaker.net.au>

On Tue, 20 Jan 1998, Andrew Kenneth Milton wrote:

> | If you're looking for a standard way to move multibyte characters, then
> | choose any one of a number of encodings already used to store multibyte
> | characters in files.
> 
> Moving them's not quite the same as storing them.... byte orders, usually
> come into play a lot more when you've got to shunt the data across a network.
> 
> I think Unicode defines that it is to be stored in network byte order.

Maybe this will clarify things a bit.  From _The Unicode Standard
2.0_, Section 3.1 Conformance Requirements: 

C1. A process shall interpret Unicode code values as 16-bit
    quantities. 

C2. The Unicode Standard does not specify any order of bytes
    inside a Unicode value.
    
C3. A process shall interpret a Unicode value that has been
    serialized into a sequence of bytes, by most significant byte
    first, in the absence of higher level protocols.

If you think of writing to a file as serializing, then C3
applies.  If you think of it as dumping memory, then C2 applies. 
I believe NT generally takes the C2 route. Terry, can you
confirm this?  How about for IPC? 
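
To make the C2/C3 distinction concrete, here is a minimal sketch in
Python. The function names are hypothetical, and it assumes code values
are plain 16-bit integers. It is an illustration of the two views, not
a statement of what any particular system does.

```python
import struct
import sys

def serialize_c3(code_values):
    """Serialize 16-bit code values most significant byte first (the C3 view)."""
    return struct.pack(">%dH" % len(code_values), *code_values)

def deserialize_c3(data):
    """Read back a byte sequence written by serialize_c3."""
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def dump_native(code_values):
    """Write 16-bit code values in host byte order (the C2 'memory dump' view)."""
    return struct.pack("=%dH" % len(code_values), *code_values)

text = [0x0041, 0x65E5]      # two 16-bit code values: U+0041 and U+65E5
wire = serialize_c3(text)    # same bytes on every host
local = dump_native(text)    # byte order depends on sys.byteorder
```

The serialized form round-trips on any host, while the native dump only
matches it on a big-endian machine. That is why a memory dump needs
either a higher level protocol or agreement between both ends.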

Just as a footnote, UTF-8 is a big win for English text because
it generally ends up 1 character == 1 byte, but is a big loss for
CJK (among others) where 1 character == 3 bytes instead of the 2
bytes a 16-bit value takes.  UTF-8 is no silver bullet for endian
debates.
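
A quick way to see the size trade-off, sketched in Python. The sample
strings and helper names are my own, and Python's "utf-16-be" codec is
used as a stand-in for 16-bit Unicode values, which holds for the
characters shown here.

```python
def utf8_size(s):
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))

def sixteen_bit_size(s):
    """Number of bytes when each character is a 16-bit value, MSB first."""
    return len(s.encode("utf-16-be"))

english = "wide"
cjk = "\u65e5\u672c\u8a9e"   # three CJK ideographs

sizes = {
    "english": (utf8_size(english), sixteen_bit_size(english)),
    "cjk": (utf8_size(cjk), sixteen_bit_size(cjk)),
}
```

For these samples the English text takes 4 bytes in UTF-8 against 8 in
16-bit form, while the CJK text takes 9 bytes in UTF-8 against 6.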

-john



