Date: Tue, 20 Jan 1998 23:20:38 -0500
From: "Louis A. Mamakos" <louie@TransSys.COM>
To: Terry Lambert <tlambert@primenet.com>
Cc: daniel_sobral@voga.com.br, hackers@FreeBSD.ORG
Subject: Re: Wide characters on tcp connections
Message-ID: <199801210420.XAA23356@whizzo.TransSys.COM>
In-Reply-To: Your message of "Tue, 20 Jan 1998 19:35:21 GMT." <199801201935.MAA27183@usr04.primenet.com>
References: <199801201935.MAA27183@usr04.primenet.com>

> > > The issue is one of stream synchronization. This is my main problem
> > > with UTF over non-error-checked links. If you have an implicit value
> > > boundary, then you are guaranteed a synchronized stream.
> >
> > Not applicable. TCP *is* an error checked link. Absent application
> > implementation errors, you shouldn't get unsynchronized.
>
> Uh, byte order?
Oh, come now. It's not like the problem of how to move multi-octet
quantities across an octet-oriented communications channel hasn't
been solved for quite a long time. For example, we manage to move
32 bit TCP sequence numbers (unsigned integers) without too many
byte-order implementation issues.
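
To make that concrete, here is a minimal sketch in C of moving a 32-bit unsigned quantity over an octet stream in big-endian ("network") order using shifts only, so the result does not depend on the host's own byte order. The names put_be32 and get_be32 are made up for illustration; they are not taken from any particular stack.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value as four octets,
 * most significant octet first (big-endian / network order). */
static void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Hypothetical helper: rebuild the 32-bit value from four octets
 * received in big-endian order. */
static uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Because the octets are produced and consumed by arithmetic rather than by copying the in-memory representation, the same code gives the same wire format on big-endian and little-endian machines.
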
If you're unwilling to specify an encoding convention of your own,
there are plenty to choose from which provide a portable encoding
format suitable for many different implementation architectures.
You could use XDR.
You could use ASN.1 - plenty of rope here to hang yourself with.
You could choose "big-endian" byte order.
You could choose "little-endian" byte order.
You could choose to make this problem much more difficult than anyone
might possibly imagine.
The point I made, which seems to have been completely lost, is that a reliable octet
stream transport protocol (like TCP) is not the place that you specify
multibyte character encoding standards. No one is (should be?) surprised
that the RS-232 standard is silent on this issue.
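
As a sketch of where that decision would live instead, the application can turn its wide characters into octets itself before handing them to the stream. The example below assumes 16-bit character values sent most significant octet first; the names encode_u16_be and decode_u16_be, and the choice of big-endian, are illustrative assumptions rather than anything TCP or this thread specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialize n 16-bit character values into dst,
 * two octets each, most significant octet first.
 * dst must have room for 2 * n octets. Returns the octet count. */
static size_t encode_u16_be(const uint16_t *src, size_t n, uint8_t *dst)
{
    for (size_t i = 0; i < n; i++) {
        dst[2 * i]     = (uint8_t)(src[i] >> 8);
        dst[2 * i + 1] = (uint8_t)(src[i] & 0xFF);
    }
    return 2 * n;
}

/* Hypothetical helper: rebuild 16-bit character values from nbytes
 * octets in the same order. A trailing odd octet is ignored.
 * Returns the number of characters decoded. */
static size_t decode_u16_be(const uint8_t *src, size_t nbytes, uint16_t *dst)
{
    size_t n = nbytes / 2;

    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)(((uint16_t)src[2 * i] << 8) | src[2 * i + 1]);
    return n;
}
```

The transport only ever sees the octets in dst; which order they are in is entirely a matter for the two applications to agree on.
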
> > > Re: the FS example: a better example is to perhaps ask if a UNIX
> > > FS has provisions for storing "wide characters" (or preferably,
> > > 16-bit wchar_t values from ISO10646 aka Unicode) in *directory
> > > entries* (the current answer is "no, namei is too stupid").
> >
> > Why is this a better example? It's not like we're trying to name
> > transport endpoints with any sort of character strings; the issue
> > is "awareness" of the underlying {transport,storage} mechanism.
> >
> > There's really no point in reimplementing a transport protocol given
> > the literally thousands of man-hours of work by a lot of clever
> > people over more than a decade to make TCP work well.
>
> The question is "what is the network representation of the byte values";
> see the other part of this thread...
My comment was in response to the original poster's remark that
if TCP wasn't going to do this, then was it a better idea to implement
a scheme using UDP or directly over IP.
If I had to choose, I'd use UTF-8, whose octet order is fixed by the
encoding itself (high-order bits first). This is, I believe, what the
IETF has chosen when dealing with multi-byte characters that are
embedded within other protocols.
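
For illustration, a minimal UTF-8 encoder sketch in C. It assumes the four-octet limit (code points up to U+10FFFF, surrogates rejected) that later revisions of the UTF-8 specification settled on; the function name utf8_encode is made up for this example. Note that the octet sequence comes out of shifts and masks on the code point, so it is identical on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out.
 * Returns the number of octets written (1 to 4), or 0 if the
 * code point is a surrogate or lies above U+10FFFF. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx + three 10xxxxxx */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The lead octet always carries the high-order bits of the code point, and continuation octets are recognizable by their 10xxxxxx prefix, which also lets a receiver find the next character boundary in the stream.
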
