Date: Wed, 21 Jan 1998 10:33:54 +0100
From: Pierre.Beyssac@hsc.fr (Pierre Beyssac)
To: tlambert@primenet.com (Terry Lambert)
Cc: Pierre.Beyssac@hsc.fr (Pierre Beyssac), louie@TransSys.COM, daniel_sobral@voga.com.br, hackers@FreeBSD.ORG
Subject: Re: Wide characters on tcp connections
Message-ID: <19980121103354.EB02816@mars.hsc.fr>
In-Reply-To: <199801202118.OAA27310@usr06.primenet.com>; from Terry Lambert on Jan 20, 1998 21:18:36 +0000
References: <19980120120216.OB37901@mars.hsc.fr> <199801202118.OAA27310@usr06.primenet.com>

According to Terry Lambert:

[ UTF-8 ]
> It will take up to 3 bytes to resync, since it can take up to 5
> bytes to represent a single 16 bit value.

I assume you mean 32 bit? I think (I don't have the draft handy) that it's a little more complicated than that because, if I remember correctly, there are "collisions" between prefix codes and multibyte encodings. But that's the idea.

> This assumes you are willing to push an arbitrary number of bytes
> to get a 16 bit value to the other end of the pipe, and that you are
> willing to take the computational overhead of the conversion,

Yes, but you have to take a computational overhead anyway, even with fixed-width characters, if you are to convert to network byte order.

> and
> that you are willing to treat your values as a stream instead of
> an external data representation of a structure (ie: you are willing
> to give up being able to tell the other end to expect a certain number
> of bytes in a transaction).

In the case of a telnet connection or a mainly ASCII transfer, this makes sense: I certainly don't feel ready to take a fourfold performance loss due to wider characters :-)

When putting this in a database system, you obviously don't _have_ to use UTF-8 internally; that's purely an implementation issue.

Now I agree that using UTF-8 in RPCs can be difficult, but after all, isn't the RPC layer supposed to hide exactly these kinds of things from the application programmer?

> The people who like UTF encoding are the people who've already had
> their mail forwarded to Hell,

I'm quite sure you mean X400 :-).

Don't worry about me: I'm not a UTF-8 specialist, not a UTF-8 user, and even less a UTF-8 advocate (not to mention that I hate X400). I was just pointing out that it would be silly to reinvent the wheel if the result is something similar to UTF-8.

-- 
Pierre.Beyssac@hsc.fr
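
For readers following the resync discussion above, here is a minimal sketch in C of how a receiver might find the next character boundary after landing at an arbitrary offset in a UTF-8 byte stream. It relies only on the fact that every continuation byte has the bit pattern 10xxxxxx, so no lead byte or ASCII byte can be mistaken for one. The names `utf8_is_continuation` and `utf8_resync` are hypothetical helpers invented for this illustration, not part of any library mentioned in the thread. Note that a 16-bit value needs at most 3 bytes in UTF-8, so at most 2 continuation bytes are skipped in that case; the UTF-8 definition current at the time allowed sequences of up to 6 bytes for 31-bit values.

```c
#include <assert.h>
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper (an assumption for this sketch): starting at an
 * arbitrary offset `pos` in `buf`, skip continuation bytes and return
 * the offset of the next byte that can begin a character, or `len` if
 * the buffer ends first. It does not validate the sequence that follows.
 */
static size_t utf8_resync(const unsigned char *buf, size_t len, size_t pos)
{
    assert(pos <= len);
    while (pos < len && utf8_is_continuation(buf[pos]))
        pos++;
    return pos;
}
```

With the bytes 'a', 0xE2, 0x82, 0xAC, 'b' (the letter a, U+20AC, the letter b), a reader dropped at offset 2 or 3 resumes at offset 4, while a reader at offset 0 or 1 is already on a boundary and does not move.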