Message-ID: <19980121103354.EB02816@mars.hsc.fr>
Date: Wed, 21 Jan 1998 10:33:54 +0100
From: Pierre.Beyssac@hsc.fr (Pierre Beyssac)
To: tlambert@primenet.com (Terry Lambert)
Cc: Pierre.Beyssac@hsc.fr (Pierre Beyssac), louie@TransSys.COM, daniel_sobral@voga.com.br, hackers@FreeBSD.ORG
Subject: Re: Wide characters on tcp connections
References: <19980120120216.OB37901@mars.hsc.fr> <199801202118.OAA27310@usr06.primenet.com>
In-Reply-To: <199801202118.OAA27310@usr06.primenet.com>; from Terry Lambert on Jan 20, 1998 21:18:36 +0000

According to Terry Lambert:

[ UTF-8 ]

> It will take up to 3 bytes to resync, since it can take up to 5
> bytes to represent a single 16 bit value.

I assume you mean 32 bit?

I think (I don't have the draft handy) it's a little more complicated
than that, because, if I remember correctly, there are "collisions"
between prefix codes and multibyte encodings. But that's the idea.
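For reference, here is a minimal sketch in C of the byte counts involved, assuming the original UTF-8 definition (RFC 2044), which allows up to 6 bytes for a 31-bit UCS-4 value. It shows that a 16 bit value never needs more than 3 bytes, and how a decoder can resync by skipping continuation bytes of the form 10xxxxxx. The names utf8_encode and utf8_resync are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a 31-bit UCS-4 value as UTF-8, using the original scheme of
 * up to 6 bytes (as in RFC 2044).  out[] must hold at least 6 bytes.
 * Returns the number of bytes written, or 0 if value > 0x7FFFFFFF. */
size_t utf8_encode(uint32_t value, unsigned char out[6])
{
    size_t len;
    unsigned char lead;

    if (value < 0x80) {             /* 7 bits: plain ASCII, 1 byte */
        out[0] = (unsigned char)value;
        return 1;
    } else if (value < 0x800) {     /* 11 bits */
        len = 2; lead = 0xC0;
    } else if (value < 0x10000) {   /* 16 bits: at most 3 bytes */
        len = 3; lead = 0xE0;
    } else if (value < 0x200000) {  /* 21 bits */
        len = 4; lead = 0xF0;
    } else if (value < 0x4000000) { /* 26 bits */
        len = 5; lead = 0xF8;
    } else if (value < 0x80000000) { /* 31 bits: 6 bytes */
        len = 6; lead = 0xFC;
    } else {
        return 0;
    }

    /* Fill the continuation bytes (10xxxxxx) from the end, 6 bits each. */
    for (size_t i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (value & 0x3F));
        value >>= 6;
    }
    /* The remaining high bits go into the lead byte after its length prefix. */
    out[0] = (unsigned char)(lead | value);
    return len;
}

/* Resync after landing in the middle of a character: starting at pos,
 * skip continuation bytes (10xxxxxx) until reaching a byte that can
 * start a character, or the end of the buffer. */
size_t utf8_resync(const unsigned char *buf, size_t len, size_t pos)
{
    assert(buf != NULL || len == 0);
    while (pos < len && (buf[pos] & 0xC0) == 0x80)
        pos++;
    return pos;
}
```

For example, utf8_encode(0xFFFF, buf) produces the 3 bytes EF BF BF, while 0x7FFFFFFF takes 6 bytes; in the 6-byte case a resync skips at most 5 continuation bytes.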
> This assumes you are willing to push an arbitrary number of bytes
> to get a 16 bit value to the other end of the pipe, and that you are
> willing to take the computational overhead of the conversion,

Yes, but you have to accept some computational overhead anyway, even
with fixed-width characters, if you are converting to network byte
order.

> and
> that you are willing to treat your values as a stream instead of
> an external data representation of a structure (ie: you are willing
> to give up being able to tell the other end to expect a certain number
> of bytes in a transaction).

For a telnet connection or a mostly-ASCII transfer, this makes sense: I
certainly don't feel ready to take a fourfold performance loss from
sending mostly-ASCII text as 32-bit wide characters :-)

When putting this in a database system, you obviously don't _have_ to
use UTF-8 internally; that's purely an implementation issue.

Now, I agree that using UTF-8 in RPCs can be difficult, but after all,
isn't the RPC layer supposed to hide exactly this kind of thing from
the application programmer?

> The people who like UTF encoding are the people who've already had
> their mail forwarded to Hell,

I'm quite sure you mean X400 :-).

Don't worry about me: I'm not a UTF-8 specialist, not a UTF-8 user,
and even less a UTF-8 advocate (not to mention that I hate X400). I
was just pointing out that it would be silly to reinvent the wheel if
the result is going to be something similar to UTF-8 anyway.

-- 
Pierre.Beyssac@hsc.fr