From owner-freebsd-hackers Tue Jan 20 03:03:54 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id DAA25724
        for hackers-outgoing; Tue, 20 Jan 1998 03:03:54 -0800 (PST)
        (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from itesec.hsc.fr (root@itesec.hsc.fr [192.70.106.33])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id DAA25703
        for ; Tue, 20 Jan 1998 03:03:46 -0800 (PST)
        (envelope-from pb@hsc.fr)
Received: from mars.hsc.fr (pb@mars.hsc.fr [192.70.106.44])
        by itesec.hsc.fr (8.8.8/8.8.5/itesec-1.10-nospam) with ESMTP id MAA16727;
        Tue, 20 Jan 1998 12:02:18 +0100 (MET)
Received: (from pb@localhost)
        by mars.hsc.fr (8.8.5/8.8.5/pb-19970301) id MAA10049;
        Tue, 20 Jan 1998 12:02:17 +0100 (MET)
Message-ID: <19980120120216.OB37901@mars.hsc.fr>
Date: Tue, 20 Jan 1998 12:02:16 +0100
From: Pierre.Beyssac@hsc.fr (Pierre Beyssac)
To: louie@TransSys.COM (Louis A. Mamakos)
Cc: tlambert@primenet.com (Terry Lambert), daniel_sobral@voga.com.br,
        hackers@FreeBSD.ORG
Subject: Re: Wide characters on tcp connections
References: <199801191937.MAA05333@usr08.primenet.com>
        <199801200313.WAA20726@whizzo.TransSys.COM>
X-Mailer: Mutt 0.59.1e
Mime-Version: 1.0
In-Reply-To: <199801200313.WAA20726@whizzo.TransSys.COM>;
        from Louis A. Mamakos on Jan 19, 1998 22:13:02 -0500
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk

According to Louis A. Mamakos:
> > > course TCP, by itself, provides all support you need to send the
> > > characters, but ignoring the practical problems would be akin to keeping to
> > > IP (vs TCP or UDP) because that's all you _really_ need...
> >
> > The issue is one of stream synchronization.  This is my main problem
> > with UTF over non-error-checked links.  If you have an implicit value
> > boundary, then you are guaranteed a synchronized stream.
>
> Not applicable.  TCP *is* an error checked link.  Absent application
> implementation errors, you shouldn't get unsynchronized.

I can add that, if I've understood UTF-8 right, it's fairly easy to
resynchronize if you happen to lose sync: it only costs one or two
lost or garbled chars.

I think that UTF-8 is one of the ways to go. Its only drawback is that
it's not compatible with "pure" 8-bit ISO-Latin-1 streams, as it reuses
0x80-0xff.

-- 
Pierre.Beyssac@hsc.fr
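
To make the resynchronization point concrete, here is a minimal sketch in
Python. It relies on one property of UTF-8: continuation bytes always have
the bit pattern 10xxxxxx, while ASCII and lead bytes never do. A reader that
lands in the middle of a character can therefore skip forward to the next
byte that is not a continuation byte. The names is_continuation and resync
are made up for this illustration, and the "damaged" stream is a simulated
example, not data from a real connection. The last lines show the Latin-1
incompatibility: a lone byte in 0x80-0xff is a full character in ISO-Latin-1
but not a valid UTF-8 sequence.

```python
def is_continuation(byte: int) -> bool:
    """True if byte has the UTF-8 continuation pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def resync(data: bytes, pos: int) -> int:
    """Return the first index >= pos that is not a continuation byte.

    Hypothetical helper: decoding can restart from the returned index
    (assuming the stream is not damaged again further on).
    """
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# Simulate joining the stream mid-character: drop everything up to and
# including the lead byte (0xC3) of the two-byte sequence for "ï".
stream = "naïve café".encode("utf-8")
cut = stream.index(0xC3) + 1
damaged = stream[cut:]          # starts with the orphaned byte 0xAF

start = resync(damaged, 0)      # skip the orphaned continuation byte
recovered = damaged[start:].decode("utf-8")
print(recovered)

# Latin-1 incompatibility: 0xE9 alone is "é" in ISO-Latin-1, but in UTF-8
# it is a lead byte that needs continuation bytes after it.
latin1_byte = b"\xe9"
print(latin1_byte.decode("latin-1"))
try:
    latin1_byte.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)
```

In this example only the one character whose lead byte was lost is dropped;
everything after it decodes normally.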