From owner-freebsd-arch Thu Mar 1 13: 0:10 2001
From: Terry Lambert
Message-Id: <200103012059.NAA05439@usr05.primenet.com>
Subject: Re: Unicode, command line options, and configuration files, oh my!
To: areilly@bigpond.net.au (Andrew Reilly)
Date: Thu, 1 Mar 2001 20:59:43 +0000 (GMT)
Cc: keichii@peorth.iteration.net (Michael C. Wu),
    tlambert@primenet.com (Terry Lambert),
    jonathan@graehl.org (Jonathan Graehl),
    freebsd-arch@FreeBSD.ORG (freebsd-Arch), i18n@FreeBSD.ORG
In-Reply-To: <20010301174513.A65013@gurney.reilly.home> from "Andrew Reilly" at Mar 01, 2001 05:45:13 PM

> > | In general, this means that for Unicode data stored for
> > | directory entries would require that a directory entry
> > | block would have to be 512b, whereas for UTF-8, we are
> > | talking 2048b (2k).
>
> It would still have to be larger than 512b using a 16-bit
> encoding, wouldn't it?

Yes; 1024b; sorry about that, it was an error.
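As a rough check of those figures, here is a minimal sketch of the
arithmetic. It is not taken from the UFS code. The 256-character name
limit, the 8-byte fixed entry header, the power-of-two rounding, and the
helper names next_pow2 and dir_block_size are all assumptions made for
illustration, as is the worst case of 4 bytes per character for UTF-8.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (assumed block-size policy)."""
    p = 1
    while p < n:
        p <<= 1
    return p


def dir_block_size(max_chars: int, bytes_per_char: int, header_bytes: int = 8) -> int:
    """Smallest power-of-two block that holds one worst-case directory entry.

    header_bytes is an assumed fixed per-entry overhead (inode number,
    record length, and so on); it is the reason a 512-byte name cannot
    fit in a 512-byte block.
    """
    return next_pow2(header_bytes + max_chars * bytes_per_char)


MAX_NAME_CHARS = 256  # assumed name limit, in characters

if __name__ == "__main__":
    # 16-bit encoding: 8 + 256 * 2 = 520 bytes, so the block must be 1024
    print("16-bit:", dir_block_size(MAX_NAME_CHARS, 2))  # 16-bit: 1024
    # UTF-8 at an assumed 4 bytes per character: 8 + 1024 = 1032, so 2048
    print("UTF-8: ", dir_block_size(MAX_NAME_CHARS, 4))  # UTF-8:  2048
```

Under these assumptions the name alone fills 512 bytes in a 16-bit
encoding, so any per-entry overhead at all pushes the block to 1024b.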
The point was supposed to be that, if you go look at the directory
entry code, it would be a lot easier to implement 1k instead of 2k
(we did this before when we ported the FreeBSD VFS to Windows 95
and supported both the 256-character Unicode and the 8.3 namespaces
simultaneously).

> > | If the same approach is used as the current UFS code uses,
> > | then these operations will need to be directory entry block
> > | atomic.
> >
> > In short, we can save the file name that the user sees
> > with the file data.  The filesystem and the kernel see
> > some other naming scheme determined by the FS/kernel.
>
> How do you propose to do that and still maintain Unix inode/link
> semantics?  There isn't (necessarily) only one file name that
> the user sees, but there _is_ only one lump of file data.

How do hard links work at all today, under the same conditions?
The directory entry is just a reference to the inode; this is
not like ISO or VFAT, where the directory entry _is_ the inode.

> > | On top of that, we have Microsoft and Java interoperability to
> > | consider, distasteful as that may be to some.
> >
> > M$ has a pretty good implementation here.
> > Java I18N sucks really bad.
>
> Could you give a quick description of why one of these is good
> and the other bad, for the benefit of someone who knows
> neither?

My take on this, which may not be the same as his, is that the
Microsoft implementation uses the processing representation as
the storage representation, whereas Java uses UTF-8 for the
storage representation.  Java also deals in strings composed of
"bytes" instead of strings composed of "characters", which makes
string processing problematic, if the string is an I18N string;
consider that it has no functions similar to XPG/4 mbtowc() or
other interning/externing functions that it would use to deal
with them.
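To illustrate the difference between a storage representation counted
in bytes and a processing representation counted in characters, here is
a minimal sketch. It is in Python only for brevity and is not Java or
Microsoft code. The helper names to_processing, to_storage, and
truncate_name are hypothetical; the first plays roughly the role that
mbtowc() plays per character in C.

```python
def to_processing(stored: bytes) -> str:
    """Intern: UTF-8 storage bytes -> string of characters."""
    return stored.decode("utf-8")


def to_storage(text: str) -> bytes:
    """Extern: string of characters -> UTF-8 storage bytes."""
    return text.encode("utf-8")


def truncate_name(stored: bytes, max_bytes: int) -> bytes:
    """Shorten a stored name to at most max_bytes without splitting a character.

    The work is done on characters and converted back, because cutting
    the byte string directly can leave half of a multibyte character.
    """
    text = to_processing(stored)
    while len(to_storage(text)) > max_bytes:
        text = text[:-1]  # drop one whole character at a time
    return to_storage(text)


if __name__ == "__main__":
    # Three CJK characters (U+65E5 U+672C U+8A9E) followed by ".txt"
    name = "\u65e5\u672c\u8a9e.txt"
    print(len(name))                      # 7 characters
    print(len(to_storage(name)))          # 13 bytes as UTF-8
    print(len(name.encode("utf-16-le")))  # 14 bytes as UTF-16
    # A byte-level cut at 4 bytes would end inside the second character;
    # the character-level cut keeps only the first one (3 bytes).
    print(len(truncate_name(to_storage(name), 4)))  # 3
```

Code that treats the stored bytes as the string has to redo this kind of
conversion at every operation that cares about character boundaries.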
It's kind of like the problem with Java letting you instance
objects without a default constructor being required to make
them valid; the JavaMail API is rife with examples of this
type of thing.  You can see it pretty easily, when you try to
write those same interfaces in C++, since C++ doesn't permit
that sort of thing to happen (instancing without initialization
is not possible in C++; there is *always* a default constructor).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.