From: Terry Lambert
Message-Id: <200103012114.OAA06019@usr05.primenet.com>
Subject: Re: Unicode, command line options, and configuration files, oh my!
To: keichii@peorth.iteration.net
Date: Thu, 1 Mar 2001 21:14:46 +0000 (GMT)
Cc: areilly@bigpond.net.au (Andrew Reilly), tlambert@primenet.com (Terry Lambert), jonathan@graehl.org (Jonathan Graehl), asmodai@FreeBSD.ORG, i18n@FreeBSD.ORG
In-Reply-To: <20010301095049.A10822@peorth.iteration.net> from "Michael C . Wu" at Mar 01, 2001 09:50:49 AM

> | > | In general, this means that for Unicode data stored for
> | > | directory entries would require that a directory entry
> | > | block would have to be 512b, whereas for UTF-8, we are
> | > | talking 2048b (2k).
> |
> | It would still have to be larger than 512b using a 16-bit
> | encoding, wouldn't it?
>
> Yes, and if we are making it larger than 512b, why do we need
> to set a limit on ourselves?

Directory entry block I/O is not handled through the normal VFS code.
This is because the directory entry blocks need to be modified
atomically, and FS blocks can span page boundaries; for a sufficiently
large FS block size, frags can exceed the page size. On some
architectures, the page size is not 4k.

You need to look at the UFS directory manipulation code in the
/sys/ufs/ufs directory so that you can understand the problem; while
you are at it, look at the fsck and newfs and other FS utility code
which has to deal with directory entry blocks. It is not pretty.

It would be nearly impossible to do directory I/O in FS blocks and
keep it atomic. There is already the risk of a 1024b directory entry
spanning a track boundary, because we neither read mode page 2 from
SCSI nor prohibit track spanning by FS objects.

> | How do you propose to do that and still maintain Unix inode/link
> | semantics? There isn't (necessarily) only one file name that
> | the user sees, but there _is_ only one lump of file data.
>
> Do you see why nobody has been able to solve all this stuff easily?

Wrong; Matt Day, Mark Muhelestein, and I solved exactly this problem
in exactly the FreeBSD VFS architecture and exactly the FreeBSD FFS
and UFS code back in 1997.

> I think having a journaling filesystem could solve this.

So can UFS/FFS. Journalling has nothing to do with the underlying
problem here, which is conversion from fixed length storage to
variable length storage, where the underlying media has fixed length
blocks into which you have to map things.

Consider a CDROM FS for music and video, running in a file set up as
a device. The blocks of such an FS could not be aligned within a
page, since they are odd sized. How do you mmap() an object in such
an FS?

> NTFS gives up the ability to switch charsets in the harddrives.
> (It is a pretty good assumption, since most users stay within
> two languages.) And most of the userland tools, even the simple ones,
> work with other languages without modifications, when compiled
> by Visual Studio.
The OLE character types are 16 bit. Some of these interfaces are not
available in all WIN32.DLL implementations.

> Java uses a weird scheme to negotiate the contents, where
> the server and the client both have to agree in the charset.
> Then you have to wrap strings in special functions. Then you
> have to specifically tell java that the input is "international" input.
> bla bla bla....Generally bad design and a big hassle.
> (Have you ever seen a Chinese/Japanese/Korean java-enabled website
> that _works_? I have seen very very few.)

That's because it considers any I/O to be externalization; that's
a stupid assumption.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
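
For the directory entry block sizes quoted at the top of this message
(512b versus 2048b), the arithmetic can be sketched roughly as below.
This is only an illustration under assumptions that are not stated in
the message: a 255-character name limit, an 8-byte fixed header per
entry, and a block that must be a power of two large enough to hold
one worst-case entry. The names MAX_NAME_CHARS, ENTRY_HEADER,
next_pow2 and min_dir_block are made up for the example.

```python
# Assumed values, chosen for illustration only (not taken from the message).
MAX_NAME_CHARS = 255   # assumed name-length limit, counted in characters
ENTRY_HEADER = 8       # assumed fixed header bytes in front of each name


def next_pow2(n):
    """Return the smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def min_dir_block(bytes_per_char):
    """Smallest power-of-two block holding one worst-case entry."""
    return next_pow2(ENTRY_HEADER + MAX_NAME_CHARS * bytes_per_char)


# Measure real encoded widths for one CJK character (U+4E2D).
utf16_width = len("\u4e2d".encode("utf-16-be"))
utf8_width = len("\u4e2d".encode("utf-8"))

for label, width in [
    ("8-bit charset", 1),
    ("16-bit encoding", utf16_width),
    ("UTF-8, BMP character", utf8_width),
    ("UTF-8, 6-byte worst case (RFC 2279)", 6),
]:
    print(label, min_dir_block(width))
```

Under these assumptions the 16-bit case needs more than 512 bytes,
which agrees with the question quoted above, and the UTF-8 worst case
reaches 2048 bytes. With the 4-byte limit of the later RFC 3629
definition the UTF-8 worst case still rounds up to 2048.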