Date: Thu, 1 Mar 2001 21:14:46 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: keichii@peorth.iteration.net
Cc: areilly@bigpond.net.au (Andrew Reilly), tlambert@primenet.com (Terry Lambert), jonathan@graehl.org (Jonathan Graehl), asmodai@FreeBSD.ORG, i18n@FreeBSD.ORG
Subject: Re: Unicode, command line options, and configuration files, oh my!
Message-ID: <200103012114.OAA06019@usr05.primenet.com>
In-Reply-To: <20010301095049.A10822@peorth.iteration.net> from "Michael C . Wu" at Mar 01, 2001 09:50:49 AM

> | > | In general, this means that for Unicode data stored for
> | > | directory entries would require that a directory entry
> | > | block would have to be 512b, whereas for UTF-8, we are
> | > | talking 2048b (2k).
> |
> | It would still have to be larger than 512b using a 16-bit
> | encoding, wouldn't it?
>
> Yes, and if we are making it larger than 512b, why do we need
> to set a limit on ourselves?

Directory entry block I/O is not handled through the normal VFS
code.  This is because the directory entry blocks need to be
modified atomically, and FS blocks can span page boundaries; for
a sufficiently large FS block size, frags can exceed the page
size.  For some architectures, the page size is not 4k.

You need to look at the UFS directory manipulation code in the
/sys/ufs/ufs directory so that you can understand the problem;
while you are at it, look at the fsck and newfs and other FS
utility code which has to deal with directory entry blocks.  It
is not pretty.

It would be nearly impossible to do directory I/O in FS blocks,
and keep it atomic.  There is already the risk of a 1024b
directory entry spanning a track boundary, because we do not
read mode page 2 from SCSI, and prohibit track spanning by FS
objects.

> | How do you propose to do that and still maintain Unix inode/link
> | semantics?  There isn't (necessarily) only one file name that
> | the user sees, but there _is_ only one lump of file data.
>
> Do you see why nobody has been able to solve all this stuff easily?

Wrong; Matt Day, Mark Muhelestein, and I solved exactly this
problem in exactly the FreeBSD VFS architecture and exactly the
FreeBSD FFS and UFS code back in 1997.

> I think having a journaling filesystem could solve this.

So can UFS/FFS.  Journalling has nothing to do with the underlying
problem here, which is conversion from a fixed length storage
to a variable length storage, where the underlying media has
fixed length blocks into which you have to map things.

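The block-size figures quoted at the top of this message can be checked with a rough sketch of the arithmetic.  Everything numeric in it is an assumption rather than something stated in this thread: a 255-character name limit, an 8-byte fixed per-entry header, a worst case of 2 bytes per character for a 16-bit encoding and 6 bytes per character for UTF-8 (the original definition allowed sequences that long), and rounding up to a power-of-two block.  The function names are made up for illustration.

```python
# Hypothetical constants; none of these numbers come from the message itself.
MAX_NAME_CHARS = 255   # assumed limit on name length, in characters
ENTRY_HEADER = 8       # assumed fixed per-entry header, in bytes


def worst_case_entry(bytes_per_char):
    """Largest possible entry: header plus a maximum-length name."""
    return ENTRY_HEADER + MAX_NAME_CHARS * bytes_per_char


def block_size_for(entry_bytes, minimum=512):
    """Smallest power-of-two block (at least `minimum`) that holds one entry."""
    size = minimum
    while size < entry_bytes:
        size *= 2
    return size


if __name__ == "__main__":
    cases = [("8-bit", 1), ("16-bit", 2), ("UTF-8, 6-byte worst case", 6)]
    for label, width in cases:
        entry = worst_case_entry(width)
        print(label, entry, block_size_for(entry))
```

Under these assumptions a maximum-length 16-bit name no longer fits in 512 bytes, and the UTF-8 worst case needs a 2048-byte block, which agrees with the sizes in the quoted text.  The sketch says nothing about atomicity; it only shows why a fixed-size block has to grow once the per-character width does.
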
Consider a CDROM FS for music and video, running in a file set
up as a device.  The blocks of such an FS could not be aligned
within a page, since they are odd sized.  How do you mmap() an
object in such an FS?

> NTFS gives up the ability to switch charsets in the harddrives.
> (It is a pretty good assumption, since most users stay within
> two languages.)  And most of the userland tools, even the simple ones,
> work with other languages without modifications, when compiled
> by Visual Studio.

The OLE character types are 16 bit.  Some of these interfaces are
not available in all WIN32.DLL implementations.

> Java uses a weird scheme to negotiate the contents, where
> the server and the client both have to agree in the charset.
> Then you have to wrap strings in special functions.  Then you
> have to specifically tell java that the input is "international" input.
> bla bla bla....Generally bad design and a big hassle.
> (Have you ever seen a Chinese/Japanese/Korean java-enabled website
> that _works_?  I have seen very very few.)

That's because it considers any I/O to be externalization; that's
a stupid assumption.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-i18n" in the body of the message