Date:      Thu, 1 Mar 2001 09:50:49 -0600
From:      "Michael C . Wu" <keichii@iteration.net>
To:        Andrew Reilly <areilly@bigpond.net.au>
Cc:        Terry Lambert <tlambert@primenet.com>, Jonathan Graehl <jonathan@graehl.org>, asmodai@FreeBSD.ORG, i18n@FreeBSD.ORG
Subject:   Re: Unicode, command line options, and configuration files, oh my!
Message-ID:  <20010301095049.A10822@peorth.iteration.net>
In-Reply-To: <20010301174513.A65013@gurney.reilly.home>; from areilly@bigpond.net.au on Thu, Mar 01, 2001 at 05:45:13PM +1100
References:  <NCBBLOALCKKINBNNEDDLAELNDLAA.jonathan@graehl.org> <200103010541.WAA17385@usr05.primenet.com> <20010301000207.C4359@peorth.iteration.net> <20010301174513.A65013@gurney.reilly.home>

On Thu, Mar 01, 2001 at 05:45:13PM +1100, Andrew Reilly scribbled:
| On Thu, Mar 01, 2001 at 12:02:07AM -0600, Michael C . Wu wrote:
| > Terry wrote:
| > | In general, this means that Unicode data stored in
| > | directory entries would require a directory entry
| > | block of 512b, whereas for UTF-8, we are
| > | talking 2048b (2k).
| 
| It would still have to be larger than 512b using a 16-bit
| encoding, wouldn't it?

Yes, and if we are making it larger than 512b, why do we need
to set a limit on ourselves?
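
To put rough numbers on the sizes being discussed, here is a minimal
Python sketch.  It assumes the UFS name limit of 255 is counted in
characters rather than bytes, and that each directory entry carries an
8-byte fixed header; both figures are assumptions for illustration, and
the function names are hypothetical.

```python
# Assumption: the name limit is 255 *characters*, as if UFS's limit
# were applied to characters instead of bytes.
NAME_MAX_CHARS = 255
# Assumption: 8 bytes of fixed fields (inode number, record length,
# type, name length) in front of each name.
ENTRY_HEADER_BYTES = 8


def encoded_sizes(name):
    """Return the byte length of one name under each encoding."""
    return {
        "utf-16": len(name.encode("utf-16-le")),  # no byte-order mark
        "utf-8": len(name.encode("utf-8")),
    }


def worst_case_entry(chars=NAME_MAX_CHARS):
    """Worst-case bytes for a single entry holding `chars` characters."""
    return {
        # 2 bytes per character, Basic Multilingual Plane only
        "ucs-2": ENTRY_HEADER_BYTES + chars * 2,
        # UTF-8 needs at most 3 bytes for a BMP character
        "utf-8, BMP only": ENTRY_HEADER_BYTES + chars * 3,
        # and at most 4 bytes for any code point as UTF-8 is defined today
        "utf-8, any code point": ENTRY_HEADER_BYTES + chars * 4,
    }


if __name__ == "__main__":
    print(encoded_sizes("日本語"))
    print(worst_case_entry())
```

Under these assumptions even the 16-bit case comes out above 512 bytes
for a single maximum-length entry, which is the point Andrew raises.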

| > | If the same approach is used as the current UFS code uses,
| > | then these operations will need to be directory entry block
| > | atomic.
| > 
| > In short, we can save the file name that the user sees
| > with the file data.  The filesystem and the kernel see
| > some other naming scheme determined by the FS/kernel.
| 
| How do you propose to do that and still maintain Unix inode/link
| semantics?  There isn't (necessarily) only one file name that
| the user sees, but there _is_ only one lump of file data.

Do you see why nobody has been able to solve all this stuff easily?
I think having a journaling filesystem could solve this.
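
Andrew's objection is easy to reproduce on any POSIX system.  The
sketch below is only an illustration (the file names and the function
name are made up): it creates one file, gives it a second name with a
hard link, and shows that both names resolve to the same inode.  Any
scheme that stores "the" user-visible name alongside the file data has
to decide what to do when there are two of them.

```python
import os
import tempfile


def link_demo():
    """Create one file with two names; return (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "report.txt")    # hypothetical name
        second = os.path.join(d, "rapport.txt")  # hypothetical second name
        with open(first, "w") as f:
            f.write("one lump of file data\n")
        # A hard link adds a second directory entry for the same inode.
        os.link(first, second)
        a, b = os.stat(first), os.stat(second)
        return a.st_ino == b.st_ino, a.st_nlink


if __name__ == "__main__":
    same_inode, links = link_demo()
    print("same inode:", same_inode, "link count:", links)
```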

| > | On top of that, we have Microsoft and Java interoperability to
| > | consider, distasteful as that may be to some.
| > 
| > M$ has a pretty good implementation here.
| > Java I18N sucks really bad.
| 
| Could you give a quick description of why one of these is good
| and the other bad, for the benefit of someone who knows
| neither?

NTFS gives up the ability to switch charsets on the hard drive.
(That is a pretty good assumption, since most users stay within
two languages.)  Most of the userland tools, even the simple ones,
work with other languages without modification when compiled
with Visual Studio.

Java uses a weird scheme to negotiate the contents, where
the server and the client both have to agree on the charset.
Then you have to wrap strings in special functions, and then you
have to specifically tell Java that the input is "international" input,
and so on.  Generally bad design and a big hassle.
(Have you ever seen a Chinese/Japanese/Korean Java-enabled website
 that _works_?  I have seen very few.)
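
The cost of the two sides not agreeing on a charset can be shown in a
few lines.  This is a generic Python sketch, not Java's API; the
function name and the choice of Shift_JIS and EUC-JP are assumptions
made purely for illustration.

```python
def round_trip(text, sender_charset, receiver_charset):
    """Encode on the sending side, decode on the receiving side."""
    wire = text.encode(sender_charset)
    # errors="replace" substitutes U+FFFD instead of raising, so the
    # damage is visible in the returned string.
    return wire.decode(receiver_charset, errors="replace")


if __name__ == "__main__":
    original = "日本語"
    print(round_trip(original, "shift_jis", "shift_jis") == original)
    print(round_trip(original, "shift_jis", "euc_jp") == original)
```

When both ends use the same charset the text survives; when they
differ, the receiver gets something other than what was sent.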

-- 
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+




