From owner-freebsd-i18n  Wed Feb 28 21:59:27 2001
Delivered-To: freebsd-i18n@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id B7BC537B718; Wed, 28 Feb 2001 21:59:19 -0800 (PST)
	(envelope-from keichii@peorth.iteration.net)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id 445625955B; Wed, 28 Feb 2001 23:59:25 -0600 (CST)
Date: Wed, 28 Feb 2001 23:59:25 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: Jonathan Graehl <jonathan@graehl.org>
Cc: freebsd-Arch <freebsd-arch@FreeBSD.ORG>, i18n@freebsd.org
Subject: Re: Unicode, command line options, and configuration files, oh my!
Message-ID: <20010228235925.B4359@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
Mail-Followup-To: "Michael C . Wu" <keichii@iteration.net>,
	Jonathan Graehl <jonathan@graehl.org>,
	freebsd-Arch <freebsd-arch@FreeBSD.ORG>, i18n@freebsd.org
References: <NCBBLOALCKKINBNNEDDLAELNDLAA.jonathan@graehl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <NCBBLOALCKKINBNNEDDLAELNDLAA.jonathan@graehl.org>; from jonathan@graehl.org on Wed, Feb 28, 2001 at 01:48:49PM -0800
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-i18n@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

People, there is a freebsd-i18n@freebsd.org list for a reason.

On Wed, Feb 28, 2001 at 01:48:49PM -0800, Jonathan Graehl scribbled:
| How much change would be needed to have a Unicode-capable FreeBSD system?

A lot. And a lot more.

| Supposing the variable-length encoding is used, all existing text output,
| filenames, and string-based kernel interfaces should be compliant (although not

No, they are not that easy.

| capable of understanding multiple-byte-char input/output); would command line
| options be passed as byte-strings by a Unicode-capable shell?

No.

| There doesn't seem to be any impetus to systematically adopt Unicode (especially
| the fixed-two-bytes-per-char variant, which for most cases would simply double
| the storage/bandwidth requirement), although there are user-applications which

Not that easy :) Trust me.

| operate on multibyte text.  I am sure that by now admins and programmers in
| country XYZ are used to working with ASCII and pseudo-English (no matter how
| inconvenient it might be to generate from their keyboards).

It is the "assuming" part that got us into this I18N dilemma.

| 

[snip XML]

I really do not think XML is the way to go; it is too much crud.
The K.I.S.S. principle should prevail here, especially in kernelland.

| Parsing of command line options (and positional parameters) is also largely
| ad-hoc.  Looking through /usr/src, I see that for the most case, it consists of
| a getopt loop with hand-coded cases, a hand-written usage string, and a
| hand-written man-page-usage.  Much like the XML DTD, it would make sense to
| generically specify (to the extent possible, and with user-defined code to the
| extent not) the syntax and semantics, and generate variable definitions,
| parsing/checking code, usage(), man page synopsis ...  While it would be

Do you realize that this means a rewrite of the 300MB of
src/ that we have now?

| possible to have an expressive grammar for command line options, typically
| the -opts are order-independent, and there are only a few positional parameters
| (or else you put the mess into a configuration file).  There are a variety of
| packages out there, which I am seeking opinions on, not having tried any of
| them:
| 

[snip *freshmeat* stuff]
I have looked at those; they are not suitable, and they are GPL'd.

| any others?
| 
| ifconfig seemed to have one of the more enlightened-looking option parsers (an
| array of parameter information processed in a loop, rather than a bunch of

Because it needs to parse many, many things.
But why do you need a so-called "smart" parser when you only have one
or two options to parse?

| hard-coded cases) out of several FreeBSD programs I examined ... are there any
| other good examples?

ipfilter.

| It's also amusing to see how many different ways various servers in the tree can
| open a configuration file (path read from command line), write a pid file (path
| read from command line), daemonize, read an IP address/hostname and port (read
| from command line) and listen there, mask nonfatal signals, relinquish

It happens in a large code base.  However, rewriting all of that
would take many, many man-hours.  I really do not think we are up to that.

| privileges - although I appreciate that different servers want to do things
| slightly differently.  Naturally, each of us is easily able to reuse our own
| code (preferably by libraries/macros/#include rather than copy/paste), but I
| think that there is a lot of common configuration/command-line code that could
| be coalesced behind a good-enough-extensible interface that we could reuse code

Glad to hear that people care about I18N.
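As an illustration of the kind of shared helper meant above, a sketch of one of those chores, writing a pid file. `common_write_pidfile()` is a hypothetical name; a real version would probably also lock the file so that a second instance cannot start.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical shared helper: write the current process ID to `path`.
 * Returns 0 on success, -1 on failure.
 */
int
common_write_pidfile(const char *path)
{
	FILE *fp;

	if ((fp = fopen(path, "w")) == NULL)
		return (-1);
	if (fprintf(fp, "%ld\n", (long)getpid()) < 0) {
		(void)fclose(fp);
		return (-1);
	}
	/* fclose() flushes the buffer, so a write error can surface here. */
	return (fclose(fp) == 0 ? 0 : -1);
}
```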

-- 
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-i18n" in the body of the message


From owner-freebsd-i18n  Wed Feb 28 22:02:08 2001
Delivered-To: freebsd-i18n@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id AE32C37B719; Wed, 28 Feb 2001 22:02:01 -0800 (PST)
	(envelope-from keichii@peorth.iteration.net)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id 96CE85955B; Thu,  1 Mar 2001 00:02:07 -0600 (CST)
Date: Thu, 1 Mar 2001 00:02:07 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: Terry Lambert <tlambert@primenet.com>
Cc: Jonathan Graehl <jonathan@graehl.org>,
	freebsd-Arch <freebsd-arch@FreeBSD.ORG>, i18n@freebsd.org
Subject: Re: Unicode, command line options, and configuration files, oh my!
Message-ID: <20010301000207.C4359@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
Mail-Followup-To: "Michael C . Wu" <keichii@iteration.net>,
	Terry Lambert <tlambert@primenet.com>,
	Jonathan Graehl <jonathan@graehl.org>,
	freebsd-Arch <freebsd-arch@FreeBSD.ORG>, i18n@freebsd.org
References: <NCBBLOALCKKINBNNEDDLAELNDLAA.jonathan@graehl.org> <200103010541.WAA17385@usr05.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200103010541.WAA17385@usr05.primenet.com>; from tlambert@primenet.com on Thu, Mar 01, 2001 at 05:41:22AM +0000
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-i18n@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Use -i18n please. :)

On Thu, Mar 01, 2001 at 05:41:22AM +0000, Terry Lambert scribbled:
| [ ... Unicode ... ]
| 
| UTF encoded data is not fixed length in size.
| 
| POSIX specifies that file names can be up to 256 characters.
| 
| 256 characters, UTF-8 encoded, can take from 256 to 1280
| bytes.
|
| In general, this means that for Unicode data stored for
| directory entries would require that a directory entry
| block would have to be 512b, whereas for UTF-8, we are
| talking 2048b (2k).
| 
| If the same approach is used as the current UFS code uses,
| then these operations will need to be directory entry block
| atomic.

In short, we can save the file name that the user sees 
with the file data.  The filesystem and the kernel see
some other naming scheme determined by the FS/kernel.

| FS stuff aside, most programs should use an internal encoding.
| 
| For FS storage, fixed data records are also a problem, when
| using UTF-8 encoding.  The same goes for the ability to
| store fixed size input forms field data in databases, which
| like constraints set on record sizes.
| 
| 
| > There doesn't seem to be any impetus to systematically adopt
| > Unicode (especially the fixed-two-bytes-per-char variant,
| > which for most cases would simply double the storage/bandwidth
| > requirement), although there are user-applications which
| > operate on multibyte text.
| 
| UTF-8 is one byte per character for US ASCII, two bytes for
| the high page (128 characters) of ISO 8859-1 and the rest of
| U+0080 to U+07FF (Greek, Cyrillic, Hebrew, Arabic, ...), and
| three or more bytes for anything else.

Bad design, period.

| The idea that a 16-bit encoding increases storage requirements
| is U.S.-centric; every other character set is penalized by
| UTF-8 at least as much as it would be if directly encoded
| instead of multibyte encoded, and the vast majority are
| penalized more.

Yup, bad design. :)
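To make the byte counts concrete, a small C sketch (function names are illustrative) of how many bytes UTF-8 needs per code point, using the RFC 3629 ranges (code points up to U+10FFFF) and skipping surrogate checks for brevity. Two CJK ideographs take 6 bytes in UTF-8 versus 4 in a fixed 16-bit encoding, while two ASCII letters take 2 versus 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes UTF-8 needs for one code point; 0 if outside Unicode. */
static size_t
utf8_seq_len(uint32_t cp)
{
	if (cp < 0x80)
		return (1);	/* US ASCII */
	if (cp < 0x800)
		return (2);	/* Latin-1 high page, Greek, Cyrillic, Hebrew, Arabic, ... */
	if (cp < 0x10000)
		return (3);	/* rest of the BMP, including CJK */
	if (cp <= 0x10FFFF)
		return (4);	/* supplementary planes */
	return (0);
}

/* Total UTF-8 size of an array of code points. */
static size_t
utf8_size(const uint32_t *cps, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		total += utf8_seq_len(cps[i]);
	return (total);
}
```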

| On top of that, we have Microsoft and Java interoperability to
| consider, distasteful as that may be to some.

M$ has a pretty good implementation here.
Java I18N sucks really bad.

| There's an interesting list of Unicode resources available at:
| http://www.unicode.org/unicode/onlinedat/products.html

-- 
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+



From owner-freebsd-i18n  Wed Feb 28 22:45:20 2001
Delivered-To: freebsd-i18n@freebsd.org
Received: from areilly.bpc-users.org (CPE-144-132-234-126.nsw.bigpond.net.au [144.132.234.126])
	by hub.freebsd.org (Postfix) with SMTP id 8D1CA37B719
	for <i18n@FreeBSD.ORG>; Wed, 28 Feb 2001 22:45:14 -0800 (PST)
	(envelope-from areilly@bigpond.net.au)
Received: (qmail 65096 invoked by uid 1000); 1 Mar 2001 06:45:13 -0000
From: "Andrew Reilly" <areilly@bigpond.net.au>
Date: Thu, 1 Mar 2001 17:45:13 +1100
To: "Michael C . Wu" <keichii@peorth.iteration.net>
Cc: Terry Lambert <tlambert@primenet.com>,
	Jonathan Graehl <jonathan@graehl.org>,
	freebsd-Arch <freebsd-arch@FreeBSD.ORG>, i18n@FreeBSD.ORG
Subject: Re: Unicode, command line options, and configuration files, oh my!
Message-ID: <20010301174513.A65013@gurney.reilly.home>
References: <NCBBLOALCKKINBNNEDDLAELNDLAA.jonathan@graehl.org> <200103010541.WAA17385@usr05.primenet.com> <20010301000207.C4359@peorth.iteration.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010301000207.C4359@peorth.iteration.net>; from keichii@iteration.net on Thu, Mar 01, 2001 at 12:02:07AM -0600
Sender: owner-freebsd-i18n@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, Mar 01, 2001 at 12:02:07AM -0600, Michael C . Wu wrote:
> Terry wrote:
> | In general, this means that for Unicode data stored for
> | directory entries would require that a directory entry
> | block would have to be 512b, whereas for UTF-8, we are
> | talking 2048b (2k).

It would still have to be larger than 512b using a 16-bit
encoding, wouldn't it?

> | If the same approach is used as the current UFS code uses,
> | then these operations will need to be directory entry block
> | atomic.
> 
> In short, we can save the file name that the user sees 
> with the file data.  The filesystem and the kernel see
> some other naming scheme determined by the FS/kernel.

How do you propose to do that and still maintain Unix inode/link
semantics?  There isn't (necessarily) only one file name that
the user sees, but there _is_ only one lump of file data.

> | On top of that, we have Microsoft and Java interoperability to
> | consider, distasteful as that may be to some.
> 
> M$ has a pretty good implementation here.
> Java I18N sucks really bad.

Could you give a quick description of why one of these is good
and the other bad, for the benefit of someone who knows
neither?

-- 
Andrew



From owner-freebsd-i18n  Thu Mar  1  7:50:46 2001
Delivered-To: freebsd-i18n@freebsd.org
Received: from peorth.iteration.net (peorth.iteration.net [208.190.180.178])
	by hub.freebsd.org (Postfix) with ESMTP
	id AEFAA37B718; Thu,  1 Mar 2001 07:50:42 -0800 (PST)
	(envelope-from keichii@peorth.iteration.net)
Received: by peorth.iteration.net (Postfix, from userid 1001)
	id 59BA95955D; Thu,  1 Mar 2001 09:50:49 -0600 (CST)
Date: Thu, 1 Mar 2001 09:50:49 -0600
From: "Michael C . Wu" <keichii@iteration.net>
To: Andrew Reilly <areilly@bigpond.net.au>
Cc: Terry Lambert <tlambert@primenet.com>,
	Jonathan Graehl <jonathan@graehl.org>, asmodai@FreeBSD.ORG,
	i18n@FreeBSD.ORG
Subject: Re: Unicode, command line options, and configuration files, oh my!
Message-ID: <20010301095049.A10822@peorth.iteration.net>
Reply-To: "Michael C . Wu" <keichii@peorth.iteration.net>
References: <NCBBLOALCKKINBNNEDDLAELNDLAA.jonathan@graehl.org> <200103010541.WAA17385@usr05.primenet.com> <20010301000207.C4359@peorth.iteration.net> <20010301174513.A65013@gurney.reilly.home>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010301174513.A65013@gurney.reilly.home>; from areilly@bigpond.net.au on Thu, Mar 01, 2001 at 05:45:13PM +1100
X-PGP-Fingerprint: 5025 F691 F943 8128 48A8  5025 77CE 29C5 8FA1 2E20
X-PGP-Key-ID: 0x8FA12E20
Sender: owner-freebsd-i18n@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Thu, Mar 01, 2001 at 05:45:13PM +1100, Andrew Reilly scribbled:
| On Thu, Mar 01, 2001 at 12:02:07AM -0600, Michael C . Wu wrote:
| > Terry wrote:
| > | In general, this means that for Unicode data stored for
| > | directory entries would require that a directory entry
| > | block would have to be 512b, whereas for UTF-8, we are
| > | talking 2048b (2k).
| 
| It would still have to be larger than 512b using a 16-bit
| encoding, wouldn't it?

Yes, and if we are making it larger than 512b, why do we need
to set a limit on ourselves?

| > | If the same approach is used as the current UFS code uses,
| > | then these operations will need to be directory entry block
| > | atomic.
| > 
| > In short, we can save the file name that the user sees 
| > with the file data.  The filesystem and the kernel see
| > some other naming scheme determined by the FS/kernel.
| 
| How do you propose to do that and still maintain Unix inode/link
| semantics?  There isn't (necessarily) only one file name that
| the user sees, but there _is_ only one lump of file data.

Do you see why nobody has been able to solve all this stuff easily?
I think having a journaling filesystem could solve this.

| > | On top of that, we have Microsoft and Java interoperability to
| > | consider, distasteful as that may be to some.
| > 
| > M$ has a pretty good implementation here.
| > Java I18N sucks really bad.
| 
| Could you give a quick description of why one of these is good
| and the other bad, for the benefit of someone who knows
| neither?

NTFS gives up the ability to switch charsets on the hard drives.
(It is a pretty good assumption, since most users stay within
two languages.)  And most of the userland tools, even the simple ones,
work with other languages without modifications, when compiled
by Visual Studio.

Java uses a weird scheme to negotiate the contents, where
the server and the client both have to agree on the charset.
Then you have to wrap strings in special functions. Then you
have to specifically tell java that the input is "international" input.
bla bla bla....Generally bad design and a big hassle.
(Have you ever seen a Chinese/Japanese/Korean java-enabled website
 that _works_? I have seen very very few.)

-- 
+------------------------------------------------------------------+
| keichii@peorth.iteration.net         | keichii@bsdconspiracy.net |
| http://peorth.iteration.net/~keichii | Yes, BSD is a conspiracy. |
+------------------------------------------------------------------+



From owner-freebsd-i18n  Thu Mar  1 13:00:08 2001
Delivered-To: freebsd-i18n@freebsd.org
Received: from smtp10.phx.gblx.net (smtp10.phx.gblx.net [206.165.6.140])
	by hub.freebsd.org (Postfix) with ESMTP
	id 9554437B719; Thu,  1 Mar 2001 12:59:58 -0800 (PST)
	(envelope-from tlambert@usr05.primenet.com)
Received: (from daemon@localhost)
	by smtp10.phx.gblx.net (8.9.3/8.9.3) id NAA76472;
	Thu, 1 Mar 2001 13:59:35 -0700
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp10.phx.gblx.net, id smtpdem4Fqa; Thu Mar  1 13:59:24 2001
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id NAA05439;
	Thu, 1 Mar 2001 13:59:43 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200103012059.NAA05439@usr05.primenet.com>
Subject: Re: Unicode, command line options, and configuration files, oh my!
To: areilly@bigpond.net.au (Andrew Reilly)
Date: Thu, 1 Mar 2001 20:59:43 +0000 (GMT)
Cc: keichii@peorth.iteration.net (Michael C . Wu),
	tlambert@primenet.com (Terry Lambert),
	jonathan@graehl.org (Jonathan Graehl),
	freebsd-arch@FreeBSD.ORG (freebsd-Arch), i18n@FreeBSD.ORG
In-Reply-To: <20010301174513.A65013@gurney.reilly.home> from "Andrew Reilly" at Mar 01, 2001 05:45:13 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-i18n@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> > | In general, this means that for Unicode data stored for
> > | directory entries would require that a directory entry
> > | block would have to be 512b, whereas for UTF-8, we are
> > | talking 2048b (2k).
> 
> It would still have to be larger than 512b using a 16-bit
> encoding, wouldn't it?

Yes; 1024b; sorry about that, it was an error.  The point was
supposed to be that, if you go look at the directory entry code,
it would be a lot easier to implement 1k instead of 2k (we did
this before when we ported the FreeBSD VFS to Windows 95 and
supported both the 256-character Unicode and the 8.3 namespaces
simultaneously).
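The corrected figure can be checked with a little arithmetic. This sketch assumes an 8-byte fixed header per entry, as in the UFS `struct direct` (`d_ino`, `d_reclen`, `d_type`, `d_namlen`), and power-of-two directory blocks of at least 512 bytes; `dirblk_size()` is a hypothetical helper. The header is what pushes a 256-character, 16-bit name (512 bytes) past 512 and into a 1024-byte block.

```c
#include <stddef.h>

/*
 * Smallest power-of-two directory block (>= 512 bytes) that holds one
 * entry with an 8-byte header and a name of max_name_chars characters
 * at bytes_per_char bytes each.
 */
static size_t
dirblk_size(size_t max_name_chars, size_t bytes_per_char)
{
	size_t need = 8 + max_name_chars * bytes_per_char;
	size_t blk = 512;

	while (blk < need)
		blk *= 2;
	return (blk);
}
```

With one byte per character the entry fits in 512 bytes; at two bytes it needs 1024; at a five-byte UTF-8 worst case (1280 bytes of name) it needs 2048.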


> > | If the same approach is used as the current UFS code uses,
> > | then these operations will need to be directory entry block
> > | atomic.
> > 
> > In short, we can save the file name that the user sees 
> > with the file data.  The filesystem and the kernel see
> > some other naming scheme determined by the FS/kernel.
> 
> How do you propose to do that and still maintain Unix inode/link
> semantics?  There isn't (necessarily) only one file name that
> the user sees, but there _is_ only one lump of file data.

How do hard links work at all today, under the same conditions?

The directory entry is just a reference to the inode; this is
not like ISO or VFAT, where the directory entry _is_ the inode.
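A small POSIX sketch of that point: after link(2), two directory entries name one inode, and st_nlink counts the names. `demo_link()` and `same_inode()` are illustrative helpers, not existing library calls.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if both paths are directory entries for the same inode, 0 if not, -1 on error. */
static int
same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return (-1);
	return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
}

/* Create file `a`, hard-link it as `b`, and return the inode's link count. */
static int
demo_link(const char *a, const char *b)
{
	struct stat st;
	int fd;

	(void)unlink(a);		/* start from a clean slate */
	(void)unlink(b);
	if ((fd = open(a, O_CREAT | O_WRONLY, 0644)) == -1)
		return (-1);
	(void)close(fd);
	if (link(a, b) == -1 || stat(a, &st) == -1)
		return (-1);
	return ((int)st.st_nlink);	/* one inode, two names */
}
```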


> > | On top of that, we have Microsoft and Java interoperability to
> > | consider, distasteful as that may be to some.
> > 
> > M$ has a pretty good implementation here.
> > Java I18N sucks really bad.
> 
> Could you give a quick description of why one of these is good
> and the other bad, for the benefit of someone who knows
> neither?

My take on this, which may not be the same as his, is that the
Microsoft implementation uses the processing representation as
the storage representation, whereas Java uses UTF-8 for the
storage representation.

Java also deals in strings composed of "bytes" instead of strings
composed of "characters", which makes string processing problematic,
if the string is an I18N string; consider that it has no functions
similar to XPG/4 mbtowc() or other interning/externing functions
that it would use to deal with them.
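For comparison, a minimal C sketch of the XPG/ISO C style of interning: counting characters rather than bytes with mbrtowc(3), the restartable form of mbtowc(). The result depends on the LC_CTYPE locale the program selects (normally with setlocale(LC_CTYPE, "")); `mb_char_count()` is an illustrative name.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Count characters (not bytes) in a NUL-terminated multibyte string
 * under the current LC_CTYPE locale.  Returns (size_t)-1 on an invalid
 * or truncated sequence.
 */
static size_t
mb_char_count(const char *s)
{
	mbstate_t st;
	size_t n = 0, len = strlen(s), r;

	memset(&st, 0, sizeof(st));
	while (len > 0) {
		r = mbrtowc(NULL, s, len, &st);
		if (r == (size_t)-1 || r == (size_t)-2)
			return ((size_t)-1);	/* invalid or incomplete */
		if (r == 0)			/* NUL; cannot occur before len runs out */
			break;
		s += r;
		len -= r;
		n++;
	}
	return (n);
}
```

With a UTF-8 locale selected, "caf\xc3\xa9" is 5 bytes but 4 characters.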

It's kind of like the problem with Java letting you instance
objects without a default constructor being required to make
them valid; the JavaMail API is rife with examples of this type
of thing.  You can see it pretty easily, when you try to write
those same interfaces in C++, since C++ doesn't permit that
sort of thing to happen (instancing without initialization is
not possible in C++; there is *always* a default constructor).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.



From owner-freebsd-i18n  Thu Mar  1 13:15:16 2001
Delivered-To: freebsd-i18n@freebsd.org
Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134])
	by hub.freebsd.org (Postfix) with ESMTP
	id 2208737B71A; Thu,  1 Mar 2001 13:15:12 -0800 (PST)
	(envelope-from tlambert@usr05.primenet.com)
Received: (from daemon@localhost)
	by smtp04.primenet.com (8.9.3/8.9.3) id OAA05290;
	Thu, 1 Mar 2001 14:09:26 -0700 (MST)
Received: from usr05.primenet.com(206.165.6.205)
 via SMTP by smtp04.primenet.com, id smtpdAAAQ9aOik; Thu Mar  1 14:09:10 2001
Received: (from tlambert@localhost)
	by usr05.primenet.com (8.8.5/8.8.5) id OAA06019;
	Thu, 1 Mar 2001 14:14:46 -0700 (MST)
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200103012114.OAA06019@usr05.primenet.com>
Subject: Re: Unicode, command line options, and configuration files, oh my!
To: keichii@peorth.iteration.net
Date: Thu, 1 Mar 2001 21:14:46 +0000 (GMT)
Cc: areilly@bigpond.net.au (Andrew Reilly),
	tlambert@primenet.com (Terry Lambert),
	jonathan@graehl.org (Jonathan Graehl), asmodai@FreeBSD.ORG,
	i18n@FreeBSD.ORG
In-Reply-To: <20010301095049.A10822@peorth.iteration.net> from "Michael C . Wu" at Mar 01, 2001 09:50:49 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-i18n@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

> | > | In general, this means that for Unicode data stored for
> | > | directory entries would require that a directory entry
> | > | block would have to be 512b, whereas for UTF-8, we are
> | > | talking 2048b (2k).
> | 
> | It would still have to be larger than 512b using a 16-bit
> | encoding, wouldn't it?
> 
> Yes, and if we are making it larger than 512b, why do we need
> to set a limit on ourselves?

Directory entry block I/O is not handled through the normal
VFS code.  This is because the directory entry blocks need to
be modified atomically, and FS blocks can span page boundaries;
for a sufficiently large FS block size, frags can exceed the
page size.  On some architectures, the page size is not 4k.
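The page-spanning concern reduces to a one-line check. This sketch (hypothetical helper names) asks the system for its page size via sysconf(3) instead of assuming 4096; the 4096 fallback is an assumption used only if sysconf fails.

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Does a block of `len` bytes at byte offset `off` cross a page boundary? */
static bool
block_spans_page(uint64_t off, uint64_t len, uint64_t pagesz)
{
	if (len == 0)
		return (false);
	return (off / pagesz != (off + len - 1) / pagesz);
}

/* Page size of the running system (4096 fallback is an assumption). */
static uint64_t
system_page_size(void)
{
	long p = sysconf(_SC_PAGESIZE);

	return (p > 0 ? (uint64_t)p : 4096);
}
```

For example, a 2048-byte block at offset 3072 crosses a 4096-byte page boundary, while an 8192-byte block at offset 8192 fits in one 8192-byte page but spans two 4096-byte pages.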

You need to look at the UFS directory manipulation code in the
/sys/ufs/ufs directory so that you can understand the problem;
while you are at it, look at the fsck and newfs and other FS
utility code which has to deal with directory entry blocks.

It is not pretty.  It would be nearly impossible to do directory
I/O in FS blocks and keep it atomic.  There is already the risk
of a 1024b directory entry spanning a track boundary, because we
do not read mode page 2 from SCSI devices and use it to prohibit
FS objects from spanning tracks.


> | How do you propose to do that and still maintain Unix inode/link
> | semantics?  There isn't (necessarily) only one file name that
> | the user sees, but there _is_ only one lump of file data.
> 
> Do you see why nobody has been able to solve all this stuff easily?

Wrong; Matt Day, Mark Muhelestein, and myself solved exactly
this problem in exactly the FreeBSD VFS architecture and exactly
the FreeBSD FFS and UFS code back in 1997.

> I think having a journaling filesystem could solve this.

So can UFS/FFS.  Journalling has nothing to do with the underlying
problem here, which is conversion from a fixed length storage to
a variable length storage, where the underlying media has fixed
length blocks into which you have to map things.

Consider a CDROM FS for music and video, running in a file set
up as a device.  The blocks of such an FS could not be aligned
within a page, since they are odd sized.  How do you mmap() an
object in such an FS?
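One way to handle that for read-only access is to round the file offset down to a page boundary, map from there, and hand back a pointer into the mapping, since mmap(2) requires a page-aligned offset. This is a sketch under assumptions: `struct blkmap` and `map_block()` are hypothetical, fcntl.h is included only for the caller's open(2), and 2352 bytes (the raw audio CD sector size) is used only as an example of a block size that does not divide the page size. It sidesteps rather than solves the problem; a mapped block can still straddle two pages.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct blkmap {
	void		*base;		/* what mmap returned; pass to munmap */
	size_t		 maplen;	/* length to pass to munmap */
	unsigned char	*data;		/* first byte of the requested block */
};

/* Map block `blkno` of `fd`, whose blocks are `blksz` bytes each. */
static int
map_block(int fd, off_t blkno, size_t blksz, struct blkmap *m)
{
	long pg = sysconf(_SC_PAGESIZE);
	off_t start, aligned;
	size_t delta;

	if (pg <= 0)
		return (-1);
	start = blkno * (off_t)blksz;
	aligned = start - start % pg;		/* round down to a page boundary */
	delta = (size_t)(start - aligned);	/* block's offset within the mapping */
	m->maplen = delta + blksz;
	m->base = mmap(NULL, m->maplen, PROT_READ, MAP_SHARED, fd, aligned);
	if (m->base == MAP_FAILED)
		return (-1);
	m->data = (unsigned char *)m->base + delta;
	return (0);
}
```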


> NTFS gives up the ability to switch charsets on the hard drives.
> (It is a pretty good assumption, since most users stay within
> two languages.)  And most of the userland tools, even the simple ones,
> work with other languages without modifications, when compiled
> by Visual Studio.

The OLE character types are 16-bit.  Some of these interfaces are
not available in all WIN32.DLL implementations.

> Java uses a weird scheme to negotiate the contents, where
> the server and the client both have to agree on the charset.
> Then you have to wrap strings in special functions. Then you
> have to specifically tell java that the input is "international" input.
> bla bla bla....Generally bad design and a big hassle.
> (Have you ever seen a Chinese/Japanese/Korean java-enabled website
>  that _works_? I have seen very very few.)

That's because it considers any I/O to be externalization; that's
a stupid assumption.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
