FreeBSD Mail Archives

Date:      Sun, 2 Mar 2014 12:19:23 +0100
From:      Edward Tomasz Napierała <trasz@freebsd.org>
To:        ghostmansd@gmail.com
Cc:        Jordan Hubbard <jkh@turbofuzz.com>, Łukasz Wójcik <lukasz.wojcik@zoho.com>, John-Mark Gurney <jmg@funkthat.com>, hackers@freebsd.org, Fernando Apesteguía <fernando.apesteguia@gmail.com>
Subject:   Re: GSoC proposal: Quirinus C library (qc)
Message-ID:  <486F2F86-940C-44E9-A606-C63C3B607CB1@freebsd.org>
In-Reply-To: <CAMqzjeuLyRpGF3Dh%2BHKjNWN8M2oh-GTMUy9uw=0Y0-2cri=iyg@mail.gmail.com>
References:  <CAMqzjevahZowxWv0gH=Z8jjQdzGsEaA5U_VB-zsLCcwtoWkvxA@mail.gmail.com> <20140227182641.GE47921@funkthat.com> <5A166BC2-D34A-473C-BEFA-9E04760A0AAB@FreeBSD.org> <CAMqzjeuLyRpGF3Dh%2BHKjNWN8M2oh-GTMUy9uw=0Y0-2cri=iyg@mail.gmail.com>

On 2 Mar 2014, at 11:10, Dmitry Selyutin wrote:
> Hi Edward,
>
> There is no such thing as different UTF-8 encodings. If you mean the
> representation of accents and diacritics, there are normalization forms,
> which apply to UCS code points rather than to UTF-8 byte sequences. If
> you mean that the same UCS-4 code point can be represented as different
> byte sequences, only the shortest form is permitted.
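
The shortest-form rule can be checked with any strict UTF-8 decoder. The sketch below is only an illustration: `is_valid_utf8` is a hypothetical helper name, and it relies on Python's built-in `utf-8` codec, which rejects overlong sequences.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# U+002F ('/') in its only permitted, one-byte form.
SHORTEST_SLASH = b"\x2f"
# The same code point padded out to two and three bytes (overlong forms).
OVERLONG_SLASH_2 = b"\xc0\xaf"
OVERLONG_SLASH_3 = b"\xe0\x80\xaf"

print(is_valid_utf8(SHORTEST_SLASH))    # True
print(is_valid_utf8(OVERLONG_SLASH_2))  # False
print(is_valid_utf8(OVERLONG_SLASH_3))  # False
```

Rejecting the overlong forms matters for path handling in particular, since a lenient decoder could otherwise let a disguised '/' slip past a check on the raw bytes.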

Right, normalization forms, that's what they're called.  Still, there are
four of them, and I seem to remember that OS X uses a different one from
the open source world; that's how I learned about them in the first place:
by moving files from a Mac to FreeBSD and then trying to figure out why
shell autocompletion didn't work for them.
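
A small sketch of why completion can fail in that situation, assuming the Mac stored the name in a decomposed form while the name typed at the FreeBSD shell is precomposed. The file name is made up for illustration; `unicodedata.normalize` is from the Python standard library.

```python
import unicodedata

# The same visible file name in two normalization forms.
composed = unicodedata.normalize("NFC", "W\u00f3jcik.txt")   # 'ó' as one code point
decomposed = unicodedata.normalize("NFD", composed)          # 'o' + combining acute

print(composed == decomposed)        # False: the strings differ
print(composed.encode("utf-8"))      # b'W\xc3\xb3jcik.txt'
print(decomposed.encode("utf-8"))    # b'Wo\xcc\x81jcik.txt'


def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same form (NFC here)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_name(composed, decomposed))  # True
```

A shell that compares raw bytes sees "W\xc3\xb3" typed at the prompt and "Wo\xcc\x81" on disk, so the prefix match fails from the second byte onward even though both render identically.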

> Honestly, I think that UTF-8 is the only encoding that has a right to
> live. Other encodings seem to be dying or dead already.

True that.

> Best regards,
> Dmitry Selyutin
>
> On 02.03.2014 at 13:54, "Edward Tomasz Napierała" <trasz@freebsd.org> wrote:
> On 27 Feb 2014, at 19:26, John-Mark Gurney wrote:
> > Dmitry Selyutin wrote this message on Thu, Feb 27, 2014 at 19:39 +0400:
> >> As for strings, I will not use UTF-16, since it creates more problems
> >> than it solves. If I provide a function which accepts a char* or char
> >> const* argument, I imply that the function uses only ASCII (maybe I
> >> will change ASCII to UTF-8). An encoding is used only if the user has
> >> requested it explicitly; the only place where I have made an exception
> >> is the system path, since a path is required to be in UTF-16 on
> >> Windows. That is the reason why qc_path requires qc_codecs-related
> >> functions.
> >
> > You do realize that FreeBSD does not currently enforce any encoding on
> > path names, correct?  So requiring an encoding (UTF-16) on FreeBSD
> > will mean some paths may not be accessible, since I assume you convert
> > the UTF-16 string to UTF-8 before opening on FreeBSD...
> >
> > Hmm... maybe it's time for a sysctl you can set on your system that
> > only allows you to create names that are valid UTF-8, so people can
> > slowly migrate to UTF-8?  And a tool to report/convert old non-UTF-8
> > paths?
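
The reporting half of such a tool could look roughly like the sketch below. This is not an existing FreeBSD utility: `find_non_utf8_names` and `demo` are hypothetical names, and the demo assumes a file system that accepts arbitrary bytes in names, as UFS does.

```python
import os
import tempfile
from typing import List


def find_non_utf8_names(root: bytes) -> List[bytes]:
    """Return every path under root whose final component is not valid UTF-8."""
    offenders = []
    # Passing bytes to os.walk makes it yield raw, undecoded names.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                offenders.append(os.path.join(dirpath, name))
    return offenders


def demo() -> List[bytes]:
    """Create one UTF-8 and one ISO-8859-1 name, then report the offenders."""
    root = os.fsencode(tempfile.mkdtemp())
    utf8_name = "caf\u00e9.txt".encode("utf-8")
    latin1_name = "caf\u00e9.txt".encode("iso-8859-1")
    for name in (utf8_name, latin1_name):
        with open(os.path.join(root, name), "wb"):
            pass
    return [os.path.basename(path) for path in find_non_utf8_names(root)]


if __name__ == "__main__":
    print(demo())
```

Working on bytes throughout is deliberate: decoding the names first would either fail or hide exactly the entries the report is meant to find.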
>
> There's already a ZFS property ("utf8only") exactly for this purpose.
>
> Actually, it's funnier than that: because the kernel doesn't know anything
> about UTF-8, one can create several files with the same name, but with
> different UTF-8 encodings.  And there is a ZFS property to fix this problem
> as well ("normalization").
>
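
To find names that would collide once a normalization form is enforced, a directory listing can be grouped by normalized name. A minimal sketch, assuming the names have already been decoded to strings; `normalization_collisions` and the sample listing are hypothetical.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def normalization_collisions(names: Iterable[str],
                             form: str = "NFC") -> Dict[str, List[str]]:
    """Group names by normalized form; keep only groups with several members."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize(form, name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Two entries that render identically but differ byte for byte, plus one
# unrelated entry.
listing = ["W\u00f3jcik.txt", "Wo\u0301jcik.txt", "README"]

for canonical, members in normalization_collisions(listing).items():
    print(canonical, [member.encode("utf-8") for member in members])
```

On ZFS, the "normalization" property accepts the values none, formC, formD, formKC and formKD, so the `form` argument would be chosen to match the intended setting.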

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?486F2F86-940C-44E9-A606-C63C3B607CB1>
