FreeBSD Mail Archives

Date:      Sun, 2 Mar 2014 12:19:23 +0100
From:      Edward Tomasz Napierała <trasz@freebsd.org>
To:        ghostmansd@gmail.com
Cc:        Jordan Hubbard <jkh@turbofuzz.com>, Łukasz Wójcik <lukasz.wojcik@zoho.com>, John-Mark Gurney <jmg@funkthat.com>, hackers@freebsd.org, Fernando Apesteguía <fernando.apesteguia@gmail.com>
Subject:   Re: GSoC proposal: Quirinus C library (qc)
Message-ID:  <486F2F86-940C-44E9-A606-C63C3B607CB1@freebsd.org>
In-Reply-To: <CAMqzjeuLyRpGF3Dh%2BHKjNWN8M2oh-GTMUy9uw=0Y0-2cri=iyg@mail.gmail.com>
References:  <CAMqzjevahZowxWv0gH=Z8jjQdzGsEaA5U_VB-zsLCcwtoWkvxA@mail.gmail.com> <20140227182641.GE47921@funkthat.com> <5A166BC2-D34A-473C-BEFA-9E04760A0AAB@FreeBSD.org> <CAMqzjeuLyRpGF3Dh%2BHKjNWN8M2oh-GTMUy9uw=0Y0-2cri=iyg@mail.gmail.com>


Message written by Dmitry Selyutin on 2 Mar 2014, at 11:10:
> Hi Edward,
> 
> there is no such thing as different UTF-8 encodings. If you mean how accents and diacritics are represented, there are normalization forms, which apply to UCS code points rather than to UTF-8 byte sequences. If you mean that the same UCS-4 code point could be represented by different byte sequences, only the shortest form is permitted.
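
To illustrate the shortest-form rule mentioned above, here is a minimal sketch using Python's built-in strict UTF-8 decoder. The helper name is_valid_utf8 is made up for this example; the byte pair C0 AF is the classic two-byte overlong form of "/" (U+002F), which a conforming decoder has to reject.

```python
# Sketch: only the shortest UTF-8 form of a code point is well-formed.
# is_valid_utf8 is a hypothetical helper name used for illustration.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

shortest = "/".encode("utf-8")    # one byte, 0x2F
overlong = bytes([0xC0, 0xAF])    # two-byte overlong form of U+002F

print(shortest, is_valid_utf8(shortest))
print(overlong, is_valid_utf8(overlong))
```

So two different byte sequences for one code point cannot both be valid UTF-8; what can differ is the sequence of code points chosen to represent the same visible text, which is where normalization forms come in.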

Right, normalization forms, that's what they're called.  Still, there are four
of them (NFC, NFD, NFKC, NFKD), and I seem to remember OS X uses a different one
from the open source world; that's how I learned about them in the first place:
by moving files from a Mac to FreeBSD and then trying to figure out why shell
autocompletion didn't work for them.
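
The autocompletion symptom can be reproduced with a short sketch in Python, using the standard unicodedata module. The file name is invented for the example, and the sketch assumes the file was stored in decomposed form (NFD-like, as on the Mac side) while the terminal on the other system produces precomposed (NFC) input.

```python
import unicodedata

# Hypothetical file name; \u00f3 is the precomposed letter o-acute.
name = "W\u00f3jcik.txt"

nfc = unicodedata.normalize("NFC", name)   # o-acute as one code point
nfd = unicodedata.normalize("NFD", name)   # 'o' followed by U+0301 combining acute

stored = nfd.encode("utf-8")               # bytes as stored on disk (assumed NFD)
typed = unicodedata.normalize("NFC", "W\u00f3").encode("utf-8")  # prefix typed in the shell

# A byte-wise prefix match, which is all a normalization-unaware tool can do, fails.
print(stored.startswith(typed))

# Normalizing both sides to the same form before comparing makes it match.
def same_form_prefix(stored_bytes: bytes, typed_bytes: bytes, form: str = "NFC") -> bool:
    s = unicodedata.normalize(form, stored_bytes.decode("utf-8"))
    t = unicodedata.normalize(form, typed_bytes.decode("utf-8"))
    return s.startswith(t)

print(same_form_prefix(stored, typed))
```

Both names render identically, yet they differ byte for byte, so a kernel that treats path names as opaque byte strings sees two unrelated files.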

> Honestly I think that UTF-8 is the only encoding that has right to live. Other encodings seem to die or to be dead already.

True that.

> Best regards,
> Dmitry Selyutin
> 
> On 02.03.2014 at 13:54, "Edward Tomasz Napierała" <trasz@freebsd.org> wrote:
> Message written by John-Mark Gurney on 27 Feb 2014, at 19:26:
> > Dmitry Selyutin wrote this message on Thu, Feb 27, 2014 at 19:39 +0400:
> >> As for strings, I will not use UTF-16, since it creates more problems
> >> than it solves. If I provide a function which accepts a char* or char
> >> const* argument, I imply that such a function uses only ASCII (maybe I will
> >> change ASCII to UTF-8). An encoding is used only if the user has requested it
> >> explicitly; the only place where I have made an exception is the system path,
> >> since paths must be in UTF-16 on Windows. That is the reason why qc_path
> >> requires qc_codecs-related functions.
> >
> > You do realize that FreeBSD does not currently enforce any encoding on
> > path names, correct?  So, requiring an encoding on FreeBSD (UTF-16)
> > will mean some paths may not be accessible, since I assume you convert
> > the UTF-16 string to UTF-8 before opening on FreeBSD...
> >
> > Hmm.. maybe it's time for a sysctl you can set on your system that
> > only allows you to create UTF-8 valid names to allow people to slowly
> > migrate to UTF-8?  and a tool to report/convert old non-UTF-8 paths?
> 
> There's already a ZFS property ("utf8only") exactly for this purpose.
> 
> Actually, it's funnier than that: because the kernel doesn't know anything
> about UTF-8, one can create several files with the same name, but with
> different UTF-8 encodings.  And there is a ZFS property to fix this problem
> as well ("normalization").
>

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?486F2F86-940C-44E9-A606-C63C3B607CB1>
