From: Edward Tomasz Napierała
Subject: Re: GSoC proposal: Quirinus C library (qc)
Date: Sun, 2 Mar 2014 12:19:23 +0100
Message-Id: <486F2F86-940C-44E9-A606-C63C3B607CB1@freebsd.org>
References: <20140227182641.GE47921@funkthat.com> <5A166BC2-D34A-473C-BEFA-9E04760A0AAB@FreeBSD.org>
To: ghostmansd@gmail.com
Cc: Jordan Hubbard, Łukasz Wójcik, John-Mark Gurney, hackers@freebsd.org, Fernando Apesteguía
List-Id: Technical Discussions relating to FreeBSD

Message written by Dmitry Selyutin on 2 Mar 2014, at 11:10:

> Hi Edward,
>
> there is no such thing as different UTF-8 encodings. If you are talking about, e.g., the representation of accents and diacritics, there are normalization forms, which apply to UCS code points rather than to UTF-8 byte sequences. If you mean that the same UCS-4 code point could be represented by different byte sequences, only the shortest form is permitted.

Right, normalization forms, that's what it's called. Still, there are three or four of them, and I seem to remember OS X uses a different one from the open source world; that's how I learned about them in the first place: by moving files from Mac to FreeBSD and then trying to figure out why the shell autocompletion doesn't work for them.
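
To make the two points above concrete, here is a minimal Python sketch. The helper names (utf8_forms, same_name, is_valid_utf8) and the sample name "café" are made up for illustration and are not part of qc or FreeBSD. It shows that one visible name yields different UTF-8 byte strings depending on the normalization form, which would explain byte-wise shell completion failing on names copied from a system that stores decomposed forms, and that a decoder rejects an overlong (non-shortest) byte sequence.

```python
import unicodedata


def utf8_forms(name: str) -> dict:
    """Return the UTF-8 bytes of `name` under each of the four normalization forms."""
    return {
        form: unicodedata.normalize(form, name).encode("utf-8")
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


def same_name(a: bytes, b: bytes) -> bool:
    """Compare two UTF-8 encoded names after normalizing both to NFC."""
    norm_a = unicodedata.normalize("NFC", a.decode("utf-8"))
    norm_b = unicodedata.normalize("NFC", b.decode("utf-8"))
    return norm_a == norm_b


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes as strict UTF-8 (overlong forms are rejected)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


forms = utf8_forms("caf\u00e9")          # hypothetical file name "café"
composed, decomposed = forms["NFC"], forms["NFD"]

print(composed)                          # b'caf\xc3\xa9'
print(decomposed)                        # b'cafe\xcc\x81'
print(composed == decomposed)            # False: a byte-wise comparison fails
print(same_name(composed, decomposed))   # True: equal once normalized
print(is_valid_utf8(b"\xc0\xaf"))        # False: overlong two-byte form of "/"
```

Normalizing both sides to NFC before comparing is only one possible choice; any single form works as long as both names go through the same one.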

> Honestly I think that UTF-8 is the only encoding that has a right to live. Other encodings seem to be dying or dead already.

True that.

> Best regards,
> Dmitry Selyutin
>
> On 02.03.2014 at 13:54, "Edward Tomasz Napierała" wrote:
> Message written by John-Mark Gurney on 27 Feb 2014, at 19:26:
> > Dmitry Selyutin wrote this message on Thu, Feb 27, 2014 at 19:39 +0400:
> >> As for strings, I will not use UTF-16, since it creates more problems
> >> than it solves. If I provide a function which accepts a char* or char
> >> const* argument, I imply that the function uses only ASCII (maybe I will
> >> change ASCII to UTF-8). An encoding is used only if the user has requested it
> >> explicitly; the only place where I have made an exception is the system path, since
> >> a path is required to be in UTF-16 on Windows. That is the reason why qc_path
> >> requires qc_codecs-related functions.
> >
> > You do realize that FreeBSD does not currently enforce any encoding on path
> > names, correct? So, requiring an encoding (UTF-16) on FreeBSD
> > will mean some paths may not be accessible, since I assume you convert
> > the UTF-16 string to UTF-8 before opening on FreeBSD...
> >
> > Hmm.. maybe it's time for a sysctl you can set on your system that
> > only allows you to create UTF-8 valid names, to allow people to slowly
> > migrate to UTF-8? And a tool to report/convert old non-UTF-8 paths?
>
> There's already a ZFS property ("utf8only") exactly for this purpose.
>
> Actually, it's funnier than that: because the kernel doesn't know anything
> about UTF-8, one can create several files with the same name, but with
> different UTF-8 encodings. And there is a ZFS property to fix this problem
> as well ("normalization").
>
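
The "tool to report old non-UTF-8 paths" suggested in the quoted text could look roughly like the Python sketch below. This is only an illustration under stated assumptions: the function names (is_utf8_name, non_utf8_paths) are invented here, no such tool is known to exist in the base system, and it only reports names without converting them. It walks the tree with byte paths, so the interpreter never has to guess an encoding for the names it reads.

```python
import os


def is_utf8_name(name: bytes) -> bool:
    """Return True if a raw file name is valid UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def non_utf8_paths(root: bytes):
    """Yield the byte path of every entry under `root` whose name is not valid UTF-8.

    Passing `root` as bytes makes os.walk return names as raw bytes too.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_utf8_name(name):
                yield os.path.join(dirpath, name)


# Hypothetical usage: list offending names under the current directory.
# for path in non_utf8_paths(b"."):
#     print(path)
```

A name written in ISO 8859-2 or Latin-1, for example b"caf\xe9", fails the check, while the same name encoded as UTF-8 passes.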