From: Dmitry Selyutin
To: Edward Tomasz Napierała
Date: Sun, 2 Mar 2014 14:10:16 +0400
Subject: Re: GSoC proposal: Quirinus C library (qc)
In-Reply-To: <5A166BC2-D34A-473C-BEFA-9E04760A0AAB@FreeBSD.org>
References: <20140227182641.GE47921@funkthat.com> <5A166BC2-D34A-473C-BEFA-9E04760A0AAB@FreeBSD.org>
Cc: Jordan Hubbard, John-Mark Gurney, hackers@freebsd.org, Fernando Apesteguía, Łukasz Wójcik, ghostmansd@gmail.com
Reply-To: ghostmansd@gmail.com
List-Id: Technical Discussions relating to FreeBSD

Hi Edward,

There is no such thing as different UTF-8 encodings. If you mean the
representation of accents and diacritics, those are normalization forms,
which apply to UCS code points rather than to UTF-8 byte sequences. If you
mean that the same UCS-4 code point could be represented by different byte
sequences, only the shortest form is permitted.

Honestly, I think UTF-8 is the only encoding that has a right to live. The
other encodings seem to be dying or already dead.

Best regards,
Dmitry Selyutin

On 02.03.2014 at 13:54, "Edward Tomasz Napierała" wrote:

> Message written by John-Mark Gurney on 27 Feb 2014, at 19:26:
> > Dmitry Selyutin wrote this message on Thu, Feb 27, 2014 at 19:39 +0400:
> >> As for strings, I will not use UTF-16 since it creates more problems
> >> than it solves. If I provide a function which accepts a char* or
> >> char const* argument, I imply that such a function uses only ASCII
> >> (maybe I will change ASCII to UTF-8). An encoding is used only if a
> >> user has requested it explicitly; the only place where I have made an
> >> exception is the system path, since a path is required to be in UTF-16
> >> on Windows. That is the reason why qc_path requires qc_codecs-related
> >> functions.
> >
> > You do realize that FreeBSD does not enforce any coding on path names
> > currently, correct? So, requiring a coding format on FreeBSD (UTF-16)
> > will mean some paths may not be accessible, since I assume you convert
> > the UTF-16 string to UTF-8 before opening on FreeBSD...
> >
> > Hmm.. maybe it's time for a sysctl you can set on your system that
> > only allows you to create UTF-8 valid names to allow people to slowly
> > migrate to UTF-8? and a tool to report/convert old non-UTF-8 paths?
>
> There's already a ZFS property ("utf8only") exactly for this purpose.
>
> Actually, it's funnier than that: because the kernel doesn't know anything
> about UTF-8, one can create several files with the same name, but with
> different UTF-8 encodings. And there is a ZFS property to fix this problem
> as well ("normalization").
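
To make the two cases discussed in this thread concrete, here is a minimal
Python sketch. It is an illustration only, not code from qc or from FreeBSD;
the sample name "café" and the helper is_valid_utf8 are hypothetical. It shows
that two normalization forms of the same visible name give different code
point sequences (and therefore different UTF-8 bytes), while an overlong byte
sequence for a single code point is rejected by a strict UTF-8 decoder.

```python
import unicodedata

# The same visible name in two normalization forms:
# NFC uses the precomposed U+00E9, NFD uses "e" followed by U+0301.
nfc_name = unicodedata.normalize("NFC", "caf\u00e9")
nfd_name = unicodedata.normalize("NFD", "caf\u00e9")

nfc_bytes = nfc_name.encode("utf-8")
nfd_bytes = nfd_name.encode("utf-8")

# The byte strings differ because the code point sequences differ,
# not because UTF-8 offers a choice for any single code point.
names_differ = nfc_bytes != nfd_bytes

# Normalizing both to one form before comparing makes them equal again.
same_after_normalizing = unicodedata.normalize("NFC", nfd_name) == nfc_name


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xC0 0xAF is an overlong two-byte form of "/" (U+002F).
# Only the shortest form, the single byte 0x2F, is valid UTF-8.
overlong_slash = b"\xc0\xaf"

if __name__ == "__main__":
    print(nfc_bytes, nfd_bytes, names_differ, same_after_normalizing)
    print(is_valid_utf8(overlong_slash), is_valid_utf8(b"/"))
```

A kernel that compares names byte by byte would treat nfc_bytes and nfd_bytes
as two distinct file names, which is the situation the "normalization"
property mentioned above is meant to address.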