From: Dmitry Selyutin
Date: Thu, 27 Feb 2014 23:41:56 +0400
Subject: Re: GSoC proposal: Quirinus C library (qc)
To: hackers@freebsd.org, Jordan Hubbard, Łukasz Wójcik, Fernando Apesteguía, jmg@funkthat.com
References: <20140227182641.GE47921@funkthat.com>
Reply-To: ghostmansd@gmail.com
List-Id: Technical Discussions relating to FreeBSD

Ah, yes, I misunderstood your question. Yes, on POSIX systems the default
encoding is taken from setlocale. Then qc_encoding_import converts the
encoding name to the qc_encoding type, or leaves it as ASCII if the desired
encoding cannot be found. So conversion from qc_unicode to qc_path is
performed using the default system encoding. On Windows things behave
differently, though. To summarize:

POSIX:
qc_path_import_str: copy characters from the string to the qc_path buffer.
qc_path_import_wstr: convert the wide string to qc_unicode, decode using the locale encoding, copy raw bytes.
qc_path_import_bytes: copy raw bytes to the qc_path buffer.
qc_path_import_unicode: decode using the locale encoding, copy raw bytes.

Windows:
qc_path_import_str: convert to qc_bytes, call qc_path_import_bytes.
qc_path_import_wstr: copy wide characters to the qc_path buffer.
qc_path_import_bytes: decode using the UTF-8 encoding, convert to a wide string, copy the wide string to the qc_path buffer.
qc_path_import_unicode: convert to a wide string, copy the wide string char-by-char to the qc_path buffer.

This is how it works now, though I guess some details may change in the
future.


2014-02-27 22:48 GMT+04:00 Dmitry Selyutin:

> Hi John-Mark.
>
> It seems I've stated things wrong or you've misunderstood me. :-)
> The path will be in UTF-16 on Windows; otherwise it has pure byte form, since
> paths on POSIX are just sequences of bytes. AFAIK we can use UTF-8 for sure
> on OS X, though.
> So, we have qc_byte for raw bytes (qc_byte == uint8_t), qc_ucs for Unicode
> characters (qc_ucs == uint32_t), qc_bytes for raw byte strings, and qc_unicode
> for Unicode strings. Things get more complicated with paths, though.
> Really, qc_path just stores a void* pointer to a byte array, which is a
> UTF-16LE sequence on Windows and a raw byte sequence on other platforms.
> That opens a way to write a set of platform-agnostic functions for both
> POSIX and Windows.
>
>
> 2014-02-27 22:26 GMT+04:00 John-Mark Gurney:
>
>> Dmitry Selyutin wrote this message on Thu, Feb 27, 2014 at 19:39 +0400:
>> > As for strings, I will not use UTF-16 since it creates more problems
>> > than it solves. If I provide a function which accepts a char* or char
>> > const* argument, I imply that such a function uses only ASCII (maybe I
>> > will change ASCII to UTF-8). Encoding is used only if a user has
>> > requested it explicitly; the only place where I have made an exception
>> > is the system path, since the path is required to be in UTF-16 on
>> > Windows. That is the reason why qc_path requires qc_codecs-related
>> > functions.
>>
>> You do realize that FreeBSD does not enforce any coding on path names
>> currently, correct? So, requiring a coding format on FreeBSD (UTF-16)
>> will mean some paths may not be accessible, since I assume you convert
>> the UTF-16 string to UTF-8 before opening on FreeBSD...
>>
>> Hmm.. maybe it's time for a sysctl you can set on your system that
>> only allows you to create UTF-8 valid names to allow people to slowly
>> migrate to UTF-8? and a tool to report/convert old non-UTF-8 paths?
>>
>> --
>> John-Mark Gurney Voice: +1 415 225 5579
>>
>> "All that I will do, has been done, All that I have, has not."
>>
>
>
> --
> With best regards,
> Dmitry Selyutin
>

--
With best regards,
Dmitry Selyutin
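
A minimal sketch of the POSIX raw-byte case from the summary above
("qc_path_import_bytes: copy raw bytes to qc_path buffer"), assuming the path
object is just an opaque buffer plus a length. The typedefs follow the
definitions given in the thread (qc_byte == uint8_t, qc_ucs == uint32_t);
the names sketch_path, sketch_path_import_bytes and sketch_path_free are
hypothetical and are not the actual qc API. It only illustrates why byte
sequences that are not valid UTF-8 can pass through unchanged on POSIX.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Types as described in the thread. */
typedef uint8_t  qc_byte;  /* raw byte */
typedef uint32_t qc_ucs;   /* Unicode code point */

/* Hypothetical stand-in for qc_path: an opaque buffer and its size in bytes. */
struct sketch_path {
    void  *data;
    size_t size;
};

/*
 * POSIX-style import (assumed behaviour): copy the bytes verbatim, with no
 * decoding or validation, so any byte sequence is preserved as-is.
 * Returns 0 on success, -1 if allocation fails.
 */
static int
sketch_path_import_bytes(struct sketch_path *path, const qc_byte *bytes,
    size_t size)
{
    void *copy;

    assert(path != NULL);
    assert(bytes != NULL || size == 0);

    /* Allocate at least one byte so an empty path still owns a buffer. */
    copy = malloc(size != 0 ? size : 1);
    if (copy == NULL)
        return -1;
    if (size != 0)
        memcpy(copy, bytes, size);

    path->data = copy;
    path->size = size;
    return 0;
}

/* Release the buffer and reset the path to an empty state. */
static void
sketch_path_free(struct sketch_path *path)
{
    assert(path != NULL);
    free(path->data);
    path->data = NULL;
    path->size = 0;
}
```

Importing a name such as { 'a', 0xE9, '/', 0xFF } with this sketch keeps all
four bytes intact, even though the sequence is not valid UTF-8. A Windows
variant would instead have to decode the input and store UTF-16LE, as the
summary describes.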