Date:      Thu, 27 Feb 2014 23:41:56 +0400
From:      Dmitry Selyutin <ghostman.sd@gmail.com>
To:        hackers@freebsd.org, Jordan Hubbard <jkh@turbofuzz.com>, Łukasz Wójcik <lukasz.wojcik@zoho.com>, Fernando Apesteguía <fernando.apesteguia@gmail.com>, jmg@funkthat.com
Subject:   Re: GSoC proposal: Quirinus C library (qc)
Message-ID:  <CAMqzjetDEZJ21jFoTOrrM%2BY1TygjfEq-PCQ14EHAmhBcwnww%2Bw@mail.gmail.com>
In-Reply-To: <CAMqzjevCpPe66nuW%2BDZPQ0xWnHLQv==vwhuRwLHnrfCyJe96ew@mail.gmail.com>
References:  <CAMqzjevahZowxWv0gH=Z8jjQdzGsEaA5U_VB-zsLCcwtoWkvxA@mail.gmail.com> <20140227182641.GE47921@funkthat.com> <CAMqzjevCpPe66nuW%2BDZPQ0xWnHLQv==vwhuRwLHnrfCyJe96ew@mail.gmail.com>

Ah, yes, I misunderstood your question. Yes, on POSIX systems the default
encoding is taken from setlocale. Then qc_encoding_import converts the
encoding name to a qc_encoding value, or falls back to ASCII if the desired
encoding cannot be found. So conversion from qc_unicode to qc_path is
performed using the default system encoding. On Windows things behave
differently, though. To summarize:

POSIX:
qc_path_import_str: copy characters from string to qc_path buffer.
qc_path_import_wstr: convert wide string to qc_unicode, encode using locale
encoding, copy raw bytes.
qc_path_import_bytes: copy raw bytes to qc_path buffer.
qc_path_import_unicode: encode using locale encoding, copy raw bytes.
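
To make the POSIX side concrete, here is a rough sketch of the two steps
involved: picking an encoding from the locale's codeset name with an ASCII
fallback, and turning Unicode code points into path bytes. This is not the
actual qc code. The names enc_t, enc_import, enc_from_locale and
path_from_unicode are hypothetical stand-ins, and the sketch assumes only
ASCII and UTF-8 are supported.

```c
#include <langinfo.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for qc_encoding; only two encodings are modelled. */
typedef enum { ENC_ASCII, ENC_UTF8 } enc_t;

/* Map a codeset name to an encoding, falling back to ASCII when unknown. */
static enc_t enc_import(const char *name)
{
    if (name != NULL && strcmp(name, "UTF-8") == 0)
        return ENC_UTF8;
    return ENC_ASCII;
}

/* Ask the current locale for its codeset. The program is assumed to have
 * called setlocale(LC_ALL, "") beforehand; otherwise the "C" locale applies. */
static enc_t enc_from_locale(void)
{
    return enc_import(nl_langinfo(CODESET));
}

/* Encode n code points into dst (capacity cap bytes).
 * Returns the number of bytes written, or (size_t)-1 if a code point cannot
 * be represented in the chosen encoding or the buffer is too small. */
static size_t path_from_unicode(enc_t enc, const uint32_t *src, size_t n,
                                uint8_t *dst, size_t cap)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t c = src[i];
        size_t need;

        if (c < 0x80)
            need = 1;
        else if (enc == ENC_ASCII)
            return (size_t)-1;              /* not representable in ASCII */
        else if (c < 0x800)
            need = 2;
        else if (c >= 0xD800 && c <= 0xDFFF)
            return (size_t)-1;              /* surrogates are not characters */
        else if (c < 0x10000)
            need = 3;
        else if (c <= 0x10FFFF)
            need = 4;
        else
            return (size_t)-1;              /* beyond the Unicode range */

        if (cap - out < need)
            return (size_t)-1;

        switch (need) {
        case 1:
            dst[out++] = (uint8_t)c;
            break;
        case 2:
            dst[out++] = (uint8_t)(0xC0 | (c >> 6));
            dst[out++] = (uint8_t)(0x80 | (c & 0x3F));
            break;
        case 3:
            dst[out++] = (uint8_t)(0xE0 | (c >> 12));
            dst[out++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (uint8_t)(0x80 | (c & 0x3F));
            break;
        default:
            dst[out++] = (uint8_t)(0xF0 | (c >> 18));
            dst[out++] = (uint8_t)(0x80 | ((c >> 12) & 0x3F));
            dst[out++] = (uint8_t)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (uint8_t)(0x80 | (c & 0x3F));
            break;
        }
    }
    return out;
}
```

The byte-oriented import needs none of this: it is a plain copy, which is
why a POSIX path that is not valid in any encoding can still be represented.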

Windows:
qc_path_import_str: convert to qc_bytes, call qc_path_import_bytes.
qc_path_import_wstr: copy wide characters to qc_path buffer.
qc_path_import_bytes: decode using UTF-8 encoding, convert to wide string,
copy wide string to qc_path buffer.
qc_path_import_unicode: convert to wide string, copy wide string
char-by-char to qc_path buffer.
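
For the Windows byte import, the interesting step is "decode using UTF-8,
convert to wide string". A rough portable sketch of that step follows. It
is not the actual qc code: utf8_to_utf16 is a hypothetical name, and
uint16_t stands in for the Windows wide character so the sketch also
compiles on other platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n UTF-8 bytes from src into UTF-16 code units in dst (capacity cap
 * units). Returns the number of units written, or (size_t)-1 on malformed
 * input or insufficient space. */
static size_t utf8_to_utf16(const uint8_t *src, size_t n,
                            uint16_t *dst, size_t cap)
{
    /* Smallest code point allowed for each sequence length (by number of
     * continuation bytes); anything lower is an overlong encoding. */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t i = 0, out = 0;

    while (i < n) {
        uint32_t c = src[i];
        size_t extra;

        if (c < 0x80)                { extra = 0; }
        else if ((c & 0xE0) == 0xC0) { c &= 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { c &= 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { c &= 0x07; extra = 3; }
        else return (size_t)-1;             /* invalid lead byte */

        if (n - i - 1 < extra)
            return (size_t)-1;              /* truncated sequence */
        i++;

        for (size_t k = 0; k < extra; k++, i++) {
            if ((src[i] & 0xC0) != 0x80)
                return (size_t)-1;          /* invalid continuation byte */
            c = (c << 6) | (src[i] & 0x3F);
        }

        if (c < min_cp[extra] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return (size_t)-1;              /* overlong, out of range, surrogate */

        if (c < 0x10000) {
            if (cap - out < 1)
                return (size_t)-1;
            dst[out++] = (uint16_t)c;
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (cap - out < 2)
                return (size_t)-1;
            c -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (c >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (c & 0x3FF));
        }
    }
    return out;
}
```

The strict validation is a design choice of the sketch: a byte string that
is not valid UTF-8 is rejected rather than silently altered.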

This is how it works now, though I guess some details may change in the
future.


2014-02-27 22:48 GMT+04:00 Dmitry Selyutin <ghostman.sd@gmail.com>:

> Hi John-Mark.
>
> it seems I've stated things wrong or you've misunderstood me. :-)
> The path will be in UTF-16 on Windows; otherwise it is in plain byte form,
> since paths on POSIX are just sequences of bytes. AFAIK we can rely on
> UTF-8 on OS X, though.
> So, we have qc_byte for raw bytes (qc_byte == uint8_t), qc_ucs for Unicode
> characters (qc_ucs == uint32_t), qc_bytes for raw byte strings, qc_unicode
> for Unicode strings. Things get more complicated with paths, though. Really,
> qc_path just stores a void* pointer to a byte array, which is a UTF-16LE
> sequence on Windows and a raw byte sequence on other platforms. That opens
> a way to write a set of platform-agnostic functions for both POSIX and
> Windows.
>
>
> 2014-02-27 22:26 GMT+04:00 John-Mark Gurney <jmg@funkthat.com>:
>
> Dmitry Selyutin wrote this message on Thu, Feb 27, 2014 at 19:39 +0400:
>> > As for strings, I will not use UTF-16 since it creates more problems
>> > than it solves. If I provide a function which accepts a char* or char
>> > const* argument, I imply that such a function uses only ASCII (maybe I
>> > will change ASCII to UTF-8). An encoding is used only if a user has
>> > requested it explicitly; the only place where I have made an exception
>> > is the system path, since a path is required to be in UTF-16 on Windows.
>> > That is the reason why qc_path requires qc_codecs-related functions.
>>
>> You do realize that FreeBSD does not enforce any coding on path names
>> currently, correct?  So, requiring a coding format on FreeBSD (UTF-16)
>> will mean some paths may not be accessible, since I assume you convert
>> the UTF-16 string to UTF-8 before opening on FreeBSD...
>>
>> Hmm.. maybe it's time for a sysctl you can set on your system that
>> only allows you to create UTF-8 valid names to allow people to slowly
>> migrate to UTF-8?  and a tool to report/convert old non-UTF-8 paths?
>>
>> --
>>   John-Mark Gurney                              Voice: +1 415 225 5579
>>
>>      "All that I will do, has been done, All that I have, has not."
>>
>
>
>
> --
> With best regards,
> Dmitry Selyutin
>



-- 
With best regards,
Dmitry Selyutin


