Date: Thu, 27 Feb 2014 23:41:56 +0400
From: Dmitry Selyutin <ghostman.sd@gmail.com>
To: hackers@freebsd.org, Jordan Hubbard <jkh@turbofuzz.com>, Łukasz Wójcik <lukasz.wojcik@zoho.com>, Fernando Apesteguía <fernando.apesteguia@gmail.com>, jmg@funkthat.com
Subject: Re: GSoC proposal: Quirinus C library (qc)
Message-ID: <CAMqzjetDEZJ21jFoTOrrM%2BY1TygjfEq-PCQ14EHAmhBcwnww%2Bw@mail.gmail.com>
In-Reply-To: <CAMqzjevCpPe66nuW%2BDZPQ0xWnHLQv==vwhuRwLHnrfCyJe96ew@mail.gmail.com>
References: <CAMqzjevahZowxWv0gH=Z8jjQdzGsEaA5U_VB-zsLCcwtoWkvxA@mail.gmail.com> <20140227182641.GE47921@funkthat.com> <CAMqzjevCpPe66nuW%2BDZPQ0xWnHLQv==vwhuRwLHnrfCyJe96ew@mail.gmail.com>

Ah, yes, I misunderstood your question. Yes, on POSIX systems the default encoding is taken from setlocale. Then qc_encoding_import converts the encoding name to the qc_encoding type, or leaves it as ASCII if the desired encoding cannot be found. So conversion from qc_unicode to qc_path is performed using the default system encoding. On Windows things behave differently, though. To summarize:

POSIX:
  qc_path_import_str: copy characters from the string to the qc_path buffer.
  qc_path_import_wstr: convert the wide string to qc_unicode, encode using the locale encoding, copy raw bytes.
  qc_path_import_bytes: copy raw bytes to the qc_path buffer.
  qc_path_import_unicode: encode using the locale encoding, copy raw bytes.

Windows:
  qc_path_import_str: convert to qc_bytes, call qc_path_import_bytes.
  qc_path_import_wstr: copy wide characters to the qc_path buffer.
  qc_path_import_bytes: decode using the UTF-8 encoding, convert to a wide string, copy the wide string to the qc_path buffer.
  qc_path_import_unicode: convert to a wide string, copy the wide string char-by-char to the qc_path buffer.

This is how it works now, though I guess some details may change in the future.

2014-02-27 22:48 GMT+04:00 Dmitry Selyutin <ghostman.sd@gmail.com>:

> Hi John-Mark.
>
> It seems I've stated things wrong or you've understood me incorrectly. :-)
> The path will be in UTF-16 on Windows; otherwise it has pure byte form,
> since paths on POSIX are just sequences of bytes. AFAIK we can use UTF-8
> for sure on OS X though.
> So, we have qc_byte for raw bytes (qc_byte == uint8_t), qc_ucs for Unicode
> characters (qc_ucs == uint32_t), qc_bytes for raw byte strings, qc_unicode
> for Unicode strings. Things get more complicated with paths though. Really
> qc_path just stores a void* pointer to a byte array, which is a UTF-16LE
> sequence on Windows and a raw byte sequence on other platforms. That opens
> a way to write a set of platform-agnostic functions both on POSIX and
> Windows.
>
> 2014-02-27 22:26 GMT+04:00 John-Mark Gurney <jmg@funkthat.com>:
>
>> Dmitry Selyutin wrote this message on Thu, Feb 27, 2014 at 19:39 +0400:
>> > As for strings, I will not use UTF-16 since it creates more problems
>> > than it solves. If I provide a function which accepts a char* or char
>> > const* argument, I imply that such a function uses only ASCII (maybe I
>> > will change ASCII to UTF-8). Encoding is used only if a user has
>> > requested it explicitly; the only place where I have made an exception
>> > is the system path, since a path is required to be in UTF-16 on Windows.
>> > That is the reason why qc_path requires qc_codecs-related functions.
>>
>> You do realize that FreeBSD does not enforce any coding on path names
>> currently, correct? So, requiring a coding format on FreeBSD (UTF-16)
>> will mean some paths may not be accessible, since I assume you convert
>> the UTF-16 string to UTF-8 before opening on FreeBSD...
>>
>> Hmm.. maybe it's time for a sysctl you can set on your system that
>> only allows you to create UTF-8 valid names to allow people to slowly
>> migrate to UTF-8? and a tool to report/convert old non-UTF-8 paths?
>>
>> --
>> John-Mark Gurney Voice: +1 415 225 5579
>>
>> "All that I will do, has been done, All that I have, has not."
>
> --
> With best regards,
> Dmitry Selyutin

--
With best regards,
Dmitry Selyutin
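
For illustration, the POSIX lookup described in the message (take the default encoding from the locale, map its name to an encoding type, fall back to ASCII when the name is unknown) could be sketched roughly as below. This is only a sketch under assumptions: the `sketch_` names and the enum are hypothetical and are not the qc API, the table of recognised codesets is an arbitrary sample, and `nl_langinfo(CODESET)` is assumed as the way to read the codeset name after `setlocale`.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical encoding tags; the real qc_encoding type is not shown in the thread. */
enum sketch_encoding {
    SKETCH_ENC_ASCII,
    SKETCH_ENC_UTF8,
    SKETCH_ENC_LATIN1,
    SKETCH_ENC_KOI8R
};

/* Map a codeset name to a tag; unknown or missing names fall back to ASCII. */
enum sketch_encoding
sketch_encoding_from_name(const char *name)
{
    /* Arbitrary sample table (assumption), including the spelling FreeBSD uses. */
    static const struct {
        const char *name;
        enum sketch_encoding enc;
    } table[] = {
        { "UTF-8", SKETCH_ENC_UTF8 },
        { "ISO-8859-1", SKETCH_ENC_LATIN1 },
        { "ISO8859-1", SKETCH_ENC_LATIN1 },
        { "KOI8-R", SKETCH_ENC_KOI8R },
    };
    size_t i;

    if (name == NULL)
        return SKETCH_ENC_ASCII;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        /* Codeset names are compared case-insensitively. */
        if (strcasecmp(name, table[i].name) == 0)
            return table[i].enc;
    }
    return SKETCH_ENC_ASCII;
}

/* Ask the C library which codeset the user's locale selects. */
enum sketch_encoding
sketch_default_encoding(void)
{
    /* An empty locale string selects the locale from the environment. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return SKETCH_ENC_ASCII;
    return sketch_encoding_from_name(nl_langinfo(CODESET));
}
```

A caller would invoke `sketch_default_encoding()` once at startup and use the result when turning Unicode strings into path bytes. The ASCII fallback mirrors the behaviour described for qc_encoding_import, but the actual name matching in qc may well differ.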