Date: Fri, 01 Sep 2000 18:21:58 +0100
From: Konstantin Chuguev <Konstantin.Chuguev@dante.org.uk>
To: "Andrey A. Chernov" <ache@nagual.pp.ru>
Cc: Boris Popov <bp@butya.kz>, freebsd-arch@FreeBSD.ORG, freebsd-i18n@FreeBSD.ORG
Subject: Re: Proposal to include iconv library in the base system.
Message-ID: <39AFE5B6.1F418EDD@dante.org.uk>
References: <Pine.BSF.4.10.10008241719320.80086-100000@lion.butya.kz> <20000901185945.A29804@nagual.pp.ru> <39AFD666.880FE6C@dante.org.uk> <20000901205825.A30569@nagual.pp.ru>

"Andrey A. Chernov" wrote:

> On Fri, Sep 01, 2000 at 05:16:38PM +0100, Konstantin Chuguev wrote:
> > * new filesystems use Unicode encodings: UCS-2 (Windows), some may use
> > UTF-8. These encodings are not supported by XLAT.
>
> I assume Windows (Unicode) <-> 8bit charset tables are loadable
> too. Doesn't?
>

Yes, they are. iconv always loads 2 CES modules for a conversion. A CES module
can load 0 or more CCS modules. Let me show you a few examples:

If we are converting from koi8-r to UCS-2, or the other way around, 3 modules
will be loaded:

(CES) _tbl_simple -> (CCS) koi8-r
(CES) ucs-2

For conversion between koi8-r and windows-1251, there will be 3 modules again:

(CES) _tbl_simple -> (CCS) koi8-r
(CES) _tbl_simple -> (CCS) windows-1251

(Note that only one instance of the _tbl_simple module will be loaded, as
modules are shareable; there might be two different small sets of structures,
one for each CES->CCS binding, allocated at iconv_open time and freed at
iconv_close time.)

For conversion between UTF-8 and EUC-JP, 6 modules are required:

(CES) utf-8
(CES) euc-jp -> (CCS) us-ascii
             -> (CCS) jis_x0208-1983
             -> (CCS) jis_x0201
             -> (CCS) jis_x0212-1990

To convert between EUC-JP and ISO-2022-JP we need 6 modules:

(CES) euc-jp -> (CCS) us-ascii
             -> (CCS) jis_x0208-1983
             -> (CCS) jis_x0201
             -> (CCS) jis_x0212-1990
(CES) iso-2022-jp -> (CCS) us-ascii
                  -> (CCS) jis_x0208-1983
                  -> (CCS) jis_x0201
                  -> (CCS) jis_x0212-1990

Again, all CCS modules will be shared.

>
> > Exactly, this is what was intended. All [UNIX] charsets supported in the
> > FreeBSD distribution (i.e. which are present in the locale directory) PLUS
> > charsets used in other types of filesystems (Windows, Netware?, MacOS?) for
>
> Currently we support Windows and ISO 9660 for CDs, so PLUS Windows
> (Unicode) and ISO 9660 charsets.
>
> If we take Russian example, we need following tables (for kernel only):
>
> 1) KOI8-R <-> CP866 for MSDOS FS
> 2) KOI8-R <-> Unicode for Windows FS

What about Windows < 95? I'm sure people used localized file names there too,
and it was not Unicode.

>
> 3) We also need ISO 9660 conversion scheme, but I not know about
> character set used there.
>

Can anybody please give me a reference to the ISO 9660 specification?
I would also like to know which IBM charsets are used in MSDOS FS for languages
other than Russian (and supported in FreeBSD).

--
Konstantin Chuguev - Application Engineer
Francis House, 112 Hills Road
Cambridge CB2 1PQ, United Kingdom
D A N T E    WWW: http://www.dante.net


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-i18n" in the body of the message
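
The module counts quoted in the examples above follow from the sharing rule: each side of a conversion contributes one CES module plus whatever CCS modules that CES binds to, and a module needed by both sides is loaded only once. Below is a minimal Python sketch of that counting. The ENCODINGS table and the modules_loaded function are hypothetical names invented for illustration; the table copies only the bindings listed in the message and is not the library's real registry or API.

```python
# Hypothetical model of the CES -> CCS bindings listed in the message.
# Names below (ENCODINGS, modules_loaded) are illustrative assumptions,
# not part of the iconv library.
JIS_CCS = ["us-ascii", "jis_x0208-1983", "jis_x0201", "jis_x0212-1990"]

# encoding name -> (CES module, list of CCS modules that CES loads)
ENCODINGS = {
    "koi8-r": ("_tbl_simple", ["koi8-r"]),
    "windows-1251": ("_tbl_simple", ["windows-1251"]),
    "ucs-2": ("ucs-2", []),
    "utf-8": ("utf-8", []),
    "euc-jp": ("euc-jp", JIS_CCS),
    "iso-2022-jp": ("iso-2022-jp", JIS_CCS),
}


def modules_loaded(src, dst):
    """Return the set of distinct modules needed to convert between src and dst."""
    modules = set()
    for name in (src, dst):
        ces, ccs_list = ENCODINGS[name]
        # Tag each module with its kind so a CES and a CCS with the same
        # name are counted separately; the set makes shared modules count once.
        modules.add(("CES", ces))
        for ccs in ccs_list:
            modules.add(("CCS", ccs))
    return modules


if __name__ == "__main__":
    pairs = [
        ("koi8-r", "ucs-2"),
        ("koi8-r", "windows-1251"),
        ("utf-8", "euc-jp"),
        ("euc-jp", "iso-2022-jp"),
    ]
    for src, dst in pairs:
        print(src, "<->", dst, len(modules_loaded(src, dst)))
```

Run as a script, the four pairs give 3, 3, 6 and 6 modules, matching the counts in the message. The sketch counts modules only; the small per-binding structures allocated at iconv_open time are not modelled.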