Message-ID: <39AFE5B6.1F418EDD@dante.org.uk>
Date: Fri, 01 Sep 2000 18:21:58 +0100
From: Konstantin Chuguev
Organization: Delivery of Advanced Networking Service to Europe Ltd.
To: "Andrey A. Chernov"
Cc: Boris Popov, freebsd-arch@FreeBSD.ORG, freebsd-i18n@FreeBSD.ORG
Subject: Re: Proposal to include iconv library in the base system.

"Andrey A. Chernov" wrote:

> On Fri, Sep 01, 2000 at 05:16:38PM +0100, Konstantin Chuguev wrote:
> > * new filesystems use Unicode encodings: UCS-2 (Windows), some may use
> > UTF-8. These encodings are not supported by XLAT.
>
> I assume Windows (Unicode) <-> 8bit charset tables are loadable
> too. Doesn't?
>

Yes, they are. iconv always loads two CES (Character Encoding Scheme)
modules for a conversion, one for the source encoding and one for the
destination encoding. Each CES module can in turn load zero or more CCS
(Coded Character Set) modules.
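From the caller's side none of this layering is visible: an application only names the two encodings. The following is a minimal sketch using the standard POSIX iconv(3) calls (iconv_open, iconv, iconv_close); the helper name convert() is hypothetical, and the charset names "UTF-8" and "UCS-2BE" are assumptions about what the local iconv implementation accepts. Going by the module examples in this message, a UTF-8 <-> UCS-2 conversion in the proposed library would presumably need only the two CES modules and no CCS module at all.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert `inlen` bytes of `in` from encoding `from`
 * to encoding `to`, writing at most `outcap` bytes into `out`.
 * Returns the number of bytes written, or -1 on any error. */
static long convert(const char *to, const char *from,
                    const char *in, size_t inlen,
                    char *out, size_t outcap)
{
    /* Note the argument order: iconv_open(tocode, fromcode). */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;                 /* unsupported pair of encodings */

    char *inp = (char *)in;        /* POSIX iconv() takes char ** for input */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);

    /* Flush any pending shift sequence; this matters for stateful
     * encodings such as ISO-2022-JP and is harmless otherwise. */
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);

    iconv_close(cd);               /* releases the conversion state */
    if (rc == (size_t)-1)
        return -1;                 /* EILSEQ, EINVAL or E2BIG */
    return (long)(outcap - outleft);
}
```

For example, converting the UTF-8 bytes D0 94 D0 B0 (U+0414 U+0430) to "UCS-2BE" should write the four bytes 04 14 04 30 and return 4.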
Let me show you a few examples.

Converting from koi8-r to UCS-2 (or the other way around) loads 3 modules:

    (CES) _tbl_simple -> (CCS) koi8-r
    (CES) ucs-2

Converting between koi8-r and windows-1251 again loads 3 modules:

    (CES) _tbl_simple -> (CCS) koi8-r
    (CES) _tbl_simple -> (CCS) windows-1251

(Note that only one instance of the _tbl_simple module is loaded, because
modules are shareable. Two separate small sets of structures may be
allocated, one per CES->CCS binding, at iconv_open time and freed at
iconv_close time.)

Converting between UTF-8 and EUC-JP requires 6 modules:

    (CES) utf-8
    (CES) euc-jp -> (CCS) us-ascii
                 -> (CCS) jis_x0208-1983
                 -> (CCS) jis_x0201
                 -> (CCS) jis_x0212-1990

Converting between EUC-JP and ISO-2022-JP we again need 6 modules:

    (CES) euc-jp      -> (CCS) us-ascii
                      -> (CCS) jis_x0208-1983
                      -> (CCS) jis_x0201
                      -> (CCS) jis_x0212-1990
    (CES) iso-2022-jp -> (CCS) us-ascii
                      -> (CCS) jis_x0208-1983
                      -> (CCS) jis_x0201
                      -> (CCS) jis_x0212-1990

Again, the four CCS modules are shared by both CES modules, so each is
loaded only once.

>
> > Exactly, this is what was intended. All [UNIX] charsets supported in the
> > FreeBSD distribution (i.e. which are present in the locale directory) PLUS
> > charsets used in other types of filesystems (Windows, Netware?, MacOS?) for
>
> Currently we support Windows and ISO 9660 for CDs, so PLUS Windows
> (Unicode) and ISO 9660 charsets.
>
> If we take Russian example, we need following tables (for kernel only):
>
> 1) KOI8-R <-> CP866 for MSDOS FS
> 2) KOI8-R <-> Unicode for Windows FS

What about Windows versions before Windows 95? I'm sure people used
localized file names there too, and those were not Unicode.

>
> 3) We also need ISO 9660 conversion scheme, but I not know about
> character set used there.
>

Can anybody please point me to the ISO 9660 specification? I would also
like to know which IBM charsets are used in the MSDOS FS for languages
other than Russian (and are supported in FreeBSD).
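The module counts in the conversion examples earlier in this message follow from two rules: every conversion binds exactly two CES modules, and a module needed by both sides is loaded only once. Below is a minimal C model of that bookkeeping, assuming modules are identified by kind and name alone; struct ces_binding, modules_loaded() and MAX_MODULES are hypothetical names, not part of any iconv implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a module is identified by its kind and name. */
enum module_kind { MOD_CES, MOD_CCS };

struct ces_binding {
    const char *ces;           /* CES module name, e.g. "euc-jp" */
    const char *const *ccs;    /* names of the CCS modules it binds */
    size_t nccs;               /* may be 0, e.g. for "ucs-2" */
};

#define MAX_MODULES 64

struct module_set {
    enum module_kind kind[MAX_MODULES];
    const char *name[MAX_MODULES];
    size_t n;
};

/* Record a module unless the same one is already loaded (shared). */
static void load_module(struct module_set *set, enum module_kind kind,
                        const char *name)
{
    for (size_t i = 0; i < set->n; i++)
        if (set->kind[i] == kind && strcmp(set->name[i], name) == 0)
            return;
    if (set->n < MAX_MODULES) {
        set->kind[set->n] = kind;
        set->name[set->n] = name;
        set->n++;
    }
}

/* Number of distinct modules one conversion needs: the two CES modules
 * (source and destination) plus every CCS module they bind. */
static size_t modules_loaded(const struct ces_binding *from,
                             const struct ces_binding *to)
{
    struct module_set set = { .n = 0 };
    const struct ces_binding *side[2] = { from, to };

    for (int s = 0; s < 2; s++) {
        load_module(&set, MOD_CES, side[s]->ces);
        for (size_t i = 0; i < side[s]->nccs; i++)
            load_module(&set, MOD_CCS, side[s]->ccs[i]);
    }
    return set.n;
}
```

With euc-jp and iso-2022-jp both bound to us-ascii, jis_x0208-1983, jis_x0201 and jis_x0212-1990, modules_loaded() returns 6; koi8-r <-> windows-1251 through the shared _tbl_simple returns 3. The per-binding structures allocated at iconv_open time are deliberately not modelled here, only module sharing.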
--
Konstantin Chuguev - Application Engineer
Francis House, 112 Hills Road
Cambridge CB2 1PQ, United Kingdom
DANTE    WWW: http://www.dante.net