From owner-freebsd-current Wed Dec 12 3:30:42 2001
Delivered-To: freebsd-current@freebsd.org
Received: from pintail.mail.pas.earthlink.net (pintail.mail.pas.earthlink.net [207.217.120.122])
    by hub.freebsd.org (Postfix) with ESMTP id 75C9837B405;
    Wed, 12 Dec 2001 03:30:33 -0800 (PST)
Received: from pool0012.cvx22-bradley.dialup.earthlink.net ([209.179.198.12] helo=mindspring.com)
    by pintail.mail.pas.earthlink.net with esmtp (Exim 3.33 #1) id 16E7ap-0001Dd-00;
    Wed, 12 Dec 2001 03:30:32 -0800
Message-ID: <3C173FDD.5CB96DE4@mindspring.com>
Date: Wed, 12 Dec 2001 03:30:37 -0800
From: Terry Lambert
X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Maxim Sobolev
Cc: Liu Siwei, current@FreeBSD.org
Subject: Re: Hi,All
References: <3C170F56.435FA023@FreeBSD.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Maxim Sobolev wrote:
> Liu Siwei wrote:
> > I love FreeBSD! But.. Can it support CD-RW disc and Simplie Chinese
> > Filename? A lot of files in CD-ROM that have Chinese name, how can i open it
> > under FreeBSD? Oh...Oh....
>
> What is the official name for Simplie Chinese codepage? If it is a
> 1-byte charset, then I could probably add support for it into
> cd9660_unicode ports, which would allow accessing files with such
> filenames on them.

The most common character sets for Chinese are:

    GB-2312     Simplified Chinese
    EUC-TW      Traditional Chinese
    Big-5       Traditional Chinese

The one in most common use is Big-5.

Unicode supports Chinese through its CJK unification, and can have
characters in it round-tripped into any of the above character set
standards.

All of these are multibyte character sets.
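As a rough illustration of what "multibyte" and "round-tripped" mean
here, the following Python sketch encodes a short Unicode string into
GB-2312 and Big-5 and checks that it decodes back unchanged. It relies
on Python's standard `gb2312` and `big5` codecs; EUC-TW is left out
because the standard library ships no codec for it. The sample string
and the `round_trip` helper are made up for the example and are not
part of any FreeBSD code.

```python
# Hypothetical sample text: two characters present in both charsets.
SAMPLE = "中文"


def round_trip(text: str, charset: str) -> bytes:
    """Encode Unicode text into a legacy charset and verify it decodes back."""
    encoded = text.encode(charset)
    if encoded.decode(charset) != text:
        raise ValueError(f"{charset} did not round-trip {text!r}")
    return encoded


if __name__ == "__main__":
    for charset in ("gb2312", "big5"):
        data = round_trip(SAMPLE, charset)
        # Show the raw multibyte form and how many bytes it occupies.
        print(f"{charset}: {data.hex()} ({len(data)} bytes)")
```

For this sample, each character occupies two bytes in either charset,
and the byte values differ between the two, which is why a reader of
the file name has to know which charset was used.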
Unfortunately, UTF-7 and UTF-8 tend to be used with Unicode, which
destroys fixed field storage of data (since any character can take
up to 5 bytes to store, depending on its code point, when UTF
encoding is used).

The answer to the original question is "it depends on how the
Chinese character data is stored on the CDROM".

If the storage is as multibyte, then decoding it is the job of the
rendering engine. In other words, you leave it alone, and use a
Chinese display program and input method for X Windows, and it will
"just work".

If the storage is as Unicode code points, then, since tty interfaces
are currently single byte, you would need a converter between the FS
and the directory code to convert it to multibyte, so that when you
list the directory, you get the multibyte values out, and those, in
turn, are rendered by the Chinese capable multibyte program
(xterm/etc.).

Right now, FreeBSD does not convert to/from Unicode 2/4 byte encoding
(Windows uses 2 byte encoding, as does Joliet, the Windows CDROM
standard, which *is* supported by FreeBSD); it merely masks off the
high byte of the two bytes, taking advantage of the fact that the
first 256 code points of Unicode are identical to ISO 8859-1
(Latin-1).

You would need to be able to throw down round trip tables (probably
via an ioctl() to load them) to the kernel (this is what Windows
does).

Note that because of the expansion requirements, it's possible to
have 256 Unicode stored Chinese characters bloat to 1280 bytes
(256 x 5), which exceeds both the maximum file name component length
(256) and path name length (1024) set by UNIX (and copied by
FreeBSD). It's highly unlikely that anyone has encoded this type of
data, but the possibility is there.

Ignoring all that, you should be able to do a lookup from the Unicode
table to, say, Big-5, and back, with one 64K table in each direction,
and EUC or otherwise multibyte encode the result before returning it
via getdirentries().
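To make the table idea concrete, here is a user-space Python sketch,
not kernel code, of the two behaviours described above: masking off
the high byte of each 16-bit unit, and translating each unit through
a 64K lookup table. It assumes the on-disc names are big-endian
16-bit units, and it uses Python's `big5` codec as a stand-in for
real round trip tables. All function names, and the choice of "?"
for unmappable characters, are hypothetical.

```python
import struct

CHARSET = "big5"  # assumption: target multibyte charset for the tables


def build_forward_table(charset: str = CHARSET) -> list:
    """Build a 64K table indexed by 16-bit code point; None means unmappable."""
    table = [None] * 0x10000
    for cp in range(0x10000):
        try:
            table[cp] = chr(cp).encode(charset)
        except UnicodeEncodeError:
            pass  # no equivalent in the target charset (surrogates included)
    return table


def build_reverse_table(forward: list) -> dict:
    """Invert the forward table: multibyte sequence -> 16-bit code point."""
    return {mb: cp for cp, mb in enumerate(forward) if mb is not None}


def split_units(raw_name: bytes) -> tuple:
    """Split a big-endian 16-bit encoded name into integer units."""
    return struct.unpack(f">{len(raw_name) // 2}H", raw_name)


def mask_name(raw_name: bytes) -> bytes:
    """Keep only the low byte of each unit (correct for Latin-1 names only)."""
    return bytes(unit & 0xFF for unit in split_units(raw_name))


def translate_name(raw_name: bytes, forward: list) -> bytes:
    """Map each unit through the table, yielding a multibyte file name."""
    out = bytearray()
    for unit in split_units(raw_name):
        mb = forward[unit]
        out += mb if mb is not None else b"?"  # placeholder policy: assumption
    return bytes(out)


if __name__ == "__main__":
    forward = build_forward_table()
    raw = "中文.txt".encode("utf-16-be")
    print(mask_name(raw))                # high bytes lost: name is mangled
    print(translate_name(raw, forward))  # Big-5 bytes for a Big-5 terminal
```

Masking preserves a plain ASCII or Latin-1 name but mangles a Chinese
one, while the table lookup produces bytes that a Big-5 capable
display program can render. Note that the translated name has a
different length than the stored one, so anything that depends on
stored name lengths or offsets sees different values after
translation.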
This conversion will break under Linux emulation (I believe), since
the emulation uses the directory lookup restart code, which will be
variant under multibyte translation.

-- Terry