From: "R. Imura"
Date: Sun, 13 Aug 2006 21:26:48 +0900
To: Intron
Cc: Yoshihiro Ota, freebsd-hackers@FreeBSD.ORG, imura@FreeBSD.ORG
Subject: Re: UTF-8 <-> UTF-16BE Converter in Kernel Needs Test
Message-ID: <44DF1A88.9070504@FreeBSD.ORG>
References: <20060812235423.af71b566.ota@j.email.ne.jp>

Hi, Intron,

- iconv(9), aka kiconv, is not implementation of POSIX iconv(3).
- UDF is another kiconv user.
- kiconv is not a present for Microsoft.
- UCS-2 is not enough for explaining full GB18030.
  I'd like to know how Microsoft controls GB18030.

> 1. I don't know why the author takes the concept of Microsoft's 16-bit
>    wchar_t as UTF-16BE (the macro ENCODING_UNICODE in /sys/sys/iconv.h).

You can see why it's UTF-16BE via cvs logs.

- R. Imura

Intron wrote:
> Yoshihiro Ota wrote:
>
>> You may try these patches, first.
>> http://people.freebsd.org/~imura/kiconv/
>>
>> It sounds like these patches implement better supports.
>>
>> Hiro
>>
>> On Sun, 13 Aug 2006 01:28:17 +0800
>> "Intron" wrote:
>>
>>> I'm sorry that I send my experimental patch set here to call for test.
>>> But if I send it to freebsd-i18n@, I wonder no one will respond to me.
>>>
>>> Download: http://ftp.intron.ac/tmp/kiconv_utf8_20060813.tar.bz2
>>>
>>> My patch set implements a UTF-8 <-> UTF-16BE converter for iconv in
>>> kernel. It doesn't need kiconv(3) to send unnecessary UTF-8 <-> UTF-16BE
>>> conversion tables to kernel. And it doesn't require the help of GNU
>>> libiconv, which kiconv(3) depends on.
>>>
>>> With my patch set, if you mount FAT/NTFS/ISO9660 file system, less
>>> resource will be occupied than before:
>>>
>>>     mount_msdosfs -L ll_NN.UTF-8 /dev/md0s1 /mnt
>>>
>>> See my "readme.txt" for installation guide.
>>>
>>> ************ ATTENTION !!! ************
>>>
>>> 1. Do NOT test my patch set upon your CRITICAL FAT/NTFS partition !!!
>>>
>>> 2. Limited by BUGGY FreeBSD modules msdosfs/ntfs/cd9660, whether you
>>>    use my patch set or not, only 1/2-byte UTF-8 character (up to 0x7ff)
>>>    is supported, which means only a few languages are supported.
>>>
>>>    I will try to patch those modules to support all languages (up to
>>>    6-byte UTF-8 character) included in current Unicode step by step.
>>>
>>> ------------------------------------------------------------------------
>>> From Beijing, China
>>>
>>> _______________________________________________
>>> freebsd-hackers@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>
> I have looked in his patch set.
> Some essential problems:
>
> 1. I don't know why the author takes the concept of Microsoft's 16-bit
>    wchar_t as UTF-16BE (the macro ENCODING_UNICODE in /sys/sys/iconv.h).
>    16-bit wchar_t is only enough for UCS-2 BE/LE (Unicode BMP) while
>    real UTF-16 includes 4-byte formation.
>
> 2. Actually, kernel iconv is prepared only for Microsoft (FAT32, NTFS,
>    Joliet extension to ISO 9660, SambaFS) so far. It should be a minimum
>    function set just fit for Microsoft. Above all, it is not a complete
>    implementation of UNIX98 iconv and should be as simple as possible.
>
> 3. In fact, UNIX98 iconv(3) handles any character set as char array.
>    The usage of wchar_t is not of a good style in modules
>    msdosfs/cd9660/ntfs. String function such as memcpy() should be used
>    instead. If 5/6-byte UTF-8 sequence (Annex D of ISO/IEC 10646-1:2000)
>    or other special encoding is allowed, handling by char array will be
>    still robust.
>
> ------------------------------------------------------------------------
> From Beijing, China
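
For readers following the UCS-2 versus UTF-16 point in item 1 and the
0x7ff limit mentioned earlier in the thread, here is a minimal sketch in
C. The function names utf16be_encode() and utf8_len_legacy() are
hypothetical and are not taken from kiconv or from either patch set; the
sketch only illustrates the encoding rules themselves, assuming the
standard surrogate-pair scheme for UTF-16 and the older 1-to-6-byte
UTF-8 layout that the thread refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of kiconv): encode one code point as
 * UTF-16BE bytes. Returns the number of bytes written: 2 for a BMP
 * character, 4 for a surrogate pair, 0 if cp cannot be encoded.
 */
static size_t
utf16be_encode(uint32_t cp, unsigned char out[4])
{
	uint16_t hi, lo;

	assert(out != NULL);

	/* Surrogate code points and values above U+10FFFF are invalid. */
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return (0);

	if (cp < 0x10000) {
		/* BMP: one 16-bit unit, the only case UCS-2 covers. */
		out[0] = (unsigned char)(cp >> 8);
		out[1] = (unsigned char)(cp & 0xFF);
		return (2);
	}

	/* Above the BMP: split the 20 remaining bits into two units. */
	cp -= 0x10000;
	hi = (uint16_t)(0xD800 | (cp >> 10));
	lo = (uint16_t)(0xDC00 | (cp & 0x3FF));
	out[0] = (unsigned char)(hi >> 8);
	out[1] = (unsigned char)(hi & 0xFF);
	out[2] = (unsigned char)(lo >> 8);
	out[3] = (unsigned char)(lo & 0xFF);
	return (4);
}

/*
 * Hypothetical helper: byte length of a code point under the older
 * 1-to-6-byte UTF-8 layout. Returns 0 if cp is above 0x7FFFFFFF.
 */
static size_t
utf8_len_legacy(uint32_t cp)
{
	if (cp < 0x80)
		return (1);
	if (cp < 0x800)		/* up to 0x7ff fits in two bytes */
		return (2);
	if (cp < 0x10000)
		return (3);
	if (cp < 0x200000)
		return (4);
	if (cp < 0x4000000)
		return (5);
	if (cp <= 0x7FFFFFFF)
		return (6);
	return (0);
}
```

A character such as U+4E2D takes one 16-bit unit (and three UTF-8
bytes), so it is beyond the 0x7ff limit but still inside what a 16-bit
wchar_t can hold. U+1D11E needs the 4-byte surrogate form, which is the
case a single 16-bit wchar_t cannot represent.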