From owner-freebsd-hackers@FreeBSD.ORG  Sun Aug 13 09:53:34 2006
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
X-Original-To: freebsd-hackers@freebsd.org
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C52EE16A4DF;
	Sun, 13 Aug 2006 09:53:34 +0000 (UTC) (envelope-from admin@intron.ac)
Received: from intron.ac (unknown [210.51.165.237])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 2277743D58;
	Sun, 13 Aug 2006 09:53:29 +0000 (GMT) (envelope-from admin@intron.ac)
Received: from localhost (localhost [127.0.0.1]) (uid 1003)
	by intron.ac with local; Sun, 13 Aug 2006 17:53:27 +0800
	id 00102DF1.44DEF697.00014988
References: <courier.44DE0FB1.0001160E@intron.ac>
	<20060812235423.af71b566.ota@j.email.ne.jp>
In-Reply-To: <20060812235423.af71b566.ota@j.email.ne.jp>
From: "Intron" <mag@intron.ac>
To: Yoshihiro Ota <ota@j.email.ne.jp>
Date: Sun, 13 Aug 2006 17:53:27 +0800
Mime-Version: 1.0
Content-Type: text/plain; charset="gb2312"; format=flowed
Content-Transfer-Encoding: 7bit
Message-ID: <courier.44DEF697.00014988@intron.ac>
Cc: freebsd-hackers@freebsd.org, imura@FreeBSD.org
Subject: Re: UTF-8 <-> UTF-16BE Converter in Kernel Needs Test
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 13 Aug 2006 09:53:35 -0000

Yoshihiro Ota wrote:

> You may try these patches, first.
> http://people.freebsd.org/~imura/kiconv/
> 
> It sounds like these patches implement better supports.
> 
> Hiro
> 
> On Sun, 13 Aug 2006 01:28:17 +0800
> "Intron" <mag@intron.ac> wrote:
> 
>> I'm sorry that I send my experimental patch set here to call for test.
>> But if I send it to freebsd-i18n@, I wonder no one will respond to me.
>> 
>> Download: http://ftp.intron.ac/tmp/kiconv_utf8_20060813.tar.bz2
>> 
>> My patch set implements a UTF-8 <-> UTF-16BE converter for iconv in
>> kernel. It doesn't need kiconv(3) to send unnecessary UTF-8 <-> UTF-16BE
>> conversion tables to kernel. And it doesn't require the help of GNU
>> libiconv, which kiconv(3) depends on.
>> 
>> With my patch set, if you mount FAT/NTFS/ISO9660 file system, less
>> resource will be occupied than before:
>> 
>> mount_msdosfs -L ll_NN.UTF-8 /dev/md0s1 /mnt
>> 
>> See my "readme.txt" for installation guide.
>> 
>>                 ************  ATTENTION !!!  ************
>> 
>> 1. Do NOT test my patch set upon your CRITICAL FAT/NTFS partition !!!
>> 
>> 2. Limited by BUGGY FreeBSD modules msdosfs/ntfs/cd9660, whether you
>>     use my patch set or not, only 1/2-byte UTF-8 character (up to 0x7ff)
>>     is supported, which means only a few languages are supported.
>> 
>>     I will try to patch those modules to support all languages (up to
>>     6-byte UTF-8 character) included in current Unicode step by step.
>> 
>> ------------------------------------------------------------------------
>>                                                  From Beijing, China
>> 
>> _______________________________________________
>> freebsd-hackers@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

I have looked at his patch set. I see some fundamental problems:

1. I don't know why the author treats Microsoft's 16-bit wchar_t as
    UTF-16BE (the macro ENCODING_UNICODE in /sys/sys/iconv.h).
    A 16-bit wchar_t is only enough for UCS-2 BE/LE (the Unicode BMP),
    while real UTF-16 also includes 4-byte sequences (surrogate pairs).

2. So far, kernel iconv exists only to serve Microsoft formats (FAT32,
    NTFS, the Joliet extension to ISO 9660, SambaFS). It should be a
    minimal function set that just fits those needs. Above all, it is
    not a complete implementation of UNIX98 iconv and should be as
    simple as possible.

3. In fact, UNIX98 iconv(3) handles any character set as a char array.
    Using wchar_t in the msdosfs/cd9660/ntfs modules is not good style.
    Byte-oriented functions such as memcpy() should be used instead.
    If 5/6-byte UTF-8 sequences (Annex D of ISO/IEC 10646-1:2000) or
    other special encodings are allowed, handling them as char arrays
    will still be robust.
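
To illustrate points 1 and 3, here is a minimal userland sketch of a
UTF-8 -> UTF-16BE conversion that works purely on byte arrays and emits
surrogate pairs for code points above the BMP. The function name
utf8_to_utf16be() and its interface are my own assumptions for
illustration; it is not code from either patch set and not the kernel
iconv API. It rejects 5/6-byte UTF-8 forms, since those cannot be
represented in UTF-16 anyway.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert UTF-8 bytes in src[0..srclen) to
 * UTF-16BE bytes in dst. Returns the number of bytes written, or
 * (size_t)-1 on malformed input or insufficient output space.
 * Both buffers are plain byte arrays; no wchar_t is involved.
 */
size_t
utf8_to_utf16be(const unsigned char *src, size_t srclen,
    unsigned char *dst, size_t dstlen)
{
	/* Smallest code point allowed for each sequence length. */
	static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t in = 0, out = 0;

	while (in < srclen) {
		unsigned char c = src[in];
		uint32_t cp;
		size_t need, i;

		if (c < 0x80) {
			cp = c; need = 1;
		} else if ((c & 0xe0) == 0xc0) {
			cp = c & 0x1f; need = 2;
		} else if ((c & 0xf0) == 0xe0) {
			cp = c & 0x0f; need = 3;
		} else if ((c & 0xf8) == 0xf0) {
			cp = c & 0x07; need = 4;
		} else {
			/* Stray continuation byte or a 5/6-byte form. */
			return ((size_t)-1);
		}
		if (srclen - in < need)
			return ((size_t)-1);
		for (i = 1; i < need; i++) {
			if ((src[in + i] & 0xc0) != 0x80)
				return ((size_t)-1);
			cp = (cp << 6) | (src[in + i] & 0x3f);
		}
		in += need;

		/* Reject overlong forms, surrogates, and out-of-range values. */
		if (cp < min_cp[need] || cp > 0x10ffff ||
		    (cp >= 0xd800 && cp <= 0xdfff))
			return ((size_t)-1);

		if (cp < 0x10000) {
			/* BMP: one 16-bit unit, identical to UCS-2. */
			if (dstlen - out < 2)
				return ((size_t)-1);
			dst[out++] = (unsigned char)(cp >> 8);
			dst[out++] = (unsigned char)(cp & 0xff);
		} else {
			/* Beyond the BMP: a 4-byte surrogate pair. */
			uint32_t hi, lo;

			if (dstlen - out < 4)
				return ((size_t)-1);
			cp -= 0x10000;
			hi = 0xd800 | (cp >> 10);
			lo = 0xdc00 | (cp & 0x3ff);
			dst[out++] = (unsigned char)(hi >> 8);
			dst[out++] = (unsigned char)(hi & 0xff);
			dst[out++] = (unsigned char)(lo >> 8);
			dst[out++] = (unsigned char)(lo & 0xff);
		}
	}
	return (out);
}
```

The else branch at the end is exactly the case a 16-bit wchar_t cannot
hold in a single unit, which is why treating such a type as "UTF-16BE"
really gives you only UCS-2.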

------------------------------------------------------------------------
                                                From Beijing, China