Date: Sun, 13 Aug 2006 17:53:27 +0800
From: "Intron" <mag@intron.ac>
To: Yoshihiro Ota <ota@j.email.ne.jp>
Cc: freebsd-hackers@freebsd.org, imura@FreeBSD.org
Subject: Re: UTF-8 <-> UTF-16BE Converter in Kernel Needs Test
Message-ID: <courier.44DEF697.00014988@intron.ac>
In-Reply-To: <20060812235423.af71b566.ota@j.email.ne.jp>
References: <courier.44DE0FB1.0001160E@intron.ac> <20060812235423.af71b566.ota@j.email.ne.jp>
Yoshihiro Ota wrote:
> You may try these patches, first.
> http://people.freebsd.org/~imura/kiconv/
>
> It sounds like these patches implement better supports.
>
> Hiro
>
> On Sun, 13 Aug 2006 01:28:17 +0800
> "Intron" <mag@intron.ac> wrote:
>
>> I'm sorry that I send my experimental patch set here to call for test.
>> But if I send it to freebsd-i18n@, I wonder no one will respond to me.
>>
>> Download: http://ftp.intron.ac/tmp/kiconv_utf8_20060813.tar.bz2
>>
>> My patch set implements a UTF-8 <-> UTF-16BE converter for iconv in
>> kernel. It doesn't need kiconv(3) to send unnecessary UTF-8 <-> UTF-16BE
>> conversion tables to kernel. And it doesn't require the help of GNU
>> libiconv, which kiconv(3) depends on.
>>
>> With my patch set, if you mount FAT/NTFS/ISO9660 file system, less
>> resource will be occupied than before:
>>
>>     mount_msdosfs -L ll_NN.UTF-8 /dev/md0s1 /mnt
>>
>> See my "readme.txt" for installation guide.
>>
>> ************ ATTENTION !!! ************
>>
>> 1. Do NOT test my patch set upon your CRITICAL FAT/NTFS partition !!!
>>
>> 2. Limited by BUGGY FreeBSD modules msdosfs/ntfs/cd9660, whether you
>> use my patch set or not, only 1/2-byte UTF-8 character (up to 0x7ff)
>> is supported, which means only a few languages are supported.
>>
>> I will try to patch those modules to support all languages (up to
>> 6-byte UTF-8 character) included in current Unicode step by step.
>>
>> ------------------------------------------------------------------------
>> From Beijing, China
>>
>> _______________________________________________
>> freebsd-hackers@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

I have looked into his patch set. It has some essential problems:

1. I don't know why the author treats Microsoft's 16-bit wchar_t as
UTF-16BE (the macro ENCODING_UNICODE in /sys/sys/iconv.h). A 16-bit
wchar_t is only enough for UCS-2 BE/LE (the Unicode BMP), while real
UTF-16 also includes a 4-byte form (surrogate pairs) for characters
above U+FFFF.

2. Actually, kernel iconv has so far been needed only for Microsoft
formats (FAT32, NTFS, the Joliet extension to ISO 9660, SambaFS). It
should be a minimal function set that just fits those needs. After all,
it is not a complete implementation of UNIX98 iconv and should be kept
as simple as possible.

3. In fact, UNIX98 iconv(3) handles any character set as a char array.
Using wchar_t in the msdosfs/cd9660/ntfs modules is not good style;
string functions such as memcpy() should be used instead. If 5/6-byte
UTF-8 sequences (Annex D of ISO/IEC 10646-1:2000) or other special
encodings are allowed, handling them as char arrays will still be
robust.

------------------------------------------------------------------------
From Beijing, China