Date: Wed, 07 Jul 2004 15:19:51 +0900
From: Alexander Nedotsukov <bland@FreeBSD.org>
To: NAKAJI Hiroyuki <nakaji@tutrp.tut.ac.jp>
Cc: gnome@FreeBSD.org
Subject: Re: converters/libiconv change request for net/samba3
Message-ID: <40EB9607.6020906@FreeBSD.org>
In-Reply-To: <87fz84lfaw.fsf@roddy.acest.tutrp.tut.ac.jp>
References: <87acyd8zg0.fsf@roddy.acest.tutrp.tut.ac.jp> <40EA57EB.4060607@FreeBSD.org> <871xjp8sim.fsf@roddy.acest.tutrp.tut.ac.jp> <87fz84lfaw.fsf@roddy.acest.tutrp.tut.ac.jp>

NAKAJI Hiroyuki wrote:

>I am very lucky to get some informations from my friends. They say that
>libiconv is not complete and it needs refinement.
>
>1. Miracle Linux, one of the Linux distribution company in Japan which
>supports Samba i18n, has a web page about iconv problem. Please check it.
>
>http://www.miraclelinux.com/english/technet/samba30/iconv_issues.html
>
>2. Mr. Iijima gave me a sample explanation.
>
><cite>
>(1)YEN SIGN: When ISO 646 was localized to JIS X0201, JIS committee changed
>\x5C from backslash to yen sign. Most Japanese people, however, have used
>for a long time yen sign in place of backslash as metacharacter such as
>pathname separator on DOS/Windows or on C source code or shellscripts.
>
>Therefore, Microsoft did a trick. Microsoft mapped JIS X0201's \x5C to
>Unicode backslash (U+005C) whereas they left its glyph as yen sign.
>
>(2)OVERLINE: The same story above applies to \x7E. JIS X0201 now states
>that \x7E is overline by default but can be replaced with tilde.
>
>The whole mapping table is available at:
>http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
></cite>
>
>Thanks.
>

Well, I wasn't specific enough last time when I mentioned the yen sign and
overline symbols, sorry. Take a look at this:
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0201.TXT
and this:
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT
Then compare them to cp932. In short, all of the Microsoft tricks described
above are present in the last table (cp932), and GNU libiconv handles them
correctly (though it does have a problem with some other symbols). I consider
the proposed patch a hack that makes the other mappings behave the same way,
and that doesn't look good to me. If Microsoft called some hacked version of
Shift_JIS "Shift_JIS", that doesn't make it valid for the rest of the world.
I'll be happy to commit a fix for the round-trip issue in cp932 and add
optional eucJP-ms support, but leave everything else as it is now.
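
The comparison suggested above can be sketched in a few lines of Python. This is only an illustration, not part of libiconv or the proposed patch: the helper names `parse_mapping` and `diff_mappings` are hypothetical, and the two excerpts are a handful of rows written in the layout the unicode.org mapping files use (byte, code point, then a `#` comment), rather than the full downloaded tables.

```python
def parse_mapping(text):
    """Parse 'byte <tab> code point <tab> # name' rows into {byte: code point}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # row with no Unicode value (undefined byte)
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def diff_mappings(a, b):
    """Return {byte: (a_value, b_value)} for bytes both tables map differently."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


# Small excerpts only; the real files contain many more rows.
JIS0201_EXCERPT = """\
0x5B\t0x005B\t# LEFT SQUARE BRACKET
0x5C\t0x00A5\t# YEN SIGN
0x7E\t0x203E\t# OVERLINE
"""

CP932_EXCERPT = """\
0x5B\t0x005B\t#LEFT SQUARE BRACKET
0x5C\t0x005C\t#REVERSE SOLIDUS
0x7E\t0x007E\t#TILDE
"""

differences = diff_mappings(parse_mapping(JIS0201_EXCERPT),
                            parse_mapping(CP932_EXCERPT))
for byte, (jis, ms) in differences.items():
    print(f"0x{byte:02X}: JIS X 0201 -> U+{jis:04X}, CP932 -> U+{ms:04X}")
```

Feeding the full text of the three files into `parse_mapping` instead of the excerpts should list every byte sequence on which the tables disagree, which is the set of differences under discussion here.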

Btw, are you guys sure your problem comes from libiconv? I have a few
Japanese Windows workstations here and, if you like, I can check what's wrong
with them. Just give me simple instructions on how to reproduce the problem
in that case. I am asking because I have already seen false reports about
libiconv problems from people who tried to convert the Windows client
encoding to Samba's host encoding, and this is not always possible. For
instance, you cannot have a 1:1 mapping between cp932 and eucJP.

All the best,
Alexander.
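
To illustrate the last point, here is a minimal sketch that looks for characters a cp932 byte string contains but eucJP cannot represent. It uses Python's built-in `cp932` and `euc_jp` codecs as stand-ins, so it says nothing about how libiconv or Samba behave, and the helper name `find_unmappable` is hypothetical.

```python
def find_unmappable(data, src="cp932", dst="euc_jp"):
    """Return the characters decoded from `data` that `dst` cannot encode."""
    text = data.decode(src)
    missing = []
    for ch in text:
        try:
            ch.encode(dst)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# ASCII followed by cp932 0x8740, a vendor-extension row that holds
# CIRCLED DIGIT ONE (U+2460).
sample = b"abc\x87\x40"
missing = find_unmappable(sample)
print([f"U+{ord(ch):04X}" for ch in missing])
```

A file name containing such a character can be stored by a Windows client but has no counterpart in plain eucJP, so a failed conversion of this kind does not by itself point to a bug in the converter.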