From owner-freebsd-current Wed Dec 24 08:54:41 1997
Return-Path:
Received: (from root@localhost)
        by hub.freebsd.org (8.8.7/8.8.7) id IAA17876
        for current-outgoing; Wed, 24 Dec 1997 08:54:41 -0800 (PST)
        (envelope-from owner-freebsd-current)
Received: from chiark.greenend.org.uk (chiark.greenend.org.uk [195.224.76.132])
        by hub.freebsd.org (8.8.7/8.8.7) with SMTP id IAA17864;
        Wed, 24 Dec 1997 08:54:23 -0800 (PST)
        (envelope-from johns@chiark.greenend.org.uk)
Received: from localhost by chiark.greenend.org.uk with smtp
        id m0xku46-0004p8C (Debian /\oo/\ Smail3.1.29.1 #29.37);
        Wed, 24 Dec 97 16:53 GMT
Date: Wed, 24 Dec 1997 16:53:50 +0000 (GMT)
From: John Sullivan
Reply-To: John Sullivan
To: Stefan Esser
cc: freebsd-current@freebsd.org, The Hermit Hacker
Subject: Re: Wine Emulator Patch...
In-Reply-To: <19971224110019.23782@mi.uni-koeln.de>
Message-ID:
Organization: Huh?
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Wed, 24 Dec 1997, Stefan Esser wrote:

>On 1997-12-23 17:53 -0500, The Hermit Hacker wrote:
>> around this deficiency (as did I when I built it onto my computer), but I
>> got into a discussion with the developers about this in the newsgroup, and
>> have been informed that this is, in fact, wrong :(
>Well, and you trust that information ? :)

Oh I see - my opinion can't possibly be right because I expressed it on
USENET. ;)

>If you are working in an 8bit locale, then there
>is no problem.
>
>If somebody is working in a non-8bit locale, and
>there is a problem, please let me know!

Well the problem as I see it is that the functions in question were being
used to support Windows' Unicode conversion/manipulation functions (a
16-bit encoding of a 16-bit character set). Unless you like lots of
1-character truncated strings, there most definitely *is* a problem with
using standard MBCS routines to implement these.
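To make the "1-character truncated strings" concrete, here is a minimal
sketch in C. It assumes the string is little-endian UCS-2, which is how an
x86 Windows would lay it out in memory. The helper names (ucs2le_len,
byte_len, check_truncation) and the sample buffer are made up for
illustration and are not taken from Wine's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Wine" as little-endian UCS-2: each character is two bytes, and every
 * ASCII character has a zero high byte. Terminated by a 16-bit zero. */
static const unsigned char wine_ucs2le[] = {
    'W', 0, 'i', 0, 'n', 0, 'e', 0, 0, 0
};

/* Hypothetical helper: length in 16-bit units, stopping only when both
 * bytes of a unit are zero. */
static size_t ucs2le_len(const unsigned char *s)
{
    size_t n = 0;
    while (s[2 * n] != 0 || s[2 * n + 1] != 0)
        n++;
    return n;
}

/* What a byte-oriented (SBCS/MBCS) routine sees: it stops at the first
 * zero byte, which is the high byte of the first character. */
static size_t byte_len(const unsigned char *s)
{
    return strlen((const char *)s);
}

/* Self-check: the byte-oriented view loses three of the four characters. */
void check_truncation(void)
{
    assert(ucs2le_len(wine_ucs2le) == 4);
    assert(byte_len(wine_ucs2le) == 1);
}
```

Any routine that walks the buffer a byte at a time and treats a zero byte
as the terminator will behave like byte_len() here, whatever the locale.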
Even without the character set difference (locale defined vs. UCS2), a wide
character string is *not* the same as an MBCS encoded string, ever, under
any (8 or >8 bit) locale. A single wide character may or may not be the
same as some ordering of the component MBCS bytes joined together, but who
knows?

>> My argument was weak to start off with, in that I didnt' believe
>> that anything other then Linux had this, and that putting wctype.h as part
>> of the distribution made it more Linux-only...except that other OSs
>> (Solaris, AIX, etc) do have a wctype.h file, so why are we missing it?

For the record, this is a comment from the Linux/GNU libc header:

/*
 *      ISO/IEC 9899:1990/Amendment 1:1995 7.15:
 *      Wide-character classification and mapping utilities
 */

>Perhaps, because some of the FreeBSD developers
>already spent a lot of time with support for
>wide character locales ? ;-)

Eh? I'm not sure I understand what you're implying here.

>> > If you go into misc/lstr.c with vi, do a search/replace of:
>> >
>> > tow -> to
>> > isw -> is
>> Well, it probably will compile, but the two sets of functions are
>> *supposed* to be different.
>Supposed ???
>Don't think so. Why can't the simple is__() / to__()
>function get it right ?

Because MBCS functions assume an array of bytes, where a single character
is represented by a possibly variable number (>=1) of bytes. Wide character
functions assume an array of fixed size elements, each representing a
single character. The two sets are rarely interchangeable.

It turns out now that I don't believe the tow__() versions are entirely
suitable either, but with a 16 bit wchar_t they will do the right thing
more often than the MBCS versions.

>Sure. And just check out what FreeBSD has in
>/usr/include/ctype.h (simplified, I'm using
>tolower here, others are similar) :

>There exist two versions of that function,
>one (compiled without XPG4) in libc, the
>other (compiled with XPG4) in libxpg4.
>We only need to add -lxpg4 to the linker
>command line, and the full multi-byte range
>should be supported. I'll try this on my
>system, later today, and will then commit
>the patch to the Wine port.

Hmm. How do you get the value to pass to tolower()? I guess if you can
assume SBCS, tolower(*str) will work. If you're working in an MBCS locale
you don't know how many bytes to extract from the string, so the only
sensible thing to do is tolower(mbtowc()), but that's wrong - given that
you know the locale is MBCS you can't necessarily assume tolower will do
the right thing - you need towlower.

Passing in a UCS2 character will appear to work a lot of the time. If your
locale is latin-1, you'll get most of the 8-bit characters right. Under any
other locale you'll at least get the 7-bit characters right. There are an
awful lot of characters in various locales, though, that have >8-bit
encodings in Unicode that you'll get wrong, and not even all of latin-1's
code points are the same as UCS2.

Do you work under a latin-1 locale by any chance? (I usually do btw.)

>Well, it's there in ctype.h ...
>I don't see, why another header is required. Even if
>they stick with the towlower() call (most probably
>because tolower() can't deal with wide characters),
>those definitions could have gone into ctype.h ...

Oh, no real reason as far as I can tell. But then, no system ever needs
more than 1 header file. Stick it *all* in stdio.h! The standard probably
says these functions can be found in wctype.h, so that's really where they
should go.

>FreeBSD already contains a rather complete set of
>string functions on wide and multi-byte characters.
>See "man 3 multibyte" or "man 3 mbrune" for more
>information.

I don't dispute this. You can put mappings from tow__() to the BSD
equivalents into wctype.h and all should work fine, yesno?

>I guess we should get some of the Asian developers
>to test Wine with Chinese/Japanese/Korean versions
>of Windows ...

Good idea.
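Going back to the question of how to get a value to pass to tolower(): the
following is only a sketch of the decode / towlower() / re-encode loop for
a string in the current locale's multibyte encoding. It assumes a libc that
provides <wctype.h> (the header this thread is about) along with the
standard mbtowc() and wctomb(). The names mbs_tolower and check_mbs_tolower
are made up for illustration. Note that this handles locale-encoded MBCS
strings, not the UCS2 strings Windows uses.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wctype.h>

/* Hypothetical helper: lowercase the multibyte string src into dst, which
 * has room for dstsize bytes including the terminator. Returns 0 on
 * success, -1 on an invalid sequence or if dst is too small. */
static int mbs_tolower(char *dst, size_t dstsize, const char *src)
{
    size_t remaining = strlen(src);
    size_t used = 0;
    char buf[MB_LEN_MAX];

    if (dstsize == 0)
        return -1;

    /* Reset any shift state kept by the conversion functions. */
    mbtowc(NULL, NULL, 0);
    wctomb(NULL, 0);

    while (remaining > 0) {
        wchar_t wc;
        /* Decode one character; n is the number of bytes it occupied. */
        int n = mbtowc(&wc, src, remaining);
        if (n <= 0)
            return -1;

        /* Case-map the decoded character, not the individual bytes. */
        int m = wctomb(buf, (wchar_t)towlower((wint_t)wc));
        if (m < 0 || used + (size_t)m >= dstsize)
            return -1;

        memcpy(dst + used, buf, (size_t)m);
        used += (size_t)m;
        src += n;
        remaining -= (size_t)n;
    }
    dst[used] = '\0';
    return 0;
}

/* Self-check in the default "C" locale, where each byte is one character. */
void check_mbs_tolower(void)
{
    char out[16];
    assert(mbs_tolower(out, sizeof out, "WINE") == 0);
    assert(strcmp(out, "wine") == 0);
}
```

The point of going through mbtowc() is that the caller never has to guess
how many bytes make up the next character, which is exactly what
tolower(*str) gets wrong outside an SBCS locale.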
When it comes down to it, there's not much point in arguing about
implementations under locales (such as latin-1) which don't really present
any challenge to the conversion routines. It's important to get this right
though, because the less frequently seen locales do exist.

John
-- 
i built it up now i take it apart climbed up real high now fall down real far
no need for me to stay the last thing left i just threw it away
i put my faith in god and my trust in you
now there's nothing more fucked up i could do

:-p