From owner-freebsd-current@FreeBSD.ORG Wed Dec 1 14:13:49 2004
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BF6CE16A4CE
	for ; Wed, 1 Dec 2004 14:13:49 +0000 (GMT)
Received: from mx1.mail.ru (mx1.mail.ru [194.67.23.121])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7013243D4C
	for ; Wed, 1 Dec 2004 14:13:49 +0000 (GMT)
	(envelope-from DAntrushin@mail.ru)
Received: from [81.3.158.67] (port=58763 helo=[129.159.124.237])
	by mx1.mail.ru with esmtp id 1CZVEp-000OXj-00;
	Wed, 01 Dec 2004 17:13:47 +0300
Message-ID: <41ADD1DA.9000509@mail.ru>
Date: Wed, 01 Dec 2004 17:14:50 +0300
From: Denis Antrushin
User-Agent: Mozilla Thunderbird 0.9 (X11/20041119)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Alexander Leidinger
References: <1101908414.41adc9be50c73@netchild.homeip.net>
In-Reply-To: <1101908414.41adc9be50c73@netchild.homeip.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam: Not detected
cc: current@freebsd.org
cc: tode@bpanet.de
Subject: Re: Bug in our ru_RU.KOI8-R locale (with patch)?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 01 Dec 2004 14:13:49 -0000

Alexander Leidinger wrote:
> Hi,
>
> I got a report that our ru_RU.KOI8-R locale seems to be broken. Attached
> is a test program (test.pl, tested with perl 5.8.2) and some test input
> (test.txt) which is supposed to show the problem. I can't read any
> cyrillic language, so I can't really confirm if the attached patch is the
> right fix.

First of all, test.txt is in CP1251 encoding, not KOI8-R ;-)
Second, the patch is plain wrong -- it replaces KOI8-R character codes
with CP1251 ones.

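To illustrate the encoding mix-up, here is a minimal Python sketch. It is not part of the attached test.pl; the sample word is an assumption, chosen to match the transliterated test output in this message. It encodes a lowercase word as CP1251 and then reads the same bytes as KOI8-R, which is roughly what happens when a CP1251 file is processed under the ru_RU.KOI8-R locale.

```python
# Assumed sample word ("pushkin" in lowercase Cyrillic), stored as CP1251 bytes.
word = "пушкин"
raw = word.encode("cp1251")

# Read the very same bytes as if they were KOI8-R.
# CP1251 keeps lowercase Cyrillic in 0xE0-0xFF, while KOI8-R keeps
# UPPERCASE Cyrillic there, so the text turns into different, uppercase letters.
misread = raw.decode("koi8_r")

print(raw.hex(" "))       # ef f3 f8 ea e8 ed
print(misread)            # ОСЬЙХМ
print(misread.isupper())  # True
```

So a correct KOI8-R locale has to treat these bytes as already uppercase, which would make lc() and uc() look broken on CP1251 input even though the locale tables are fine.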
> If you run the test program you should see something like this (strange
> looking text maybe because of the webmailer I use):
> ---snip---
> Match small (RegEx with i flag): 0
> Match small (RegEx without i flag): 8
> Match for normal (RegEx with i flag): 17
> Match for normal (RegEx without i flag): 9
>
> Case - Check for 'яѓјъшэ'
> lc() => яѓјъшэ
> uc() => ЯгиЪШЭ
> lcfirst() => яѓјъшэ
> ucfirst() => Яѓјъшэ
>
> Case - Check for 'Яѓјъшэ'
> lc() => яѓјъшэ
> uc() => ЯгиЪШЭ
> lcfirst() => яѓјъшэ
> ucfirst() => Яѓјъшэ
> ---snip---
>
> I'm told the "Case - Check" parts are correct with the patch, but not
> without it (lc() -> lower case the entire string; uc() -> upper case the
> entire string; lcfirst() -> lower case the first character; ...). Can
> someone please confirm this?

This is what the test gives me (transliterated to ASCII):

Case - Check for 'pushkin'
lc() => pushkin
uc() => PUSHKIN
lcfirst() => pushkin
ucfirst() => Pushkin

Case - Check for 'Pushkin'
lc() => pushkin
uc() => PUSHKIN
lcfirst() => pushkin
ucfirst() => Pushkin

It seems correct to me...

> If this is correct we've solved only a part of the problem. The other
> part seems to be related to LC_COLLATE. "Match small" with the i flag
> (case insensitive matching) shouldn't print 0 when "Match normal" with
> the i flag doesn't print 0. Any ideas how to solve this?
>
> If the patch isn't correct we still have a bug somewhere (please CC
> perl@freebsd.org then). Why isn't perl able to do a case insensitive
> match in the ru_RU.KOI8-R locale?
>
> BTW.: this affects 4.x (problem noticed here), 5.x and -current (I've
> tested the patch here).
>
> Bye,
> Alexander.
>
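
For anyone who wants to rerun the test with input that really is KOI8-R, a rough Python sketch of the conversion follows. The helper name `cp1251_to_koi8r` and the sample word are hypothetical and not taken from test.pl or the patch. It also checks the byte-level property a KOI8-R case table relies on: for the letters in 0xC0-0xFF, the upper- and lowercase forms differ only in bit 0x20.

```python
def cp1251_to_koi8r(data: bytes) -> bytes:
    """Re-encode CP1251 bytes as KOI8-R (hypothetical helper for illustration)."""
    return data.decode("cp1251").encode("koi8_r")


# Assumed sample: "Pushkin" with a capital first letter, as CP1251 bytes.
sample = "Пушкин".encode("cp1251")
converted = cp1251_to_koi8r(sample)

# In KOI8-R, uppercase Cyrillic occupies 0xE0-0xFF and lowercase 0xC0-0xDF,
# so clearing bit 0x20 lowercases a letter. (The letter "yo" sits outside
# this range and is not handled by this sketch.)
lowered = bytes(b & 0xDF if b >= 0xE0 else b for b in converted)

# The manual bit trick should agree with Python's own case mapping.
print(lowered.decode("koi8_r") == converted.decode("koi8_r").lower())
```

With input converted this way, the case functions should behave as in the transliterated output above, without touching the locale definition.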