From owner-freebsd-current@FreeBSD.ORG  Wed Dec  1 14:13:49 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BF6CE16A4CE
	for <current@freebsd.org>; Wed,  1 Dec 2004 14:13:49 +0000 (GMT)
Received: from mx1.mail.ru (mx1.mail.ru [194.67.23.121])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 7013243D4C
	for <current@freebsd.org>; Wed,  1 Dec 2004 14:13:49 +0000 (GMT)
	(envelope-from DAntrushin@mail.ru)
Received: from [81.3.158.67] (port=58763 helo=[129.159.124.237])
	by mx1.mail.ru with esmtp 
	id 1CZVEp-000OXj-00; Wed, 01 Dec 2004 17:13:47 +0300
Message-ID: <41ADD1DA.9000509@mail.ru>
Date: Wed, 01 Dec 2004 17:14:50 +0300
From: Denis Antrushin <DAntrushin@mail.ru>
User-Agent: Mozilla Thunderbird 0.9 (X11/20041119)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Alexander Leidinger <Alexander@Leidinger.net>
References: <1101908414.41adc9be50c73@netchild.homeip.net>
In-Reply-To: <1101908414.41adc9be50c73@netchild.homeip.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam: Not detected
cc: current@freebsd.org
cc: tode@bpanet.de
Subject: Re: Bug in our ru_RU.KOI8-R locale (with patch)?
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
X-List-Received-Date: Wed, 01 Dec 2004 14:13:49 -0000

Alexander Leidinger wrote:
> Hi,
> 
> I got a report that our ru_RU.KOI8-R locale seems to be broken. Attached
> is a test program (test.pl, tested with perl 5.8.2) and some test input
> (test.txt) which is supposed to show the problem. I can't read any
> Cyrillic language, so I can't really confirm if the attached patch is the
> right fix.
First of all, test.txt is in CP1251 encoding, not KOI8-R ;-)
Second, the patch is plain wrong: it replaces the KOI8-R character codes
with CP1251 ones.
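
To illustrate the encoding mismatch, here is a minimal Python sketch (not
part of the original test.pl). It assumes the test word is "пушкин", which
is only inferred from the transliterated output in this message. It shows
that CP1251 and KOI8-R assign different byte values to the same letters, so
a locale table built from CP1251 codes cannot be right for KOI8-R. As a side
observation, decoding the CP1251 bytes as ISO-8859-5 yields the same code
points that appear as numeric entities in the quoted output, which would be
consistent with the webmailer misreading CP1251 input.

```python
# Assumption: the test word is "пушкин" (inferred from the transliteration).
word = "пушкин"

# The same six letters, encoded in the two charsets under discussion.
cp1251_bytes = word.encode("cp1251")
koi8r_bytes = word.encode("koi8_r")

print("CP1251:", cp1251_bytes.hex(" "))
print("KOI8-R:", koi8r_bytes.hex(" "))

# Reading CP1251 bytes as if they were KOI8-R gives different letters.
misread_as_koi8r = cp1251_bytes.decode("koi8_r")
print("CP1251 bytes read as KOI8-R:", misread_as_koi8r)

# Reading the same bytes as ISO-8859-5 gives the code points that show up
# as numeric entities in the quoted webmail output.
misread_as_iso8859_5 = cp1251_bytes.decode("iso8859_5")
print("Code points when read as ISO-8859-5:",
      [ord(ch) for ch in misread_as_iso8859_5])
```
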

> If you run the test program you should see something like this (strange
> looking text maybe because of the webmailer I use):
> ---snip---
> Match small (RegEx with i flag): 0
> Match small (RegEx without i flag): 8
> Match for normal (RegEx with i flag): 17
> Match for normal (RegEx without i flag): 9
> 
> Case - Check for '&#1103;&#1107;&#1112;&#1098;&#1096;&#1101;'
> lc() => &#1103;&#1107;&#1112;&#1098;&#1096;&#1101;
> uc() => &#1071;&#1075;&#1080;&#1066;&#1064;&#1069;
> lcfirst() => &#1103;&#1107;&#1112;&#1098;&#1096;&#1101;
> ucfirst() => &#1071;&#1107;&#1112;&#1098;&#1096;&#1101;
> 
> Case - Check for '&#1071;&#1107;&#1112;&#1098;&#1096;&#1101;'
> lc() => &#1103;&#1107;&#1112;&#1098;&#1096;&#1101;
> uc() => &#1071;&#1075;&#1080;&#1066;&#1064;&#1069;
> lcfirst() => &#1103;&#1107;&#1112;&#1098;&#1096;&#1101;
> ucfirst() => &#1071;&#1107;&#1112;&#1098;&#1096;&#1101;
> ---snip---
> 
> I'm told the "Case - Check" parts are correct with the patch, but not
> without it (lc() -> lower case the entire string; uc() -> upper case the
> entire string; lcfirst() -> lower case the first character; ...). Can
> someone please confirm this?
This is what the test gives me (transliterated to ASCII):

Case - Check for 'pushkin'
lc() => pushkin
uc() => PUSHKIN
lcfirst() => pushkin
ucfirst() => Pushkin

Case - Check for 'Pushkin'
lc() => pushkin
uc() => PUSHKIN
lcfirst() => pushkin
ucfirst() => Pushkin

It seems correct to me...
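
As a cross-check that does not depend on the system locale, the expected
results can be sketched in Python. The helper names below mirror Perl's
lc(), uc(), lcfirst() and ucfirst() but are hypothetical stand-ins, and the
word "пушкин" is again an assumption based on the transliteration above.
The sketch decodes KOI8-R bytes, applies Unicode case mapping, and encodes
the result back to KOI8-R.

```python
ENCODING = "koi8_r"


def lc(raw: bytes) -> bytes:
    """Lower-case a whole KOI8-R byte string."""
    return raw.decode(ENCODING).lower().encode(ENCODING)


def uc(raw: bytes) -> bytes:
    """Upper-case a whole KOI8-R byte string."""
    return raw.decode(ENCODING).upper().encode(ENCODING)


def lcfirst(raw: bytes) -> bytes:
    """Lower-case only the first character."""
    text = raw.decode(ENCODING)
    return (text[:1].lower() + text[1:]).encode(ENCODING)


def ucfirst(raw: bytes) -> bytes:
    """Upper-case only the first character."""
    text = raw.decode(ENCODING)
    return (text[:1].upper() + text[1:]).encode(ENCODING)


# Assumed test inputs, encoded as KOI8-R.
for sample in ("пушкин", "Пушкин"):
    raw = sample.encode(ENCODING)
    print("Case - Check for", repr(sample))
    for func in (lc, uc, lcfirst, ucfirst):
        print(f"{func.__name__}() =>", func(raw).decode(ENCODING))
```

If a locale's case table is correct, the byte strings produced by the C
library for the same input should match what these helpers return.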


> If this is correct we've solved only a part of the problem. The other
> part seems to be related to LC_COLLATE. "Match small" with the i flag
> (case insensitive matching) shouldn't print 0 when "Match normal" with
> the i flag doesn't print 0. Any ideas how to solve this?
> 
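
For reference, here is a small Python sketch of what a working
case-insensitive match should count. It is not the original test.pl: the
sample text and the count_matches helper are made up for illustration, and
the matching is done on decoded text rather than through the C locale. Note
that case folding is normally governed by LC_CTYPE rather than LC_COLLATE.

```python
import re


def count_matches(pattern: str, raw: bytes, ignore_case: bool) -> int:
    """Count matches of pattern in KOI8-R encoded bytes."""
    text = raw.decode("koi8_r")
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(pattern, text, flags))


# Hypothetical sample: the same word in three different casings.
sample = "Пушкин пушкин ПУШКИН".encode("koi8_r")

# A lower-case pattern with the i flag should match every casing, so it
# can never count fewer matches than the same pattern without the flag.
with_flag = count_matches("пушкин", sample, ignore_case=True)
without_flag = count_matches("пушкин", sample, ignore_case=False)
print("with i flag:", with_flag)
print("without i flag:", without_flag)
```
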
> If the patch isn't correct we still have a bug somewhere (please CC
> perl@freebsd.org then). Why isn't perl able to do a case insensitive
> match in the ru_RU.KOI8-R locale?
> 
> BTW.: this affects 4.x (problem noticed here), 5.x and -current (I've
> tested the patch here).
> 
> Bye,
> Alexander.
>