Date:      Tue, 23 Nov 2004 17:42:07 +0100
From:      "Alexander@Leidinger.net" <netchild@FreeBSD.org>
To:        ache@freebsd.org, perl@freebsd.org
Cc:        tode@bpanet.de
Subject:   Strange behavior of LANG=ru_RU.KOI8-R on 4.x
Message-ID:  <1101228127.41a3685fa3921@netchild.homeip.net>


[-- Attachment #1 --]
Hi,

I got a report of strange behavior when someone uses ru_RU.KOI8-R with perl
5.8.5 on FreeBSD 4.7. I don't have access to a 4.7 system, but I can
reproduce it on my -current system.

Save the attachments into a directory and run (assuming 5.3 or -current)
 LANG=C perl test.pl
 LANG=ru_RU.KOI8-R perl test.pl
 LANG=ru_RU.UTF-8 perl test.pl

I did this and noticed that with LANG=C there's no change (i.e. the first
letter always stays as in the "Check" line, even where it should have
changed to lower or upper case). I expected this, since the C locale
can't know about Russian letters. The number of matches is as expected too.

With ru_RU.KOI8-R it looks like the meaning of lower and upper case is
reversed. If I use ru_RU.UTF-8, the case output looks right, but the number
of matches still isn't sane (the number of case-insensitive matches for
small and normal isn't the same).
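
One way to separate the locale tables from perl's own case handling, as a
minimal sketch only: decode the KOI8-R bytes explicitly and apply lc(),
ucfirst() and a case-insensitive match to the decoded characters, without
"use locale". This assumes the Encode module that ships with perl 5.8 is
available; the byte values are my assumption for the two test words in
KOI8-R, and the variable names are made up for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed KOI8-R byte sequences for the two test words.
my $small_bytes  = "\xD0\xD5\xDB\xCB\xC9\xCE";	# lower-case first letter
my $normal_bytes = "\xF0\xD5\xDB\xCB\xC9\xCE";	# upper-case first letter

# Decode to character strings; case mapping then follows Unicode rules
# because this script does not enable "use locale".
my $small  = decode('koi8-r', $small_bytes);
my $normal = decode('koi8-r', $normal_bytes);

print "lc(normal) eq small:      ", (lc($normal) eq $small ? "yes" : "no"), "\n";
print "ucfirst(small) eq normal: ", (ucfirst($small) eq $normal ? "yes" : "no"), "\n";
print "normal =~ /small/i:       ", ($normal =~ /\Q$small\E/i ? "yes" : "no"), "\n";

# Encode the lower-cased word again and compare with the assumed bytes.
my $round_trip = encode('koi8-r', lc($normal));
print "round trip to KOI8-R:     ", ($round_trip eq $small_bytes ? "yes" : "no"), "\n";
```

If this behaves sanely while test.pl does not, that would point at the
locale case tables rather than at the regex engine itself.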

Background: the search function of a large Perl application (Interchange)
fails to do case-insensitive searches in the above-mentioned locale.

Any ideas what's happening here and how to fix it?

Bye,
Alexander.

-- 
http://www.Leidinger.net/     Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org/        netchild @ FreeBSD.org  : PGP ID = 72077137
Endless Loop: n., see Loop, Endless.
Loop, Endless: n., see Endless Loop.
		-- Random Shack Data Processing Dictionary

[-- Attachment #2 --]
#!/usr/bin/env perl

use strict;
use warnings;
use locale;

my $file		= 'test.txt';
my $pushkin_small	= 'пушкин';
my $pushkin_normal	= 'Пушкин';

my $data		= LoadFile($file);

my $count_normal_i	= 0;
my $count_small_i	= 0;
my $count_normal      = 0;
my $count_small       = 0;

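foreach my $line (@{$data}) {
	# No /g here: in scalar context /g keeps pos($line) between matches, so
	# each test would resume where the previous successful one stopped.
	$count_normal_i++ if ($line =~ m/$pushkin_normal/is);
	$count_small_i++ if ($line =~ m/$pushkin_small/is);
	$count_normal++ if ($line =~ m/$pushkin_normal/s);
	$count_small++ if ($line =~ m/$pushkin_small/s);
}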

print "Match small (RegEx with i flag): $count_small_i\n";
print "Match small (RegEx without i flag): $count_small\n";

print "Match for normal (RegEx with i flag): $count_normal_i\n";
print "Match for normal (RegEx without i flag): $count_normal\n\n";
TestCase($pushkin_small);
TestCase($pushkin_normal);

exit(0);


sub TestCase {
	my $string	= shift(@_);
	print "Case - Check for \'$string\'\n";
	print "lc() => ".lc($string)."\n";
	print "uc() => ".uc($string)."\n";
	print "lcfirst() => ".lcfirst($string)."\n";
	print "ucfirst() => ".ucfirst($string)."\n";
	
	print "\n";

	return 1;
}


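sub LoadFile {
	my $file	= shift(@_);
	my @value	= ();
	# Three-argument open with a lexical handle; fail loudly if the file is missing.
	open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
	@value		= <$fh>;
	close($fh);
	chomp(@value);
	return \@value;
}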


[-- Attachment #3 --]
пушкин
Пушкин
Test
Test
TEST
tEST
пушкин
Пушкин
Test
Test
TEST
tEST
пушкин
пушкин
пушкин
пушкин
Пушкин
Пушкин
Пушкин
Пушкин
Пушкин
пушкин
Пушкин
Пушкин
пушкин

COUNT lower 8 upper 9

