Date: Tue, 23 Nov 2004 17:42:07 +0100
From: "Alexander@Leidinger.net" <netchild@FreeBSD.org>
To: ache@freebsd.org, perl@freebsd.org
Cc: tode@bpanet.de
Subject: Strange behavior of LANG=ru_RU.KOI8-R on 4.x
Message-ID: <1101228127.41a3685fa3921@netchild.homeip.net>

[-- Attachment #1 --]

Hi,

I got a report of strange behavior when someone uses ru_RU.KOI8-R with
perl 5.8.5 on FreeBSD 4.7. I don't have access to a 4.7 system, but I can
reproduce it on my -current system.

Save the attachments into a directory and run (assuming 5.3 or -current):

    LANG=C perl test.pl
    LANG=ru_RU.KOI8-R perl test.pl
    LANG=ru_RU.UTF-8 perl test.pl

I did this and noticed the following:

With LANG=C there's no change (i.e. the first letter is always the same as
in the "Check" line, even where it should have changed to lower or upper
case). I expected this, since the C locale can't know about Russian
letters. The number of matches is as expected too.

With ru_RU.KOI8-R it looks like the meaning of lower and upper case is
reversed.

With ru_RU.UTF-8 the case conversion output looks right, but the match
counts still look wrong (the number of case-insensitive matches for small
and normal isn't the same).

Background: the search function of a large perl application (Interchange)
fails to do case-insensitive searches in the above-mentioned locale.

Any ideas what's happening here and how to fix it?

Bye,
Alexander.

-- 
http://www.Leidinger.net/  Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org/    netchild @ FreeBSD.org   : PGP ID = 72077137
Endless Loop: n., see Loop, Endless.
Loop, Endless: n., see Endless Loop.
		-- Random Shack Data Processing Dictionary

[-- Attachment #2 --]
#!/usr/bin/env perl

use locale;

my $file = 'test.txt';
my $pushkin_small = 'пушкин';
my $pushkin_normal = 'Пушкин';

my $data = LoadFile($file);

my $count_normal_i = 0;
my $count_small_i = 0;
my $count_normal = 0;
my $count_small = 0;

foreach my $line (@{$data}) {
    $count_normal_i++ if ($line =~ m/$pushkin_normal/isg);
    $count_small_i++ if ($line =~ m/$pushkin_small/isg);
    $count_normal++ if ($line =~ m/$pushkin_normal/sg);
    $count_small++ if ($line =~ m/$pushkin_small/sg);
}

print "Match small (RegEx with i flag): $count_small_i\n";
print "Match small (RegEx without i flag): $count_small\n";
print "Match for normal (RegEx with i flag): $count_normal_i\n";
print "Match for normal (RegEx without i flag): $count_normal\n\n";

TestCase($pushkin_small);
TestCase($pushkin_normal);

exit(0);

sub TestCase {
    my $string = shift(@_);

    print "Case - Check for \'$string\'\n";
    print "lc() => ".lc($string)."\n";
    print "uc() => ".uc($string)."\n";
    print "lcfirst() => ".lcfirst($string)."\n";
    print "ucfirst() => ".ucfirst($string)."\n";
    print "\n";

    return 1;
}

sub LoadFile {
    my $file = shift(@_);
    my @value = ();

    open(FILE, "<$file");
    @value = <FILE>;
    close(FILE);
    chomp(@value);

    return \@value;
}

[-- Attachment #3 --]
пушкин Пушкин Test Test TEST tEST пушкин Пушкин Test Test TEST tEST пушкин пушкин пушкин пушкин Пушкин Пушкин Пушкин Пушкин Пушкин пушкин Пушкин Пушкин пушкин COUNT lower 8 upper 9
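
A side note on the matching loop in test.pl: all four matches use the /g flag in scalar context on the same $line. In Perl, a successful scalar-context /g match records its end offset in pos($line), and the next /g match on that string resumes from there instead of from the start. The following is only a minimal sketch of that documented behavior, using an ASCII stand-in string so that no locale is involved. Whether it contributes to the differing counts reported above is an assumption and is not established here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# ASCII-only stand-in for one line of test.txt (hypothetical sample value).
my $line = 'Test';

# Scalar-context /g: a successful match stores its end offset in pos($line).
my $first_g = ($line =~ m/test/ig) ? 1 : 0;
my $pos_after_first = pos($line);

# The next /g match on the same string resumes at that offset, so it finds
# nothing here even though the pattern would match from offset 0.
my $second_g = ($line =~ m/TEST/ig) ? 1 : 0;

# Without /g, every match starts at offset 0 and the two do not interact.
my $first_plain  = ($line =~ m/test/i) ? 1 : 0;
my $second_plain = ($line =~ m/TEST/i) ? 1 : 0;

print "with /g:    first=$first_g second=$second_g (pos after first: $pos_after_first)\n";
print "without /g: first=$first_plain second=$second_plain\n";
```

If this interaction is unwanted, dropping the g flag from the four matches in the loop would make each test independent of the others; the lc()/uc() results are not affected either way.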
