Date: Wed, 1 Dec 2004 14:40:14 +0100
From: Alexander Leidinger <Alexander@Leidinger.net>
To: current@freebsd.org
Cc: tode@bpanet.de
Subject: Bug in our ru_RU.KOI8-R locale (with patch)?
Message-ID: <1101908414.41adc9be50c73@netchild.homeip.net>
[-- Attachment #1 --]
Hi,

I got a report that our ru_RU.KOI8-R locale seems to be broken. Attached are a test program (test.pl, tested with perl 5.8.2) and some test input (test.txt) that are supposed to show the problem. I can't read any Cyrillic language, so I can't confirm whether the attached patch is the right fix.

If you run the test program you should see something like this (the text may look strange because of the webmailer I use):

---snip---
Match small (RegEx with i flag): 0
Match small (RegEx without i flag): 8
Match for normal (RegEx with i flag): 17
Match for normal (RegEx without i flag): 9

Case - Check for 'яѓјъшэ'
lc() => яѓјъшэ
uc() => ЯгиЪШЭ
lcfirst() => яѓјъшэ
ucfirst() => Яѓјъшэ

Case - Check for 'Яѓјъшэ'
lc() => яѓјъшэ
uc() => ЯгиЪШЭ
lcfirst() => яѓјъшэ
ucfirst() => Яѓјъшэ
---snip---

I'm told the "Case - Check" parts are correct with the patch, but not without it (lc() lowercases the entire string; uc() uppercases the entire string; lcfirst() lowercases the first character; ...). Can someone please confirm this?

If this is correct, we've solved only part of the problem. The other part seems to be related to LC_COLLATE: "Match small" with the i flag (case-insensitive matching) shouldn't print 0 when "Match for normal" with the i flag doesn't print 0. Any ideas how to solve this?

If the patch isn't correct, we still have a bug somewhere (please CC perl@freebsd.org then). Why isn't perl able to do a case-insensitive match in the ru_RU.KOI8-R locale?

BTW: this affects 4.x (problem noticed here), 5.x and -current (I've tested the patch here).

Bye,
Alexander.
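For comparison, here is a minimal sketch of the counts one would expect when case folding works. It is not the attached test.pl: it uses Python 3 Unicode strings, so it bypasses the locale tables entirely, and the helper name `count_matching_lines` and the sample list are hypothetical stand-ins that only mirror the 8 lowercase / 9 capitalized split noted in test.txt.

```python
import re


def count_matching_lines(lines, word, ignore_case=False):
    """Count the lines that contain `word`, optionally ignoring case."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(re.escape(word), flags)
    return sum(1 for line in lines if pattern.search(line))


if __name__ == "__main__":
    # Hypothetical stand-in for test.txt: 8 lowercase and 9 capitalized lines,
    # plus a few ASCII lines that should never match.
    sample = ["пушкин"] * 8 + ["Пушкин"] * 9 + ["Test", "TEST", "tEST"]

    for word in ("пушкин", "Пушкин"):
        exact = count_matching_lines(sample, word)
        folded = count_matching_lines(sample, word, ignore_case=True)
        print(f"{word}: exact={exact} case-insensitive={folded}")
```

With working case folding, both case-insensitive counts should be equal to each other and to the sum of the two exact counts, which is the property the "Match small" line in the report violates.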
-- 
http://www.Leidinger.net/  Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org/    netchild @ FreeBSD.org  : PGP ID = 72077137

[-- Attachment #2 --]
#!/usr/bin/env perl

use locale;

my $file = 'test.txt';
my $pushkin_small = 'пушкин';
my $pushkin_normal = 'Пушкин';

my $data = LoadFile($file);

my $count_normal_i = 0;
my $count_small_i = 0;
my $count_normal = 0;
my $count_small = 0;

foreach my $line (@{$data}) {
    $count_normal_i++ if ($line =~ m/$pushkin_normal/isg);
    $count_small_i++ if ($line =~ m/$pushkin_small/isg);
    $count_normal++ if ($line =~ m/$pushkin_normal/sg);
    $count_small++ if ($line =~ m/$pushkin_small/sg);
}

print "Match small (RegEx with i flag): $count_small_i\n";
print "Match small (RegEx without i flag): $count_small\n";
print "Match for normal (RegEx with i flag): $count_normal_i\n";
print "Match for normal (RegEx without i flag): $count_normal\n\n";

TestCase($pushkin_small);
TestCase($pushkin_normal);

exit(0);

sub TestCase {
    my $string = shift(@_);

    print "Case - Check for \'$string\'\n";
    print "lc() => ".lc($string)."\n";
    print "uc() => ".uc($string)."\n";
    print "lcfirst() => ".lcfirst($string)."\n";
    print "ucfirst() => ".ucfirst($string)."\n";
    print "\n";

    return 1;
}

sub LoadFile {
    my $file = shift(@_);
    my @value = ();

    open(FILE, "<$file");
    @value = <FILE>;
    close(FILE);
    chomp(@value);

    return \@value;
}

[-- Attachment #3 --]
пушкин
Пушкин
Test
Test
TEST
tEST
пушкин
Пушкин
Test
Test
TEST
tEST
пушкин
пушкин
пушкин
пушкин
Пушкин
Пушкин
Пушкин
Пушкин
Пушкин
пушкин
Пушкин
Пушкин
пушкин
COUNT
lower 8
upper 9

[-- Attachment #4 --]
--- /usr/src/share/mklocale/ru_RU.KOI8-R.src	Fri Nov 30 06:05:53 2001
+++ ru_RU.KOI8-R.src	Wed Dec  1 13:38:59 2004
@@ -13,27 +13,27 @@
 CONTROL 0x00 - 0x1f 0x7f
 DIGIT '0' - '9'
 GRAPH 0x21 - 0x7e 0x80 - 0x99 0x9b - 0xff
-LOWER 'a' - 'z' 0xa3 0xc0 - 0xdf
+LOWER 'a' - 'z' 0xb3 0xe0 - 0xff
 PUNCT 0x21 - 0x2f 0x3a - 0x40 0x5b - 0x60 0x7b - 0x7e
 SPACE 0x09 - 0x0d 0x20 0x9a
-UPPER 'A' - 'Z' 0xb3 0xe0 - 0xff
+UPPER 'A' - 'Z' 0xa3 0xc0 - 0xdf
 XDIGIT '0' - '9' 'a' - 'f' 'A' - 'F'
 BLANK ' ' '\t' 0x9a
 PRINT 0x20 - 0x7e 0x80 - 0xff
 MAPLOWER <'A' - 'Z' : 'a'>
 MAPLOWER <'a' - 'z' : 'a'>
-MAPLOWER <0xb3 0xa3>
-MAPLOWER <0xa3 0xa3>
-MAPLOWER <0xe0 - 0xff : 0xc0>
-MAPLOWER <0xc0 - 0xdf : 0xc0>
+MAPLOWER <0xb3 0xb3>
+MAPLOWER <0xa3 0xb3>
+MAPLOWER <0xe0 - 0xff : 0xe0>
+MAPLOWER <0xc0 - 0xdf : 0xe0>
 MAPUPPER <'A' - 'Z' : 'A'>
 MAPUPPER <'a' - 'z' : 'A'>
-MAPUPPER <0xb3 0xb3>
-MAPUPPER <0xa3 0xb3>
-MAPUPPER <0xe0 - 0xff : 0xe0>
-MAPUPPER <0xc0 - 0xdf : 0xe0>
+MAPUPPER <0xb3 0xa3>
+MAPUPPER <0xa3 0xa3>
+MAPUPPER <0xe0 - 0xff : 0xc0>
+MAPUPPER <0xc0 - 0xdf : 0xc0>
 TODIGIT <'0' - '9' : 0>
 TODIGIT <'A' - 'F' : 10>
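
Since the patch swaps the LOWER and UPPER byte ranges, one independent way to check which assignment fits a given encoding is to decode those bytes and ask whether the resulting characters are lowercase or uppercase. The sketch below does that with Python 3's bundled `koi8_r` and `cp1251` codecs; treating those codec tables as the reference is an assumption, and the helper names `letter_case` and `range_case` are hypothetical. It inspects only the byte values named in the patch and says nothing about how perl or libc consume the locale file.

```python
# Sketch: classify the byte values named in the patch by decoding them with
# Python's built-in codecs (assumed here to be faithful reference tables).


def letter_case(byte_value: int, codec: str) -> str:
    """Return 'lower', 'upper', or 'other' for one byte decoded with `codec`."""
    char = bytes([byte_value]).decode(codec)
    if char.islower():
        return "lower"
    if char.isupper():
        return "upper"
    return "other"


def range_case(first: int, last: int, codec: str) -> set:
    """Collect the case classes seen across an inclusive byte range."""
    return {letter_case(b, codec) for b in range(first, last + 1)}


if __name__ == "__main__":
    for codec in ("koi8_r", "cp1251"):
        print(
            codec,
            "0xc0-0xdf:", range_case(0xC0, 0xDF, codec),
            "0xe0-0xff:", range_case(0xE0, 0xFF, codec),
            "0xa3:", letter_case(0xA3, codec),
            "0xb3:", letter_case(0xB3, codec),
        )
```

Running it for both codecs makes it easy to compare the unpatched and patched LOWER/UPPER lines against each encoding, which may help decide whether the locale source or the encoding of the test data is the part that needs changing.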
