Date:      Thu, 21 Oct 1999 19:56:51 +0400
From:      Grigoriy Strokin <grg@philol.msu.ru>
To:        freebsd-hackers@freebsd.org
Cc:        ache@freebsd.org, freebsd-bugs@freebsd.org
Subject:   comm doesn't obey current locale collation
Message-ID:  <19991021195649.A36122@isabase.philol.msu.ru>

Hello,

Six months ago I submitted a problem report with send-pr about
/usr/bin/comm (Problem Report bin/11221).

There have still been no follow-ups, nor has the report been assigned to
any responsible person.

What might this mean?



---------------- forward ---------------

Problem Report bin/11221

comm doesn't obey current locale collation

Confidential
    no 
Severity
    serious 
Priority
    medium 
Responsible
    freebsd-bugs@freebsd.org 
State
    open 
Class
    sw-bug 
Submitter-Id
    current-users 
Arrival-Date
    Mon Apr 19 10:40:03 PDT 1999 
Last-Modified
    never 
Originator
    Grigoriy Strokin grg@philol.msu.ru 
Release
    FreeBSD 3.1-STABLE i386 
Organization

    Moscow University

Environment

    $LANG set to ru_RU.KOI8-R


Description

    comm produces wrong results when processing 8-bit text files that
    were sorted with /usr/bin/sort according to the current locale
    (ru_RU.KOI8-R): comm compares lines with strcmp, not with the
    locale's collation order.


How-To-Repeat

    Unpack the following shar archive and run
      LANG=ru_RU.KOI8-R comm jaa.srt jaa2.srt
    Several identical characters will appear in both the first and
    the second column. This must not happen with these files,
    because both were produced as output of
       LANG=ru_RU.KOI8-R sort
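
    The symptom can be reproduced without a Russian locale. The sketch
    below is only an illustration, not code from comm.c: count_common is
    a hypothetical, simplified version of comm's merge loop, and
    reverse_cmp is a made-up stand-in for "some collation order that
    differs from byte order". When the inputs are sorted by one order
    but merged with another, lines that are common to both files stop
    being recognized as common.

```c
#include <stdio.h>
#include <string.h>

/* Comparison function type shared by strcmp and the stand-in below. */
typedef int (*cmp_fn)(const char *, const char *);

/* Hypothetical stand-in for a locale collation: reverse byte order. */
static int reverse_cmp(const char *a, const char *b)
{
    return strcmp(b, a);
}

/*
 * Simplified comm-style merge: walk both sorted lists in parallel and
 * count the lines reported as common (comm's third column).
 * The result is only correct if both lists are sorted according to cmp.
 */
static int count_common(const char **f1, int n1,
                        const char **f2, int n2, cmp_fn cmp)
{
    int i = 0, j = 0, common = 0;

    while (i < n1 && j < n2) {
        int c = cmp(f1[i], f2[j]);
        if (c == 0) {           /* same line in both files */
            common++;
            i++;
            j++;
        } else if (c < 0) {     /* line only in file 1 */
            i++;
        } else {                /* line only in file 2 */
            j++;
        }
    }
    return common;
}
```

    With the lists {"c", "b", "a"} and {"c", "a"}, both sorted in the
    stand-in order, merging with reverse_cmp finds both common lines,
    while merging with strcmp finds only "c" and would print "a" once
    in each of the first two columns.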

---------------------CUT------------------------------------------

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#       jaa.srt
#       jaa2.srt
#
echo x - jaa.srt
sed 's/^X//' >jaa.srt << 'END-of-jaa.srt'
Xô
Xõ
Xæ
Xö
Xé
Xç
Xà
Xù
Xü
Xñ
Xý
Xû
Xø
Xá
Xó
END-of-jaa.srt
echo x - jaa2.srt
sed 's/^X//' >jaa2.srt << 'END-of-jaa2.srt'
Xô
Xõ
Xæ
Xè
Xö
Xé
Xç
Xà
Xù
Xü
Xý
Xø
Xá
Xó
END-of-jaa2.srt
exit



Fix

    Apply the patch:


--- comm.c.orig Mon Apr 19 16:57:56 1999
+++ comm.c      Mon Apr 19 19:45:49 1999
@@ -55,9 +55,29 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <locale.h>
+#include <ctype.h>
 
 #define        MAXLINELEN      (LINE_MAX + 1)
 
+/* The standard library provides strcoll, an analog of strcmp that takes
+ * the current locale into account, but strcasecmp has no such analog.
+ * So let's define a replacement, locale_dependent_strcasecmp.
+ */
+
+int locale_dependent_strcasecmp(const char *s1, const char *s2)
+{
+  char a1[MAXLINELEN], a2[MAXLINELEN];
+  char *c;
+  for (c = a1; *s1; c++, s1++)
+    *c = toupper((unsigned char)(*s1));
+  *c = 0;
+  for (c = a2; *s2; c++, s2++)
+    *c = toupper((unsigned char)(*s2));
+  *c = 0;
+  return strcoll(a1, a2);
+}
+
 char *tabs[] = { "", "\t", "\t\t" };
 
 FILE   *file __P((char *));
@@ -74,7 +94,7 @@
        FILE *fp1, *fp2;
        char *col1, *col2, *col3;
        char **p, line1[MAXLINELEN], line2[MAXLINELEN];
-
+  setlocale(LC_ALL, "");
        flag1 = flag2 = flag3 = 1;
        iflag = 0;
 
@@ -139,9 +159,9 @@
 
                /* lines are the same */
                if(iflag)
-                       comp = strcasecmp(line1, line2);
+                       comp = locale_dependent_strcasecmp(line1, line2);
                else
-                       comp = strcmp(line1, line2);
+                       comp = strcoll(line1, line2);
 
                if (!comp) {
                        read1 = read2 = 1;

====== CUT ========
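
For trying the idea outside of comm.c, here is a minimal standalone
sketch of the same approach as the patch: fold both strings to upper
case with toupper, then compare the folded copies with strcoll. The
names upper_copy and fold_strcoll are hypothetical and do not appear
in the patch. Unlike the patch, this sketch allocates the copies with
malloc instead of using fixed MAXLINELEN buffers, and it falls back to
strcmp if allocation fails; both are assumptions of the sketch.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed upper-cased copy of s, or NULL if malloc fails. */
static char *upper_copy(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++)
        out[i] = (char)toupper((unsigned char)s[i]);
    out[len] = '\0';
    return out;
}

/*
 * Case-insensitive comparison that follows the current LC_COLLATE and
 * LC_CTYPE settings. Falls back to plain strcmp on allocation failure
 * (an assumption of this sketch, not behavior taken from the patch).
 */
static int fold_strcoll(const char *s1, const char *s2)
{
    char *a = upper_copy(s1);
    char *b = upper_copy(s2);
    int result;

    if (a != NULL && b != NULL)
        result = strcoll(a, b);
    else
        result = strcmp(s1, s2);
    free(a);
    free(b);
    return result;
}
```

As in the patch, the caller has to run setlocale(LC_ALL, "") once at
startup; without it the program stays in the "C" locale, where strcoll
orders strings the same way strcmp does.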



-- 
=== Grigoriy Strokin, Lomonosov University (MGU), Moscow ===
=== contact info: http://isabase.philol.msu.ru/~grg/     ===





