Date:      Mon, 19 Apr 1999 21:38:27 +0400 (MSD)
From:      grg@philol.msu.ru
To:        FreeBSD-gnats-submit@freebsd.org
Subject:   bin/11221: comm doesn't obey current locale collation
Message-ID:  <199904191738.VAA06606@isabase.philol.msu.ru>


>Number:         11221
>Category:       bin
>Synopsis:       comm doesn't obey current locale collation
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 19 10:40:03 PDT 1999
>Closed-Date:
>Last-Modified:
>Originator:     Grigoriy Strokin
>Release:        FreeBSD 3.1-STABLE i386
>Organization:
Moscow University
>Environment:

$LANG set to ru_RU.KOI8-R

>Description:

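comm produces wrong results when processing 8-bit text files that were
sorted with /usr/bin/sort according to the current locale (ru_RU.KOI8-R).
comm compares lines with strcmp() and strcasecmp(), which ignore the
locale's collation order.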
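
The mismatch can be seen on single letters. The following is a minimal
sketch, not part of the proposed fix; the helper names byte_order,
locale_order and use_environment_locale are made up for illustration.
In KOI8-R the Cyrillic letter "te" is byte 0xD4 and "ef" is byte 0xC6,
so strcmp() puts "ef" first, although "te" comes first in the Russian
alphabet.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Reduce a comparison result to -1, 0, or 1. */
static int sign(int x)
{
	return (x > 0) - (x < 0);
}

/* Select the locale named by LANG/LC_*; returns 0 if it is unavailable. */
int use_environment_locale(void)
{
	return setlocale(LC_ALL, "") != NULL;
}

/* Byte order, which is what strcmp() reports regardless of locale. */
int byte_order(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return sign(strcmp(a, b));
}

/* Collation order of the locale selected by the last setlocale() call. */
int locale_order(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return sign(strcoll(a, b));
}
```

byte_order("\xD4", "\xC6") is positive in every locale. On a system
where ru_RU.KOI8-R is installed and selected through
use_environment_locale(), locale_order("\xD4", "\xC6") should be
negative, which is the order sort used to produce the input files.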

>How-To-Repeat:

Unpack the following shar archive and run
  LANG=ru_RU.KOI8-R comm jaa.srt jaa2.srt
Several characters that are present in both files
will appear in both the first and the second column.
This must not happen, because both files
were produced as output of
   LANG=ru_RU.KOI8-R sort
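
Why a comparison that disagrees with the sort order splits common lines
across columns can be sketched as follows. This is a simplified model of
the comm merge loop under stated assumptions, not the actual comm.c
code: comm_count, line_cmp and the two arrays are hypothetical names,
and the arrays hold the KOI8-R bytes of the two files from the archive
below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*line_cmp)(const char *, const char *);

/*
 * Walk two line lists the way comm does and count the lines that would
 * land in column 1 (only in a), column 2 (only in b), column 3 (both).
 * The result is only meaningful if both lists are sorted under cmp.
 */
void comm_count(const char *const *a, size_t na,
    const char *const *b, size_t nb, line_cmp cmp, size_t counts[3])
{
	size_t i = 0, j = 0;

	assert(cmp != NULL && counts != NULL);
	counts[0] = counts[1] = counts[2] = 0;
	while (i < na && j < nb) {
		int c = cmp(a[i], b[j]);

		if (c == 0) {		/* same line in both lists */
			counts[2]++;
			i++;
			j++;
		} else if (c < 0) {	/* a[i] sorts first: only in a */
			counts[0]++;
			i++;
		} else {		/* b[j] sorts first: only in b */
			counts[1]++;
			j++;
		}
	}
	counts[0] += na - i;	/* leftovers of a */
	counts[1] += nb - j;	/* leftovers of b */
}

/* KOI8-R bytes of jaa.srt (15 lines) and jaa2.srt (14 lines). */
const char *const jaa_lines[] = {
	"\xD2", "\xD3", "\xD4", "\xC6", "\xC8", "\xC3", "\xDE", "\xDB",
	"\xDD", "\xDF", "\xD9", "\xD8", "\xDC", "\xC0", "\xD1"
};
const char *const jaa2_lines[] = {
	"\xD2", "\xD3", "\xD4", "\xD5", "\xC6", "\xC8", "\xC3", "\xDE",
	"\xDB", "\xDD", "\xD9", "\xDC", "\xC0", "\xD1"
};
```

Calling comm_count(jaa_lines, 15, jaa2_lines, 14, strcmp, counts) gives
9 lines only in the first file, 8 only in the second, and 6 in both,
although the files share 13 lines. A comparison that matches the order
used by sort would report 2, 1 and 13.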


# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	jaa.srt
#	jaa2.srt
#
echo x - jaa.srt
sed 's/^X//' >jaa.srt << 'END-of-jaa.srt'
Xр
Xс
Xт
Xф
Xх
Xц
Xч
Xш
Xщ
Xъ
Xы
Xь
Xэ
Xю
Xя
END-of-jaa.srt
echo x - jaa2.srt
sed 's/^X//' >jaa2.srt << 'END-of-jaa2.srt'
Xр
Xс
Xт
Xу
Xф
Xх
Xц
Xч
Xш
Xщ
Xы
Xэ
Xю
Xя
END-of-jaa2.srt
exit


>Fix:
	
Apply the patch:


--- comm.c.orig	Mon Apr 19 16:57:56 1999
+++ comm.c	Mon Apr 19 19:45:49 1999
@@ -55,9 +55,29 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <locale.h>
+#include <ctype.h>
 
 #define	MAXLINELEN	(LINE_MAX + 1)
 
+/* The standard library provides strcoll, an analog of strcmp that takes
+ * the current locale into account, but strcasecmp has no such analog.
+ * So let's define a replacement, locale_dependent_strcasecmp.
+ */
+
+int locale_dependent_strcasecmp(const char *s1, const char *s2)
+{
+  char a1[MAXLINELEN], a2[MAXLINELEN];
+  char *c;
+  for (c = a1; *s1; c++, s1++)
+    *c = toupper((unsigned char)(*s1));
+  *c = 0;
+  for (c = a2; *s2; c++, s2++)
+    *c = toupper((unsigned char)(*s2));
+  *c = 0;
+  return strcoll(a1, a2);
+}
+
 char *tabs[] = { "", "\t", "\t\t" };
 
 FILE   *file __P((char *));
@@ -74,7 +94,7 @@
 	FILE *fp1, *fp2;
 	char *col1, *col2, *col3;
 	char **p, line1[MAXLINELEN], line2[MAXLINELEN];
-
+	setlocale(LC_ALL, "");
 	flag1 = flag2 = flag3 = 1;
 	iflag = 0;
 
@@ -139,9 +159,9 @@
 
 		/* lines are the same */
 		if(iflag)
-			comp = strcasecmp(line1, line2);
+			comp = locale_dependent_strcasecmp(line1, line2);
 		else
-			comp = strcmp(line1, line2);
+			comp = strcoll(line1, line2);
 
 		if (!comp) {
 			read1 = read2 = 1;

====== CUT ========


>Release-Note:
>Audit-Trail:
>Unformatted:


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message