Date: Thu, 21 Oct 1999 19:56:51 +0400
From: Grigoriy Strokin <grg@philol.msu.ru>
To: freebsd-hackers@freebsd.org
Cc: ache@freebsd.org, freebsd-bugs@freebsd.org
Subject: comm doesn't obey current locale collation
Message-ID: <19991021195649.A36122@isabase.philol.msu.ru>
Hello,
Six months ago I sent a 'send-pr' about /usr/bin/comm (Problem Report
bin/11221).
There have still been no follow-ups, nor has the report been assigned to any
responsible person.
What might this mean?
---------------- forward ---------------
Problem Report bin/11221
comm doesn't obey current locale collation
Confidential: no
Severity: serious
Priority: medium
Responsible: freebsd-bugs@freebsd.org
State: open
Class: sw-bug
Submitter-Id: current-users
Arrival-Date: Mon Apr 19 10:40:03 PDT 1999
Last-Modified: never
Originator: Grigoriy Strokin grg@philol.msu.ru
Release: FreeBSD 3.1-STABLE i386
Organization: Moscow University
Environment: $LANG set to ru_RU.KOI8-R
Description
comm produces wrong results when processing 8-bit text files that were
sorted with /usr/bin/sort according to the current locale (ru_RU.KOI8-R).
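The mismatch can be sketched in a few lines of C. The patch in this report
shows that comm compares lines with strcmp, i.e. by raw byte value, while the
input was sorted by locale collation. In KOI8-R the byte values do not follow
the Russian alphabet, so a file that is sorted for the locale can look
unsorted to strcmp. The helper names compare_lines and is_sorted below are
hypothetical and exist only for illustration; they are not part of comm.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two lines either by raw byte value
 * (strcmp) or by the collation rules of the current locale (strcoll).
 * strcoll only differs from strcmp after a successful setlocale() call
 * that selects a locale with its own collation order. */
int compare_lines(const char *a, const char *b, int use_locale)
{
    return use_locale ? strcoll(a, b) : strcmp(a, b);
}

/* Hypothetical helper: return 1 if lines[0..n-1] are in non-decreasing
 * order under the chosen comparison, 0 otherwise. */
int is_sorted(const char *const *lines, size_t n, int use_locale)
{
    size_t i;

    for (i = 1; i < n; i++)
        if (compare_lines(lines[i - 1], lines[i], use_locale) > 0)
            return 0;
    return 1;
}
```

For example, the KOI8-R bytes 0xE2, 0xE4, 0xE3 encode the letters Б, Д, Ц in
alphabetical order, yet is_sorted() with use_locale set to 0 rejects that
sequence because 0xE4 is greater than 0xE3. A program that merges two inputs
while assuming strcmp order will therefore misreport lines from such a file.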
How-To-Repeat
Unpack the following shar archive and run
LANG=ru_RU.KOI8-R comm jaa.srt jaa2.srt
Several identical characters will appear
in both the first and the second column.
This must not happen, because both
files were produced
as output of
LANG=ru_RU.KOI8-R sort
---------------------CUT------------------------------------------
# This is a shell archive. Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file". Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
# jaa.srt
# jaa2.srt
#
echo x - jaa.srt
sed 's/^X//' >jaa.srt << 'END-of-jaa.srt'
Xô
Xõ
Xæ
Xö
Xé
Xç
Xà
Xù
Xü
Xñ
Xý
Xû
Xø
Xá
Xó
END-of-jaa.srt
echo x - jaa2.srt
sed 's/^X//' >jaa2.srt << 'END-of-jaa2.srt'
Xô
Xõ
Xæ
Xè
Xö
Xé
Xç
Xà
Xù
Xü
Xý
Xø
Xá
Xó
END-of-jaa2.srt
exit
Fix
Apply the patch:
--- comm.c.orig Mon Apr 19 16:57:56 1999
+++ comm.c Mon Apr 19 19:45:49 1999
@@ -55,9 +55,29 @@
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
+#include <locale.h>
+#include <ctype.h>
#define MAXLINELEN (LINE_MAX + 1)
+/* The standard library has strcoll, an analog of strcmp that takes the
+ * current locale into account, but strcasecmp has no such analog.
+ * So let's define a replacement, locale_dependent_strcasecmp.
+ */
+
+int locale_dependent_strcasecmp(const char *s1, const char *s2)
+{
+ char a1[MAXLINELEN], a2[MAXLINELEN];
+ char *c;
+ for (c = a1; *s1; c++, s1++)
+ *c = toupper((unsigned char)(*s1));
+ *c = 0;
+ for (c = a2; *s2; c++, s2++)
+ *c = toupper((unsigned char)(*s2));
+ *c = 0;
+ return strcoll(a1, a2);
+}
+
char *tabs[] = { "", "\t", "\t\t" };
FILE *file __P((char *));
@@ -74,7 +94,7 @@
FILE *fp1, *fp2;
char *col1, *col2, *col3;
char **p, line1[MAXLINELEN], line2[MAXLINELEN];
-
+ setlocale(LC_ALL, "");
flag1 = flag2 = flag3 = 1;
iflag = 0;
@@ -139,9 +159,9 @@
/* lines are the same */
if(iflag)
- comp = strcasecmp(line1, line2);
+ comp = locale_dependent_strcasecmp(line1, line2);
else
- comp = strcmp(line1, line2);
+ comp = strcoll(line1, line2);
if (!comp) {
read1 = read2 = 1;
====== CUT ========
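The case-insensitive helper in the patch relies on every input line fitting
in MAXLINELEN bytes. As a minimal sketch of the same idea with the bound made
explicit, the version below folds both strings to upper case into fixed
buffers, truncating anything longer, and then compares them with strcoll.
The names FOLD_MAX, fold_upper and collate_ignoring_case are hypothetical,
and the buffer size is an arbitrary assumption. As in the patch, the caller
is expected to have called setlocale(LC_ALL, "") beforehand; without that
call the program stays in the "C" locale and strcoll orders like strcmp.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define FOLD_MAX 2049   /* hypothetical cap on the folded line length */

/* Copy src into dst, converting each byte with toupper() and never
 * writing more than dstsize bytes including the terminating NUL.
 * dstsize must be at least 1. Longer input is silently truncated. */
static void fold_upper(char *dst, size_t dstsize, const char *src)
{
    size_t i;

    for (i = 0; i + 1 < dstsize && src[i] != '\0'; i++)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[i] = '\0';
}

/* Case-insensitive comparison that follows the current locale's
 * collation: fold both strings, then let strcoll() order them. */
int collate_ignoring_case(const char *s1, const char *s2)
{
    char a1[FOLD_MAX], a2[FOLD_MAX];

    fold_upper(a1, sizeof(a1), s1);
    fold_upper(a2, sizeof(a2), s2);
    return strcoll(a1, a2);
}
```

Folding byte by byte with toupper is only adequate for single-byte encodings
such as KOI8-R; a multibyte encoding would need a wide-character approach.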
--
=== Grigoriy Strokin, Lomonosov University (MGU), Moscow ===
=== contact info: http://isabase.philol.msu.ru/~grg/ ===
