Date: Mon, 19 Apr 1999 21:38:27 +0400 (MSD)
From: grg@philol.msu.ru
To: FreeBSD-gnats-submit@freebsd.org
Subject: bin/11221: comm doesn't obey current locale collation
Message-ID: <199904191738.VAA06606@isabase.philol.msu.ru>
>Number:         11221
>Category:       bin
>Synopsis:       comm doesn't obey current locale collation
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 19 10:40:03 PDT 1999
>Closed-Date:
>Last-Modified:
>Originator:     Grigoriy Strokin
>Release:        FreeBSD 3.1-STABLE i386
>Organization:
Moscow University
>Environment:

$LANG set to ru_RU.KOI8-R

>Description:

Comm produces wrong results when processing
8-bit text files sorted with /usr/bin/sort
according to current locale (ru_RU.KOI8-R)

>How-To-Repeat:

Unpack the following shar-archive and call
	LANG=ru_RU.KOI8-R comm jaa.srt jaa2.srt
Several identical characters will appear in both first and second column,
whereas this must not occur with these files that were produced
as output of
	LANG=ru_RU.KOI8-R sort

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	jaa.srt
#	jaa2.srt
#
echo x - jaa.srt
sed 's/^X//' >jaa.srt << 'END-of-jaa.srt'
XÒ
XÓ
XÔ
XÆ
XÈ
XÃ
XÞ
XÛ
XÝ
Xß
XÙ
XØ
XÜ
XÀ
XÑ
END-of-jaa.srt
echo x - jaa2.srt
sed 's/^X//' >jaa2.srt << 'END-of-jaa2.srt'
XÒ
XÓ
XÔ
XÕ
XÆ
XÈ
XÃ
XÞ
XÛ
XÝ
XÙ
XÜ
XÀ
XÑ
END-of-jaa2.srt
exit

>Fix:

Apply the patch:

--- comm.c.orig	Mon Apr 19 16:57:56 1999
+++ comm.c	Mon Apr 19 19:45:49 1999
@@ -55,9 +55,29 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <locale.h>
+#include <ctype.h>
 
 #define	MAXLINELEN	(LINE_MAX + 1)
 
+/* The standard library strcoll, an analog of strcmp that takes into account
+ * the current locale, but strcasecmp does not have such an analog.
+ * So let's define a replacement, locale_dependent_strcasecmp
+ * */
+
+int locale_dependent_strcasecmp(const char *s1, const char *s2)
+{
+	char a1[MAXLINELEN], a2[MAXLINELEN];
+	char *c;
+	for (c = a1; *s1; c++, s1++)
+		*c = toupper((unsigned char)(*s1));
+	*c = 0;
+	for (c = a2; *s2; c++, s2++)
+		*c = toupper((unsigned char)(*s2));
+	*c = 0;
+	return strcoll(a1, a2);
+}
+
 char *tabs[] = { "", "\t", "\t\t" };
 
 FILE   *file __P((char *));
@@ -74,7 +94,7 @@
 	FILE *fp1, *fp2;
 	char *col1, *col2, *col3;
 	char **p, line1[MAXLINELEN], line2[MAXLINELEN];
-
+	setlocale(LC_ALL, "");
 	flag1 = flag2 = flag3 = 1;
 	iflag = 0;
 
@@ -139,9 +159,9 @@
 
 		/* lines are the same */
 		if(iflag)
-			comp = strcasecmp(line1, line2);
+			comp = locale_dependent_strcasecmp(line1, line2);
 		else
-			comp = strcmp(line1, line2);
+			comp = strcoll(line1, line2);
 
 		if (!comp) {
 			read1 = read2 = 1;

====== CUT ========

>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
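
A note on the helper in the patch: locale_dependent_strcasecmp copies each
argument into a fixed MAXLINELEN buffer, which is safe only as long as the
caller's lines never exceed that size. A minimal sketch of the same idea with
heap-allocated copies is below. The names upper_copy and coll_casecmp are
hypothetical and not part of comm.c, and the fallback to strcasecmp on
allocation failure is an assumption of this sketch, not something the patch
does.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper (not in comm.c): return a malloc'ed upper-cased
 * copy of s, or NULL if the allocation fails.  The caller frees it.
 */
static char *
upper_copy(const char *s)
{
	size_t i, n;
	char *out;

	assert(s != NULL);
	n = strlen(s);
	out = malloc(n + 1);
	if (out == NULL)
		return (NULL);
	for (i = 0; i < n; i++)
		out[i] = (char)toupper((unsigned char)s[i]);
	out[n] = '\0';
	return (out);
}

/*
 * Hypothetical replacement for the patch's helper: compare s1 and s2
 * ignoring case, using the collation order of the current locale.
 * Assumption of this sketch: if memory runs out, fall back to the
 * locale-unaware strcasecmp rather than failing.
 */
int
coll_casecmp(const char *s1, const char *s2)
{
	char *a1, *a2;
	int r;

	a1 = upper_copy(s1);
	a2 = upper_copy(s2);
	if (a1 == NULL || a2 == NULL)
		r = strcasecmp(s1, s2);
	else
		r = strcoll(a1, a2);
	free(a1);	/* free(NULL) is a no-op */
	free(a2);
	return (r);
}
```

As in the patch, the caller is expected to call setlocale(LC_ALL, "") once at
startup. Without that call the program stays in the "C" locale, where strcoll
orders strings the same way as strcmp.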