Date: Thu, 21 Oct 1999 19:56:51 +0400
From: Grigoriy Strokin <grg@philol.msu.ru>
To: freebsd-hackers@freebsd.org
Cc: ache@freebsd.org, freebsd-bugs@freebsd.org
Subject: comm doesn't obey current locale collation
Message-ID: <19991021195649.A36122@isabase.philol.msu.ru>

Hello,

6 months ago I sent a 'send-pr' about /usr/bin/comm (Problem Report
bin/11221). There have still been no follow-ups, nor has the report been
assigned to any responsible person. What might this mean?

---------------- forward ---------------

Problem Report bin/11221

comm doesn't obey current locale collation

Confidential    no
Severity        serious
Priority        medium
Responsible     freebsd-bugs@freebsd.org
State           open
Class           sw-bug
Submitter-Id    current-users
Arrival-Date    Mon Apr 19 10:40:03 PDT 1999
Last-Modified   never
Originator      Grigoriy Strokin grg@philol.msu.ru
Release         FreeBSD 3.1-STABLE i386
Organization    Moscow University
Environment     $LANG set to ru_RU.KOI8-R

Description

Comm produces wrong results when processing 8-bit text files sorted
with /usr/bin/sort according to the current locale (ru_RU.KOI8-R).

How-To-Repeat

Unpack the following shar archive and run

    LANG=ru_RU.KOI8-R comm jaa.srt jaa2.srt

Several identical characters will appear in both the first and the second
column, whereas this must not occur with these files, which were produced
as output of

    LANG=ru_RU.KOI8-R sort

---------------------CUT------------------------------------------
# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	jaa.srt
#	jaa2.srt
#
echo x - jaa.srt
sed 's/^X//' >jaa.srt << 'END-of-jaa.srt'
Xô
Xõ
Xæ
Xö
Xé
Xç
Xà
Xù
Xü
Xñ
Xý
Xû
Xø
Xá
Xó
END-of-jaa.srt
echo x - jaa2.srt
sed 's/^X//' >jaa2.srt << 'END-of-jaa2.srt'
Xô
Xõ
Xæ
Xè
Xö
Xé
Xç
Xà
Xù
Xü
Xý
Xø
Xá
Xó
END-of-jaa2.srt
exit

Fix

Apply the patch:

--- comm.c.orig	Mon Apr 19 16:57:56 1999
+++ comm.c	Mon Apr 19 19:45:49 1999
@@ -55,9 +55,29 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <locale.h>
+#include <ctype.h>
 
 #define	MAXLINELEN	(LINE_MAX + 1)
 
+/* The standard library has strcoll, an analog of strcmp that takes into
+ * account the current locale, but strcasecmp does not have such an analog.
+ * So let's define a replacement, locale_dependent_strcasecmp
+ * */
+
+int locale_dependent_strcasecmp(const char *s1, const char *s2)
+{
+	char a1[MAXLINELEN], a2[MAXLINELEN];
+	char *c;
+	for (c = a1; *s1; c++, s1++)
+		*c = toupper((unsigned char)(*s1));
+	*c = 0;
+	for (c = a2; *s2; c++, s2++)
+		*c = toupper((unsigned char)(*s2));
+	*c = 0;
+	return strcoll(a1, a2);
+}
+
 char *tabs[] = { "", "\t", "\t\t" };
 
 FILE   *file __P((char *));
@@ -74,7 +94,7 @@
 	FILE *fp1, *fp2;
 	char *col1, *col2, *col3;
 	char **p, line1[MAXLINELEN], line2[MAXLINELEN];
-
+	setlocale(LC_ALL, "");
 	flag1 = flag2 = flag3 = 1;
 	iflag = 0;
 
@@ -139,9 +159,9 @@
 
 		/* lines are the same */
 		if(iflag)
-			comp = strcasecmp(line1, line2);
+			comp = locale_dependent_strcasecmp(line1, line2);
 		else
-			comp = strcmp(line1, line2);
+			comp = strcoll(line1, line2);
 
 		if (!comp) {
 			read1 = read2 = 1;
====== CUT ========

-- 
=== Grigoriy Strokin, Lomonosov University (MGU), Moscow ===
=== contact info: http://isabase.philol.msu.ru/~grg/ ===


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
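
The helper in the patch copies both lines into fixed MAXLINELEN buffers, so it relies on the caller never passing a longer string. As an illustration only, here is a minimal sketch of the same approach (upper-case both strings, then compare with strcoll) using heap copies instead. The names upcase_dup and coll_casecmp are hypothetical and are not part of comm.c or libc, and the strcasecmp fallback on allocation failure is an assumption of this sketch, not something the patch does.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: return a malloc'ed copy of s with every byte
 * passed through toupper(), or NULL if the allocation fails.
 */
static char *
upcase_dup(const char *s)
{
	size_t i, len = strlen(s);
	char *copy = malloc(len + 1);

	if (copy == NULL)
		return NULL;
	for (i = 0; i < len; i++)
		copy[i] = (char)toupper((unsigned char)s[i]);
	copy[len] = '\0';
	return copy;
}

/*
 * Hypothetical name: case-insensitive comparison that follows the
 * LC_COLLATE order of the current locale. The caller is expected to
 * have called setlocale() beforehand, as the patch does in main().
 */
int
coll_casecmp(const char *s1, const char *s2)
{
	char *u1 = upcase_dup(s1);
	char *u2 = upcase_dup(s2);
	int comp;

	if (u1 == NULL || u2 == NULL)
		comp = strcasecmp(s1, s2);	/* assumed fallback, not locale collation */
	else
		comp = strcoll(u1, u2);
	free(u1);
	free(u2);
	return comp;
}
```

A caller would run setlocale(LC_ALL, "") once at startup and then use coll_casecmp(line1, line2) where the patch calls locale_dependent_strcasecmp. Like the patched function, this only folds case byte by byte, so it is suitable for single-byte encodings such as KOI8-R and not for multibyte ones.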