Date:      Thu, 6 Jun 2002 20:29:42 +1000
From:      "Tim J. Robbins" <tjr@FreeBSD.ORG>
To:        "Andrey A. Chernov" <ache@nagual.pp.ru>
Cc:        cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject:   Re: cvs commit: src/usr.bin/uniq uniq.c
Message-ID:  <20020606202942.A45282@treetop.robbins.dropbear.id.au>
In-Reply-To: <20020606100352.GA86621@nagual.pp.ru>; from ache@nagual.pp.ru on Thu, Jun 06, 2002 at 02:03:54PM +0400
References:  <200206060313.g563DAi26751@freefall.freebsd.org> <20020606031545.GA83612@nagual.pp.ru> <20020606161843.A44561@treetop.robbins.dropbear.id.au> <20020606083246.GA85860@nagual.pp.ru> <20020606192402.A45186@treetop.robbins.dropbear.id.au> <20020606100352.GA86621@nagual.pp.ru>

On Thu, Jun 06, 2002 at 02:03:54PM +0400, Andrey A. Chernov wrote:

> 3) There is no much sense to discuss non-localized implementations you mention.

The GNU, Solaris and NetBSD implementations are localised, but do not use
strcoll() because it would be incorrect to do so.

> 4) Uniq must be consistent with other utilities 'unique' concept to
> operate in the flow, like comm, join and sort, they _use_ collate, so uniq
> must not produce different conflicting results.
> 
> 5) From common sense: in some languages
> <ss>alala
> and
> ssalala
> are the same.

strcoll() should not indicate that these strings are identical. If it does,
it is incorrectly implemented. FreeBSD's strcoll() and strxfrm() are
incorrectly implemented: strcoll("ss", "\xdf") returns 0 in some locales on
FreeBSD, but returns a nonzero value (1, -1 or -108) in every Solaris locale.

strcmp() is the correct function to use to compare text strings for equality.
strcoll() is the correct function to use to compare sorting order of text
strings.

uniq is not interested in the sort order of strings; it is interested in
whether two lines of text are identical. If the sort utility is operating
correctly, identical input lines will be adjacent in the output.

$ export LANG=de_DE.ISO8859-15
$ printf "ss\n\337\nss\n\337\nss\n" | sort -u
ss
ß
$ printf "ss\n\337\nss\n\337\nss\n" | sort -u | uniq
ss

This behaviour is simply not correct, and the bug lies in FreeBSD's old
uniq implementation, not GNU sort.
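
The distinction between equality and sort order can be sketched in C. This
is only an illustration, not the code in uniq.c: the function names
lines_identical(), lines_sort_before() and collapse_adjacent() are
hypothetical, and the array-based interface is an assumption made to keep
the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Equality test of the kind uniq needs: are the two lines byte-for-byte
 * identical?  Hypothetical helper, not taken from uniq.c. */
int lines_identical(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Ordering test of the kind sort needs: does a collate before b in the
 * current LC_COLLATE locale?  Without a prior setlocale() call the
 * program runs in the "C" locale. */
int lines_sort_before(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b) < 0;
}

/* Collapse runs of adjacent identical lines in place and return the
 * number of lines kept.  Only strcmp() equality is consulted, so two
 * distinct strings are never merged, whatever the locale says about
 * their relative order. */
size_t collapse_adjacent(const char **lines, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (kept == 0 || !lines_identical(lines[kept - 1], lines[i]))
            lines[kept++] = lines[i];
    }
    return kept;
}
```

With the input { "ss", "ss", "\xdf", "\xdf", "ss" } the sketch keeps three
lines, because "ss" and "\xdf" are never treated as the same line.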

I shall not back this change out.


Tim
