Date: Thu, 25 Oct 2007 13:17:50 +0200 (CEST)
From: Jasper Jongmans <aprogas@hotmail.com>
To: FreeBSD-gnats-submit@FreeBSD.org
Cc: Jasper Jongmans <aprogas@hotmail.com>
Subject: gnu/117481: sort(1) incorrect numeric sort in very specific cases
Message-ID: <20071025111750.5631A6B@harry.aprogas.local>
Resent-Message-ID: <200710251150.l9PBo0xO032321@freefall.freebsd.org>

>Number:         117481
>Category:       gnu
>Synopsis:       sort(1) incorrect numeric sort in very specific cases
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Oct 25 11:50:00 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator:     Jasper Jongmans <aprogas@hotmail.com>
>Release:        FreeBSD 6.2-RELEASE-p1 i386
>Organization:
Church of Harkmannetjes
>Environment:
System: FreeBSD harry.aprogas.local 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #6: Thu Feb 22 12:38:44 CET 2007 root@harry.aprogas.net:/usr/obj/usr/src/sys/HARRY i386

	sort (GNU coreutils) 5.3.0-20040812-FreeBSD

Problem could not be reproduced on:
	FreeBSD 4.11-RELEASE-p11 i386
		sort - GNU textutils 1.14
	Debian GNU/Linux (exact version unknown)
		sort from coreutils 5.94
>Description:
When sorting a file of comma-separated numeric values that are intended as
individual integers rather than numbers with a decimal point, sort(1)
produces incorrect results in specific cases.

So far I've been able to narrow the problem down to the following:
- the comma is used as the field separator
- a numeric sort is attempted using -k1n, +0n, or just -n
- the field specified as the sorting key is followed by another field
  containing digits
- this second field contains more digits than the same field on other lines
- the sorting key and the numeric field following it do not have to be the
  only fields on the line, e.g. "foo,bar,2,14,bla" with -k3n behaves the
  same as "2,14" with -k1n
- the problem does not occur in all locales

Let me reiterate that I am not trying to sort decimal fractions, but rather
individual integers that happen to be separated by commas.
>How-To-Repeat:
% cat sort.txt
2,14
3,5
1,321
8,12
1,9
% env LANG=en_US.UTF-8 sort -t, -k1n sort.txt
1,9
3,5
2,14
8,12
1,321
% env LANG=nl_NL.UTF-8 sort -t, -k1n sort.txt
1,321
1,9
2,14
3,5
8,12
% env LANG=C sort -t, -k1n sort.txt
1,321
1,9
2,14
3,5
8,12
>Fix:
Workaround: set LANG=C or LC_ALL=C as recommended in the sort(1) manpage
>Release-Note:
>Audit-Trail:
>Unformatted:
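
A possible explanation, not confirmed anywhere in this report: -k1n names no
end field, so the key runs from field 1 to the end of the line, and in
en_US.UTF-8 the comma is also the thousands separator, so "1,321" may be read
as 1321. That reading would give 19, 35, 214, 812, 1321, which matches the
en_US.UTF-8 ordering in How-To-Repeat. The sketch below recreates sort.txt and
runs the LC_ALL=C workaround next to a key bounded to the first field
(-k1,1n). The bounded key is an assumption and has not been tested on
6.2-RELEASE under en_US.UTF-8; the temporary file name is arbitrary.

```shell
#!/bin/sh
# Recreate the sample input from the report in a temporary file.
tmp=$(mktemp "${TMPDIR:-/tmp}/sort.XXXXXX")
printf '%s\n' 2,14 3,5 1,321 8,12 1,9 > "$tmp"

# Workaround from the report: force the C locale, where the comma
# has no numeric meaning.
result_c=$(LC_ALL=C sort -t, -k1n "$tmp")

# Assumption (not from the report): end the key at field 1 with -k1,1n
# so that the digits after the comma are never part of the numeric key.
# It is run under LC_ALL=C here only to keep the output predictable.
result_bounded=$(LC_ALL=C sort -t, -k1,1n "$tmp")

printf 'LC_ALL=C -k1n:\n%s\n' "$result_c"
printf 'LC_ALL=C -k1,1n:\n%s\n' "$result_bounded"

rm -f "$tmp"
```

Under LC_ALL=C both commands should print the ordering the report shows for
LANG=C. To check the bounded key against the actual problem, replace LC_ALL=C
on the second command with LANG=en_US.UTF-8 on an affected system.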