Date:      Thu, 25 Oct 2007 13:17:50 +0200 (CEST)
From:      Jasper Jongmans <aprogas@hotmail.com>
To:        FreeBSD-gnats-submit@FreeBSD.org
Cc:        Jasper Jongmans <aprogas@hotmail.com>
Subject:   gnu/117481: sort(1) incorrect numeric sort in very specific cases
Message-ID:  <20071025111750.5631A6B@harry.aprogas.local>
Resent-Message-ID: <200710251150.l9PBo0xO032321@freefall.freebsd.org>


>Number:         117481
>Category:       gnu
>Synopsis:       sort(1) incorrect numeric sort in very specific cases
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Oct 25 11:50:00 UTC 2007
>Closed-Date:
>Last-Modified:
>Originator:     Jasper Jongmans <aprogas@hotmail.com>
>Release:        FreeBSD 6.2-RELEASE-p1 i386
>Organization:
Church of Harkmannetjes
>Environment:
System: FreeBSD harry.aprogas.local 6.2-RELEASE-p1 FreeBSD 6.2-RELEASE-p1 #6: Thu Feb 22 12:38:44 CET 2007 root@harry.aprogas.net:/usr/obj/usr/src/sys/HARRY i386
sort (GNU coreutils) 5.3.0-20040812-FreeBSD

Problem could not be reproduced on:
FreeBSD 4.11-RELEASE-p11 i386
sort - GNU textutils 1.14

Debian GNU/Linux (exact version unknown)
sort from coreutils 5.94
>Description:
When sorting a file of comma-separated numeric values that are
intended as individual integers rather than numbers with a decimal
point, sort(1) produces incorrect results in specific cases. So far I've
been able to narrow the problem down to the following:

- the comma is used as the field separator
- a numeric sort is attempted using -k1n, +0n, or just -n
- the field specified as sorting key is followed by another field
  containing numerics
- this second field contains more digits than the same field on other
  lines
- the sorting key and the numeric field following it do not have to be
  the only fields on the line, e.g. "foo,bar,2,14,bla" with -k3n will
  behave the same as "2,14" with -k1n
- the problem does not occur in all locales

Let me reiterate that I am not trying to sort decimal fractions, but
rather individual integers that happen to be separated by commas.
>How-To-Repeat:
% cat sort.txt
2,14
3,5
1,321
8,12
1,9

% env LANG=en_US.UTF-8 sort -t, -k1n sort.txt
1,9
3,5
2,14
8,12
1,321

% env LANG=nl_NL.UTF-8 sort -t, -k1n sort.txt
1,321
1,9
2,14
3,5
8,12

% env LANG=C sort -t, -k1n sort.txt
1,321
1,9
2,14
3,5
8,12
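
One possible reading of the en_US.UTF-8 result, offered as an unconfirmed
assumption rather than a diagnosis: a key given as -k1n has no end field, so
it runs to the end of the line, and a locale whose thousands separator is ","
may then read the whole line as a single integer. The sketch below only checks
that this reading is consistent with the order shown above. It strips the
commas and sorts the resulting integers in the C locale.

```shell
# Hypothesis check (assumption, not from the report): drop the commas so each
# line becomes one integer, then sort those integers numerically.
input=$(printf '%s\n' 2,14 3,5 1,321 8,12 1,9)

# LC_ALL=C keeps this step independent of any locale-specific number parsing.
as_integers=$(printf '%s\n' "$input" | tr -d ',' | LC_ALL=C sort -n)

printf '%s\n' "$as_integers"
```

This prints 19, 35, 214, 812, 1321, which is the same line order as the
en_US.UTF-8 output (1,9 3,5 2,14 8,12 1,321).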
>Fix:
Workaround: set LANG=C or LC_ALL=C as recommended in the sort(1) manpage
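
A minimal sketch of the workaround applied to the sample input. Setting
LC_ALL=C is the part taken from this report. Ending each key explicitly
(-k1,1n and -k2,2n) is an additional assumption of this sketch, so that every
field is compared as a separate integer. It has not been tested against the
affected FreeBSD 6.2 sort.

```shell
# Sample input from How-To-Repeat.
input=$(printf '%s\n' 2,14 3,5 1,321 8,12 1,9)

# LC_ALL=C is the workaround from the report.
# -k1,1n limits the first key to field 1 only; -k2,2n breaks ties on field 2.
# The explicit key ends are an assumption of this sketch, not part of the report.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -t, -k1,1n -k2,2n)

printf '%s\n' "$sorted"
```

Unlike the plain LANG=C run above, this orders ties on the first field by the
second field as an integer, so 1,9 comes before 1,321.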
>Release-Note:
>Audit-Trail:
>Unformatted:


