From: David Schultz
To: Dag-Erling Smørgrav
Cc: hackers@FreeBSD.ORG, Ed Schouten
Date: Mon, 4 Feb 2008 09:55:34 -0500
Subject: Re: sort(1) memory usage
Message-ID: <20080204145534.GA40490@VARK.MIT.EDU>
In-Reply-To: <86lk62kqeh.fsf@ds4.des.no>

On Sun, Feb 03, 2008, Dag-Erling Smørgrav wrote:
> Dag-Erling Smørgrav writes:
> > Erik Trulsson writes:
> > > Yep, it seems that GNU sort allocates a quite large buffer by default when
> > > the size of the input is unknown (such as when it reads input from stdin.)
> > > A quick check in the source code indicates that it tries to size this buffer
> > > according to how much memory the system has (and according to any limits set
> > > on how much memory the process is allowed to use.)
> > Uh, OK.  This scaling doesn't seem to work correctly.  It seems to
> > allocate 27 MB on 32-bit machines and 54 MB on 64-bit machines,
> > regardless of memory size.
>
> Looking at the code, it seems to go to extreme lengths to get it
> absolutely wrong.  For instance, if hw.physmem / 8 > hw.usermem, it will
> pick the former, which means it's pretty much guaranteed to either fail
> or hose your system (or both).
>
> In the immortal words of Blazing Star: YOU FAIL IT
>
> Count this as a vote for ditching GNU sort in favor of a BSD-licensed
> implementation (from {Net,Open}BSD for instance).

We had been using a BSD-licensed sort(1), but ache@ changed it back
to GNU sort several years ago.  Anyone know why?  If I had to guess
I'd say i18n, but that's not very hard to deal with these days
given strcoll(3).  That said, I'm unaware of any technical
differences between the two.
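
To illustrate the strcoll(3) point, here is a minimal sketch of locale-aware line sorting with qsort(3). It is not taken from either sort(1) implementation; the names collate_cmp and sort_lines are hypothetical, and passing the locale name as a parameter is an assumption made only to keep the example self-contained.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * qsort(3) comparator.  Each array element is a char *, so the
 * arguments are really pointers to char *.  strcoll(3) compares
 * according to the current LC_COLLATE setting.
 */
int
collate_cmp(const void *a, const void *b)
{
	const char *sa = *(const char *const *)a;
	const char *sb = *(const char *const *)b;

	return (strcoll(sa, sb));
}

/*
 * Hypothetical helper: sort n lines in place using the collation
 * order of locale_name ("" means "take it from the environment").
 * Returns 0 on success, -1 if the locale cannot be set.
 */
int
sort_lines(char **lines, size_t n, const char *locale_name)
{
	if (setlocale(LC_COLLATE, locale_name) == NULL)
		return (-1);
	qsort(lines, n, sizeof(lines[0]), collate_cmp);
	return (0);
}
```

A real utility would more likely call setlocale(LC_ALL, "") once at startup rather than per sort. For large inputs, transforming each key once with strxfrm(3) and comparing the results with strcmp(3) gives the same ordering and may be cheaper than calling strcoll(3) on every comparison.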