Date: Wed, 8 Jun 2005 12:06:32 -0400 From: Charles Swiger <cswiger@mac.com> To: Colin Percival <cperciva@freebsd.org> Cc: freebsd-fs@freebsd.org, Randy Bush <randy@psg.com>, FreeBSD Current <freebsd-current@freebsd.org>, Robert Watson <rwatson@freebsd.org>, Giorgos Keramidas <keramida@freebsd.org>, Eric Anderson <anderson@centtech.com>, Julian Elischer <julian@elischer.org> Subject: Re: you are in an fs with millions of small files Message-ID: <AFC1C71E-D755-4C9F-90C6-ECBF01C45D17@mac.com> In-Reply-To: <42A6A3B1.4090607@freebsd.org> References: <17059.7150.269428.448187@roam.psg.com> <42A4D5D0.9040500@elischer.org> <42A59367.6060307@centtech.com> <20050607175242.D61131@fledge.watson.org> <86ll5lmhs3.fsf@xps.des.no> <20050608074613.GA979@orion.daedalusnetworks.priv> <42A6A3B1.4090607@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Jun 8, 2005, at 3:52 AM, Colin Percival wrote: > Giorgos Keramidas wrote: >> On 2005-06-08 09:25, Dag-Erling Sm?rgrav <des@des.no> wrote: >>> That's because fts's sorting code is brain-dead. It starts by >>> reading >>> the entire directory into a linked list, then copies that list >>> into an >>> array which it passes to qsort(), and finally converts the array >>> back >>> into a linked list. >> >> Is there a better way to sort a linked list > > How do you define "better"? Most people think "better" means "it runs faster". :-) > You can merge-sort a singly-linked list quite easily, but converting > it to an array and back would probably be faster. Ugh. Converting a list to an array and back simply to sort requires roughly twice the working memory, and the work you're doing to accomplish this conversion is time spent not actually solving anything useful. It's best to pick one data structure based on what the task requires and what the access patterns are. If you know that the number of elements is fixed and will never change, arrays are fine, but if the # of elements changes dynamicly, something like a list or red-black tree is a better bet. A list is more suitable for linear traversal, whereas a RB-tree handles random access better. qsort() is wonderful, but as the size of individual data elements grows, the overhead of copying them around in the array becomes large enough that you are better off rearranging list pointers rather than using an array representation. A long time ago, the break-even point for using lists rather than arrays was somewhere around sizeof(el) > 64, but it's been a while.... -- -Chuck
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AFC1C71E-D755-4C9F-90C6-ECBF01C45D17>