Date: Fri, 21 Dec 2007 12:16:25 -0800
From: Alfred Perlstein
To: Alexandre Biancalana
Cc: freebsd-performance@freebsd.org
Subject: Re: Bad performance when accessing a lot of small files
Message-ID: <20071221201625.GZ16982@elvis.mu.org>
In-Reply-To: <8e10486b0712191109n3d21b02cyf5183ee0cd01d8ce@mail.gmail.com>

* Alexandre Biancalana [071219 11:35] wrote:
> Hi List,
>
> I have a backup server running FreeBSD 7-BETA3. The cpu is CPU:
> Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz, 3GB Ram, 10x 500GB
> SATA, Areca 1231-ML, the filesystem used to backup my other servers
> locally is build on top of ARC-1231, 4TB (32k stripe) zfs filesystem
> with gzip compression.
>
> This machine receive backups from ~30 servers, (of all kinds and
> sizes, databases, fileservers, image servers, webservers, etc) all
> night, write the last day in LTO-3 tapes and store some days older
> days in disk.
>
> The behavior that I'm observing and that want your help is when the
> system is accessing some directory with many small files ( directories
> with ~ 1 million of ~30kb files), the performance is very poor.

There is a lot of very good tuning advice in this thread. One thing to
note, however, is that keeping ~1 million files in a single directory
performs poorly on just about any filesystem.

One trick a lot of people use is hashing the directories themselves:
you use some kind of computation to break the huge directory into
multiple smaller ones. If you can figure out a hashing algorithm, that
may help you.

For instance, if you tell sendmail to use "/var/spool/mq*" for its mail
spool and you happen to have 256 directories under "/var/spool/" named
"mq000" through "mq255", it will randomly pick a directory to dump each
file in. This makes the performance a lot better.

For one million files you can probably do a two-level hash; you just
have to figure out a good hashing algorithm. If you can describe the
data, I may be able to help you come up with a hashing algorithm for it.

-Alfred
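
A minimal sketch of the two-level directory hash described above, in
Python. Everything named here is an assumption made for illustration and
is not from the thread: the helper names hashed_path and store, the use
of MD5 over the file name as the hash, and the choice of 256
subdirectories per level. A real layout would depend on what the backup
file names look like.

```python
import hashlib
from pathlib import Path


def hashed_path(root, name, levels=2):
    """Return root/xx/yy/name, where xx and yy are taken from a hash of name.

    Hypothetical helper: the hash (MD5 of the file name) and the fan-out
    (256 subdirectories per level) are illustrative assumptions.
    """
    # MD5 is used only to spread names evenly, not for security.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Each level consumes two hex digits, i.e. one of 256 subdirectories.
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return Path(root).joinpath(*parts, name)


def store(root, name, data):
    """Write data (bytes) at the hashed location, creating directories as needed."""
    path = hashed_path(root, name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

With two levels of 256 there are 65,536 leaf directories, so one million
files would average roughly 15 per directory. Because the path is
computed from the name alone, a reader can find a file again without
scanning any directory.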