From: Bakul Shah
To: Bruce Evans
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Millions of small files: best filesystem / best options
Date: Tue, 29 May 2012 00:59:46 -0700
Message-Id: <20120529075946.C4ADFB82A@mail.bitblocks.com>
In-reply-to: Your message of "Tue, 29 May 2012 17:35:18 +1000." <20120529161802.N975@besplex.bde.org>
References: <1490568508.7110.1338224468089.JavaMail.root@zimbra.interconnessioni.it> <4FC457F7.9000800@FreeBSD.org> <20120529161802.N975@besplex.bde.org>

On Tue, 29 May 2012 17:35:18 +1000 Bruce Evans wrote:
>
> But I expect using a file system would be so slow for lots of really
> small files that I wouldn't try it. Caching is already poor for
> 4K-files, and a factor of 20 loss won't improve it. If you don't want
> to use a database, maybe you can use tar.[gz] files. These at least
> reduce the wastage (but still waste about twice as much as msdosfs with
> 512 byte blocks), unless they are compressed. I think there are ways
> to treat tar files as file systems and to avoid reading the whole file
> to find files in it (zip format is better for this).

As someone else pointed out, the right thing for Alessio may be
to just use fusefs-sqlfs, or maybe even roll his own! Metadata
can be generated on the fly. If performance is an issue he can
slurp in the whole file and use write-through for any updates.
A million 200-byte files would take less than 512MB.

Another alternative: 9pfuse (from plan9port). There is even an
sqfs written in 339 lines of Python on GitHub that'd bolt right
onto 9pfuse! He can use it as a template to build exactly what he
wants. There is also tarfs etc. in plan9port, but it provides
read-only support.
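
To make the "slurp it all in, write through on update" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class name SmallFileStore, the table layout files(name, data) and the fixed 0644 mode are all hypothetical, and none of this is taken from fusefs-sqlfs or from the sqfs script mentioned above. It uses only the standard sqlite3 module and leaves out the FUSE/9P front end entirely.

```python
import sqlite3


class SmallFileStore:
    """Many tiny files kept in one SQLite table and cached fully in memory.

    Hypothetical sketch: the schema and method names are assumptions,
    not the layout used by fusefs-sqlfs or any existing tool.
    """

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "name TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )
        self.db.commit()
        # Slurp the whole table into a dict once, at startup.
        self.cache = {
            name: bytes(data)
            for name, data in self.db.execute("SELECT name, data FROM files")
        }

    def read(self, name):
        # Reads are served from memory only.
        return self.cache[name]

    def write(self, name, data):
        data = bytes(data)
        # Write-through: commit to the database first, then update the cache.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)",
                (name, data),
            )
        self.cache[name] = data

    def stat(self, name):
        # Metadata is generated on the fly instead of being stored.
        return {"size": len(self.cache[name]), "mode": 0o644}


if __name__ == "__main__":
    store = SmallFileStore()
    store.write("a/b.txt", b"x" * 200)
    print(store.stat("a/b.txt"))
    # Raw payload for a million 200-byte files, in bytes.
    print(1_000_000 * 200)
```

The raw payload for a million 200-byte files is 200 MB; how much the names and per-object overhead add on top depends on the implementation, so the in-memory footprint of this particular sketch is not something it guarantees.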