From: vogelke+unix@pobox.com (Karl Vogel)
To: freebsd-questions@freebsd.org
Date: Wed, 10 Jun 2009 15:06:06 -0400 (EDT)
Subject: Re: Need a filesystem with "unlimited" inodes
In-reply-to: <200906090945.48548.kirk@strauser.com> (message from Kirk Strauser on Tue, 9 Jun 2009 09:45:48 -0500)
Message-Id: <20090610190606.6B89ABEDB@kev.msw.wpafb.af.mil>
Organization: Oasis Systems Inc.

>> On Tue, 9 Jun 2009 03:10:46 am Matthew Seaman wrote:

M> Or store your data in a RDBMS rather than in the filesystem.

>> On Tue, 9 Jun 2009 09:45:48 -0500, Kirk Strauser said:

K> Hear, hear.  I'm hard pressed to imagine why you'd need 100M 1KB files.

DBs are great when you have structured data, but semi-structured text
(like email) makes for a very poor fit.  To see why, have a look at

    http://www.memoryhole.net/~kyle/databaseemail.html

If you really need to store 100 million smallish chunks of information,
consider using zip.  Create 256 folders named 00-ff:

    #!/bin/sh
    hex='0 1 2 3 4 5 6 7 8 9 a b c d e f'
    for x in $hex ; do
        for y in $hex ; do
            mkdir ${x}${y}
        done
    done
    exit 0

Use the hash of your choice to map the name of each chunk to one of 256
zipfiles under each directory.  This gives you 64k (256 x 256 = 65,536)
zipfiles, and if you put 1500 or so chunks in each one, you're pretty
close to 100 million (about 98.3 million).

    me% cat mkchunks
    #!/usr/bin/perl -w
    for $chunk (@ARGV) {
        $_ = chunk2file($chunk);
        $file = "$1/$2.zip" if m/(..)(..)/;    # first 2 hex digits = folder
        print "$file $chunk\n";
    }
    exit(0);

    sub chunk2file {
        my $str = shift;
        my ($byte, $sum);

        use integer;
        $sum = 0;
        foreach $byte (unpack("C*", $str)) {
            # SDBM-style hash (SDBM itself multiplies by 65599)
            $sum = $byte + 65587 * $sum;
        }
        $sum &= 0xffff;     # keep lowest 16 bits
        no integer;

        return sprintf("%4.4x", $sum);
    }

    me% ./mkchunks freebsd solaris
    16/f7.zip freebsd
    ca/1f.zip solaris

You'll get a better distribution if you use a hash like Digest::SHA1.

-- 
Karl Vogel                      I don't speak for the USAF or my company

People like you are the reason people like me need medication.
                                                        --bumper sticker
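
The "hash of your choice" step in the message above can also be sketched in plain sh. This is only a minimal sketch under stated assumptions: it uses the POSIX cksum utility (a CRC) as a stand-in hash rather than the Digest::SHA1 module the message recommends, and chunk2zip is a hypothetical helper name that does not appear in the original script. As in mkchunks, the lowest 16 bits of the hash pick one of the 256 folders and one of the 256 zipfiles inside it.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): map a chunk name
# to "xx/yy.zip".  Assumption: POSIX cksum is used as the hash here in
# place of the Perl hash shown in mkchunks.
chunk2zip() {
    # cksum on stdin prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)

    # Keep the lowest 16 bits, as mkchunks does.
    h=$((crc & 0xffff))

    # High byte selects the folder (00-ff), low byte the zipfile.
    printf '%02x/%02x.zip\n' $((h >> 8)) $((h & 0xff))
}

# Same output layout as mkchunks: "<zipfile> <chunk name>".
for name in freebsd solaris ; do
    printf '%s %s\n' "$(chunk2zip "$name")" "$name"
done
```

The paths this prints will differ from the mkchunks output because the hash is different; what matters is that the same name always maps to the same zipfile.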