Date: Thu, 16 Sep 2004 15:45:26 -0400
From: David Schultz
To: Frank Knobbe
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: ZFS
Message-ID: <20040916194526.GA3364@VARK.homeunix.com>

On Thu, Sep 16, 2004, Frank Knobbe wrote:
> On Thu, 2004-09-16 at 11:20, Bruce M Simpson wrote:
> > On Thu, Sep 16, 2004 at 11:12:16AM -0400, Kevin A. Pieckiel wrote:
> > > Where on earth would you find a disk system that can store 2^64 bytes of
> > > data or larger, anyway?
> >
> > You can bet that somebody, somewhere, needs this right now. And someone
> > will definitely need it in the next 5-10 years.
>
> Naahh... there is No Such Application for it. ;)

Actually, there are a number of parties (banks, governments, geneticists, and Internet search engines, for instance) who never seem to have enough storage. I've seen lots of FUD and bad math on this thread, so let's do a quick back-of-the-envelope calculation.

Hitachi and other storage vendors already ship systems with on the order of 1 petabyte (2^50 bytes) of capacity. That's 14 doublings away from 2^64. Storage capacity has increased at 60% per year since 1991, so if history is any indicator[1], capacity will continue to double every 18 months. Ergo, at 14 doublings of 18 months each, 64-bit byte addresses won't be enough in 21 more years. (Other estimates are even shorter.) UFS is about two decades old, so Sun's design is at least plausible on technical grounds[2]. Moreover, the percentage of disk bandwidth that is typically dedicated to updating filesystem metadata is small, so the cost of the larger pointers is nominal. Note that I'm not arguing that 128-bit block numbers are the best choice; I'm merely trying to convince you that they are a sensible idea.

Anyway, out of all the features of ZFS, support for 128-bit block numbers is among the least interesting, both from an engineering perspective and from the user's perspective. I don't know why everyone is so eager to discuss them. Much more interesting, for instance, is the pooled storage model for volume management. (Basically, you tell the system, "I have a bunch of disks with similar QoS characteristics, and I want N filesystems on top of them." ZFS then dynamically shares the pool of storage among the filesystems. It's amazing how much trouble this saves.)

[1] The rate of increase is not very predictable in the short term. It was pretty slow in the early '90s, then picked up with the introduction of GMR, and is now starting to slow down again.

[2] Of course, you can buy yourself another decade or so by using 128-bit byte and sector addresses and 64-bit block addresses. UFS1 employs that strategy to squeeze as much as possible out of 32 bits, but the result isn't pretty. And for various reasons, that trick isn't as helpful for ZFS.
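
For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal Python sketch of the same arithmetic. The function names are made up for illustration, and the 512-byte block size used for footnote [2] is an assumption of this sketch, not a figure from the message; larger blocks would buy proportionally more time.

```python
import math


def doubling_time_years(annual_growth):
    """Years needed for capacity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


def years_until_exhausted(current_bits, address_bits, years_per_doubling):
    """Years until a capacity of 2**current_bits bytes reaches 2**address_bits."""
    doublings = address_bits - current_bits
    return doublings * years_per_doubling


def extra_years_from_block_addressing(block_size_bytes, years_per_doubling):
    """Extra years gained by addressing blocks instead of bytes.

    Each factor of two in block size adds one bit of reach, i.e. one doubling.
    The block size is an assumption supplied by the caller.
    """
    return math.log2(block_size_bytes) * years_per_doubling


# 60% annual growth doubles capacity roughly every 18 months.
print(round(doubling_time_years(0.60), 2))          # 1.47

# 1 PB = 2^50 bytes today; 64-bit byte addresses top out at 2^64 bytes.
print(years_until_exhausted(50, 64, 1.5))           # 21.0

# Footnote [2]: 64-bit block addresses with (assumed) 512-byte blocks.
print(extra_years_from_block_addressing(512, 1.5))  # 13.5
```

With the assumed 512-byte blocks, the extra reach works out to 13.5 years, which is in line with the "another decade or so" in footnote [2].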