From owner-freebsd-fs@FreeBSD.ORG Mon Feb 6 17:25:07 2012
Message-ID: <4F300CEA.5000901@fuckner.net>
Date: Mon, 06 Feb 2012 18:24:58 +0100
From: Michael Fuckner <michael@fuckner.net>
To: freebsd-fs@freebsd.org
References: <4F2FF72B.6000509@pean.org> <20120206162206.GA541@icarus.home.lan>
Subject: Re: HPC and zfs.

On 02/06/2012 05:41 PM, Freddie Cash wrote:

Hi all,

> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick wrote:
>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>> I want to investigate if it is possible to create your own usable
>>> HPC storage using zfs and some network filesystem like nfs.

Especially HPC storage sounds interesting to me, but for HPC you
typically need fast r/w access for all nodes in the cluster. That's why
Lustre uses several storage servers for concurrent access over a fast
link (typically InfiniBand).

Another thing to think about is CPU: you probably need weeks for a
rebuild of a single disk in a petabyte filesystem. I haven't tried this
with ZFS yet, but I'm really interested whether anyone has already done
it.

The whole setup sounds a little bit like the system shown by Aberdeen:
http://www.aberdeeninc.com/abcatg/petabyte-storage.htm
Schematics at Tom's Hardware:
http://www.tomshardware.de/fotoreportage/137-Aberdeen-petarack-petabyte-sas.html
The problem with Aberdeen is that they don't use a ZIL/L2ARC.

>>> Just a thought experiment..
>>> A machine with 2 6-core Xeons, 3.46 GHz, 12 MB cache, and 192 GB of
>>> RAM (or more). In addition the machine will use 3-6 SSD drives for
>>> ZIL and 3-6 SSD drives for cache. Preferably mirrored where
>>> applicable.
>>>
>>> Connected to this machine we will have about 410 3 TB drives to give
>>> approx 1 PB of usable storage in an 8+2 raidz configuration.

I don't know what the situation is for the rest of the world, but 3 TB
drives are currently still hard to buy in Europe/Germany. (A rough
capacity check of the 8+2 layout is sketched below.)

>>> Connected to this will be a ~800-node HPC cluster that will access
>>> the storage in parallel.

What is your typical load pattern?
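A quick back-of-envelope check of the quoted 410-disk / ~1 PB figure, as
a minimal sketch: the split into 41 raidz2 vdevs of 10 disks (8 data +
2 parity) and the use of decimal terabytes are assumptions, not spelled
out in the thread.

    # Back-of-envelope check of the "410 x 3 TB, ~1 PB usable, 8+2 raidz"
    # figures quoted above.  Assumptions (for illustration only): 41 raidz2
    # vdevs of 10 disks each and decimal terabytes (10^12 bytes), ignoring
    # ZFS metadata/slop overhead.

    DISKS = 410
    TB_PER_DISK = 3
    DATA, PARITY = 8, 2          # "8+2 raidz" = raidz2, 8 data disks per vdev

    vdevs = DISKS // (DATA + PARITY)        # 41 vdevs, using all 410 disks
    usable_tb = vdevs * DATA * TB_PER_DISK  # parity capacity excluded
    print(f"{vdevs} vdevs, ~{usable_tb} TB usable (~{usable_tb / 1000:.2f} PB)")
    # -> 41 vdevs, ~984 TB usable (~0.98 PB): roughly the 1 PB quoted.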
>>> is this even possible or do we need to distribute the meta data
>>> load over many servers?

It is a good idea to have

>>> If that is the case, does any software exist for FreeBSD that could
>>> accomplish this distribution (pNFS doesn't seem to be anywhere close
>>> to usable in FreeBSD) or do I need to call NetApp or Panasas right
>>> away?

Not that I know of.

> SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules (16 RAM
> slots). It's an AMD board, but there should be variants that support
> Intel CPUs. It's not uncommon to support 256 GB of RAM these days,
> although 128 GB boards are much more common.

Current Intel CPUs have 3 memory channels. With 2 sockets, 3 channels
per socket and 2 DIMMs per channel, that is 12 DIMMs; with cheap 16 GB
modules that gives 192 GB. 32 GB modules are also available today ;-)

>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>> an additional 12 SSDs) hooked up to a single machine
>
> In a "head node" + "JBOD" setup? Where the head node has a mobo that
> supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
> 16-24 port multi-lane SAS/SATA controllers with external ports that
> are cabled up to external JBOD boxes. The SSDs would be connected to
> the mobo SAS/SATA ports.
>
> Each JBOD box contains nothing but power, a SAS/SATA backplane, and
> hard drives. Possibly using SAS expanders.

If you use Supermicro, I would go with the X8DTH-iF, some LSI HBAs
(9200-8e, 2x multilane external) and some JBOD chassis (like the
Supermicro 847E16-RJBOD1). A rough sizing sketch follows below the
signature.

Regards,
Michael!
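P.S. A rough sizing sketch for the JBOD topology above, assuming 45-bay
chassis (like the 847E16-RJBOD1), one external SAS cable per chassis
(no daisy-chaining) and two external ports per 9200-8e; the numbers are
illustrative only.

    # Rough sizing of the head-node + JBOD topology discussed above.
    # Assumptions (for illustration only): 45-bay JBOD chassis, one external
    # SAS cable per chassis, two external ports per LSI 9200-8e HBA.
    import math

    HDDS = 410
    BAYS_PER_JBOD = 45
    PORTS_PER_HBA = 2

    jbods = math.ceil(HDDS / BAYS_PER_JBOD)   # chassis needed for the HDDs
    hbas = math.ceil(jbods / PORTS_PER_HBA)   # one HBA port per chassis
    print(f"{jbods} JBOD chassis, {hbas} dual-port HBAs")
    # -> 10 JBOD chassis and 5 HBAs; chaining chassis off the built-in
    #    expanders would cut the HBA count, at the cost of shared bandwidth.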