From owner-freebsd-fs@FreeBSD.ORG Mon Feb  6 18:04:42 2012
From: Peter Ankerstål <peter@pean.org>
To: Freddie Cash
Cc: freebsd-fs@freebsd.org
Date: Mon, 6 Feb 2012 18:43:59 +0100
Subject: Re: HPC and zfs.
Message-Id: <7321E3F2-35E2-43F4-9932-EC55F5AA9D0B@pean.org>
References: <4F2FF72B.6000509@pean.org> <20120206162206.GA541@icarus.home.lan>

-- 
Peter Ankerstål
peter@pean.org
http://www.pean.org/

On 6 feb 2012, at 17:41, Freddie Cash wrote:

> On Mon, Feb 6, 2012 at 8:22 AM, Jeremy Chadwick wrote:
>> On Mon, Feb 06, 2012 at 04:52:11PM +0100, Peter Ankerstål wrote:
>>> I want to investigate whether it is possible to build your own
>>> usable HPC storage using ZFS and a network filesystem such as NFS.
>>>
>>> Just a thought experiment: a machine with two 6-core Xeons
>>> (3.46 GHz, 12 MB cache) and 192 GB of RAM (or more). In addition,
>>> the machine will use 3-6 SSD drives for the ZIL and 3-6 SSD drives
>>> for the cache, preferably mirrored where applicable.
>>>
>>> Connected to this machine we will have about 410 3 TB drives to
>>> give approx. 1 PB of usable storage in an 8+2 raidz configuration.
>>>
>>> Connected to this will be an ~800-node HPC cluster that will access
>>> the storage in parallel. Is this even possible, or do we need to
>>> distribute the metadata load over many servers? If that is the
>>> case, does any software exist for FreeBSD that could accomplish
>>> this distribution (pNFS doesn't seem to be anywhere close to usable
>>> in FreeBSD), or do I need to call NetApp or Panasas right away? It
>>> would be really nice if I could build my own storage solution.
>>>
>>> Other possible solutions to this problem are extremely welcome.
>>
>> For starters I'd love to know:
>>
>> - What single motherboard supports up to 192 GB of RAM
>
> The SuperMicro H8DGi-F supports 256 GB of RAM using 16 GB modules
> (16 RAM slots). It's an AMD board, but there should be variants that
> support Intel CPUs. It's not uncommon to support 256 GB of RAM these
> days, although 128 GB boards are much more common.
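(As a quick sanity check on the 8+2 raidz figure quoted above, here is
a rough back-of-the-envelope sketch. The 41-vdev split and the decimal
terabyte arithmetic are assumptions of mine, not something specified
anywhere in the thread:

    # Capacity check for 410 x 3 TB drives in 10-disk raidz2 vdevs
    # (8 data + 2 parity). Assumes 41 full vdevs, decimal terabytes,
    # and ignores metadata/slop overhead.
    DRIVES, VDEV_WIDTH, DATA_DISKS, DRIVE_TB = 410, 10, 8, 3
    vdevs = DRIVES // VDEV_WIDTH                # 41 vdevs
    usable_tb = vdevs * DATA_DISKS * DRIVE_TB   # 41 * 8 * 3 = 984 TB
    print(f"{vdevs} vdevs, ~{usable_tb} TB "
          f"(~{usable_tb / 1000:.2f} PB) usable")

That lands at roughly 984 TB, i.e. just under the 1 PB figure above,
before ZFS overhead.)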
Yeah, the one I was looking at was the SuperMicro X8DTU-F, but the
more RAM the better.

>> - How you plan on getting roughly 410 hard disks (or 422 assuming
>>   an additional 12 SSDs) hooked up to a single machine
>
> In a "head node" + "JBOD" setup? Where the head node has a mobo that
> supports multiple PCIe x8 and PCIe x16 slots, and is stuffed full of
> 16-24 port multi-lane SAS/SATA controllers with external ports that
> are cabled up to external JBOD boxes. The SSDs would be connected to
> the mobo SAS/SATA ports.
>
> Each JBOD box contains nothing but power, a SAS/SATA backplane, and
> hard drives. Possibly using SAS expanders.
>
> We're considering doing the same for our SAN/NAS setup for
> centralising storage for our VM hosts, although not quite to the
> same scale as the OP. :)

Yep, NetApp has disk shelves that can be configured as JBOD and fit
60 drives into 4U. :D

>> If you are considering investing the time and especially money (the
>> cost here is almost unfathomable, IMO) into this, I strongly
>> recommend you consider an actual hardware filer (e.g. NetApp). Your
>> performance and reliability will be much greater, plus you will get
>> overall better support from NetApp in case something goes wrong. If
>> you run into problems with FreeBSD in this kind of extensive setup
>> (and I can assure you that you will), you will be at the mercy of
>> developers' time/schedules, with absolutely no guarantee that your
>> problem will be solved. You definitely want a support contract.
>> Thus, go NetApp.
>
> For an HPC setup like the OP wants, where performance and uptime are
> critical, I agree. You don't want to be skimping on the hardware and
> software.

A big consideration for us is also the installation. If we go with
something like NetApp, they can install the system and we don't need
to put in the extra hours (probably a lot) to get the thing running.
But being a huge fan of BSD, I wanted to at least look into the
possibility of building our own system.

> However, if you have the money for a NetApp setup like this
> ($500,000+ US, I'm guessing), then you also have the money to hire a
> FreeBSD developer (or several) to work on the parts of the system
> that are critical to this (NFS, ZFS, CAM, drivers, scheduler, GEOM,
> etc.). Then you could go with a white-box, custom build and have the
> support in-house.

-- 
Freddie Cash
fjwcash@gmail.com
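(For a sense of scale on the head-node + JBOD cabling discussed above,
a rough sketch. The 45-drive shelf size, the 24-lane HBA with six
external x4 wide ports, and the one-wide-port-per-shelf cabling are
illustrative assumptions of mine, not a parts list:

    import math

    # Hypothetical cabling math: 45-drive JBOD shelves behind SAS
    # expanders, each shelf uplinked over one x4 wide port, and HBAs
    # exposing six external x4 wide ports (24 lanes) each.
    DRIVES, SHELF_SIZE, WIDE_PORTS_PER_HBA = 410, 45, 6
    shelves = math.ceil(DRIVES / SHELF_SIZE)        # 10 shelves
    hbas = math.ceil(shelves / WIDE_PORTS_PER_HBA)  # 2 HBAs
    print(f"{shelves} JBOD shelves, {hbas} HBAs")

So on paper the drive count is reachable from a single head node with
a couple of multi-port HBAs; whether one node can feed an ~800-node
cluster over NFS is the harder question.)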