From owner-freebsd-fs@FreeBSD.ORG Fri Jun 20 15:41:15 2014
From: Freddie Cash
To: Graham Allan
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Date: Fri, 20 Jun 2014 08:41:14 -0700
Subject: Re: Large ZFS arrays?

On Fri, Jun 20, 2014 at 7:50 AM, Graham Allan wrote:

> Would be interesting to hear a little about experiences with the drives
> used... For our first "experimental" chassis we used 3TB Seagate desktop
> drives - cheap but not the best choice; 18 months later they are dropping
> like flies (luckily we can risk some cheapness here as most of our data can
> be re-transferred from other sites if needed). Another chassis has 2TB WD
> RE4 enterprise drives (no problems), and four others have 3TB and 4TB WD
> "Red" NAS drives... which are another "slightly risky" selection but so far
> have been very solid (also, in some casual discussion with a WD field
> engineer, he seemed to feel these would be fine for both ZFS and Hadoop use).

We've had good experiences with WD Black drives (500 GB, 1 TB, and 2 TB).
These tend to last the longest and provide the nicest failure modes. It's
also very easy to understand the WD model numbers.

We've also used Seagate 7200.11 and 7200.12 drives (1 TB and 2 TB). These
perform well, but fail in weird ways. They also tend to fail sooner than
the WDs. Thankfully, the RMA process with Seagate is fairly simple and the
turn-around time is fairly quick.
Unfortunately, trying to figure out exactly which model of Seagate drive to
order is becoming more of a royal pain as time goes on. They keep changing
their marketing names and the actual model numbers. There are now something
like 8 separate product lines to pick from, and 6+ different models in each
line, times 2 for 4K vs 512-byte sectors.

We started out (3? 4? years ago) using WD Blue drives because they were
inexpensive (almost half the price of WD Black) and figured all the ZFS
goodness would work well on them. We quickly found out that desktop drives
really aren't suited to server work, especially when being written to for
12+ hours a day. :)

We were going to try some Toshiba drives in our next setup, but we received
such an exceptionally good price on WD Black drives on our last tender
($80 CDN for 1 TB) that we decided to stick with those for now. :D After
all, they work well, so why rock the boat?

We haven't used any drives larger than 2 TB as of yet.

> Tracking drives for failures and replacements was a big issue for us. One
> of my co-workers wrote a nice perl script which periodically harvests all
> the data from the chassis (via sg3utils) and stores the mappings of chassis
> slots, da devices, drive labels, etc into a database. It also understands
> the layout of the 847 chassis and labels the drives for us according to
> some rules we made up - we use a prefix for the pool name, then "f" or
> "b" for front/back of chassis, then the slot number - and finally (?) it
> has some controls to turn the chassis drive identify lights on or off.
> There might be other ways to do all this but we didn't find any, so it's
> been incredibly useful for us.

We partition each drive into a single GPT partition (starting at 1 MB and
covering the whole disk), and label that partition with the chassis/slot
it's in. Then we use the GPT labels to build the pool (/dev/gpt/diskname).
That way, all the metadata in the pool, and any error messages from ZFS,
tell us exactly which disk, in which chassis, in which slot, is having
issues. No external database required. :) (A rough sketch of the commands
is appended below.)

Currently we're using smartmontools and the periodic scripts to alert us to
pending drive failures, and a custom cron job that checks the health of the
pools to alert us to actual drive failures (a minimal version of that check
is also appended below). It's not pretty, but with only 4 large servers to
monitor, it works for us. I'm hoping to eventually convert those scripts to
Nagios plugins, and let our existing monitoring setup keep track of the ZFS
pools as well.

-- 
Freddie Cash
fjwcash@gmail.com
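
P.S. For anyone wanting to copy the GPT labelling scheme above, it boils
down to something like the following. This is only a rough sketch: the
"tank" pool name, the raidz2 layout, and the "c1-f01"-style labels are
just examples, not necessarily what we run in production.

    # One GPT partition per disk, aligned to 1 MB, labelled with the
    # chassis/slot the disk sits in:
    gpart create -s gpt da0
    gpart add -t freebsd-zfs -a 1m -l c1-f01 da0

    # Repeat for the remaining disks, then build the pool from the labels
    # so ZFS reports /dev/gpt/<chassis-slot> instead of daXX:
    zpool create tank raidz2 gpt/c1-f01 gpt/c1-f02 gpt/c1-f03 \
        gpt/c1-f04 gpt/c1-f05 gpt/c1-f06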
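
P.P.S. The pool health check in our cron job is nothing fancy. A minimal
version looks roughly like this (the mail recipient and the script path
are placeholders; adjust for your site):

    #!/bin/sh
    # check_zpools.sh - mail root if any pool is degraded or faulted.
    # "zpool status -x" prints only pools with problems, or the single
    # line "all pools are healthy" when everything is fine.
    status=$(zpool status -x)
    if [ "${status}" != "all pools are healthy" ]; then
        echo "${status}" | mail -s "ZFS pool problem on $(hostname)" root
    fi

We just run it from /etc/crontab every 10 minutes or so, e.g.:

    */10  *  *  *  *  root  /usr/local/sbin/check_zpools.sh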