From: "Paul Tice"
To: "Terry Kennedy"
Date: Tue, 27 Jan 2009 12:41:20 -0600
Subject: RE: Help me select hardware....Some real world data that might help
List-Id: Discussions about the use of FreeBSD-current

Excuse my rambling, perhaps something in this mess will be useful.

I'm currently using 8 cores (2x Xeon E5405), 16G FB-DIMM, and 8 x 750GB drives on a backup system (I plan to add the others in the chassis one by one, testing the speed along the way).

8-current AMD64, ZFS, Marvell 88sx6081 PCI-X card (8 port SATA) + LSI1068E (8 port SAS/SATA) for the main array, and the Intel onboard SATA for boot drive(s).

Data is sucked down through 3 gigabit ports, with another available but not yet activated.

Array drives all live on the LSI right now. Drives are .

ZFS is stable _IF_ you disable the prefetch and ZIL, otherwise the classic ZFS wedge rears its ugly head. I haven't had a chance to test disabling just one yet, but I'd guess it's the prefetch that's the quick killer. Even with prefetching and ZIL disabled, my current bottleneck is the GigE. I'm waiting to get new switches in that support jumbo frames; quick and dirty testing shows an almost 2x increase in throughput and a ~40% drop in interrupt rates from the NICs compared to the current standard (1500 MTU) frames.

Pool was created with 'zpool create backup raidz da0 da1 da2 da3 da4 da5 da6 da7'

I've seen references to 8-Current having a kernel memory limit of 8G (compared to 2G for pre 8 from what I understand so far) and ZFS ARC (caching) is done in kernel memory space. (Please feel free to correct me if I'm wrong on any of this!)

Default ZFS (no disables) with a 1536M kern mem limit and a 512M ARC limit: I saw 2085 ARC memory throttles before the box wedged.

Using rsync over several machines with this setup, I'm getting a little over 1GB/min to the disks.

'zpool iostat 60' is a wonderful tool.

I would mention something I've noticed that doesn't seem to be documented: the first reading from 'zpool iostat' (whether single run or with an interval) is a running average, although I haven't found the time period averaged yet. (From pool mount time maybe?)

The jumbo frame interrupt reduction may be important.
I run 'netstat -i -w60' right beside 'zpool iostat 60', and the throughput is closely inversely related. I can predict a disk write (bursty writes in ZFS, it seems) by throughput dropping on the NIC side. The drop is up to 75%, averaging around 50%. Using a 5-second interval instead of 60, I see disk out throughput spikes up to 90MB/s, although 55, 0, 0, 0, 55 is more common.

Possibly, binding interrupts to particular CPUs might help a bit too. I haven't found, and don't feel competent to write, userspace tools to do this.

CPU usage during all this is surprisingly low. rsync is running with -z, the files themselves are compressed as they go onto the drives with pbzip2, and the whole thing runs on (ducking) BackupPC, which is all Perl script.

With all that, 16 machines backing up, and 1+GB/min going to the platters, CPU is still avg 40% idle using top. I'm considering remaking the array as raidz2; I seem to have enough CPU to handle it.

Random ZFS thoughts:

You cannot shrink/grow a raidz or raidz2. You can grow a stripe array; I don't know if you can shrink it successfully.
You cannot promote a stripe array to raidz/z2, nor demote in the other direction.
You can have hot spares; I haven't seen a provision for warm/cold spares.
/etc/default/rc.conf already has cron ZFS status/scrub checks, but they are not enabled.

Anyway, enough rambling, just thought I'd use something not too incredibly far from your suggested system to toss some data out.

Thanks
Paul

-----Original Message-----
From: owner-freebsd-current@freebsd.org on behalf of Terry Kennedy
Sent: Fri 1/23/2009 8:30 PM
To: freebsd-current@freebsd.org
Subject: Help me select hardware and software options for very large server

[I posted the following message to freebsd-questions, as I thought it would be the most appropriate list. As it has received no replies in two weeks, I'm trying freebsd-current.]
--------

[I decided to ask this question here as it overlaps -hardware, -current, and a couple other lists. I'd be glad to redirect the conversation to a list that's a better fit, if anyone would care to suggest one.]

I'm in the process of planning the hardware and software for the second generation of my RAIDzilla file servers (see http://www.tmk.com/raidzilla for the current generation, in production for 4+ years).

I expect that what I'm planning is probably "off the scale" in terms of processing and storage capacity, and I'd like to find out and address any issues before spending lots of money. Here's what I'm thinking of:

o Chassis - CI Design SR316 (same model as current chassis, except i2c link between RAID controller and front panel)
o Motherboard - Intel S5000PSLSATAR
o CPU - 2x Intel Xeon E5450 BX80574E5450P
o Remote management - Intel Remote Management Module 2 - AXXRM2
o Memory - 16GB - 8x Kingston KVR667D2D4F5/2GI
o RAID controller - 3Ware 9650SE-16ML w/ BBU-MODULE-04
o Drives - 16x 2TB drives [not mentioning manufacturer yet]
o Cables - 4x multi-lane SATA cables
o DVD-ROM drive
o Auxiliary slot fan next to BBU card
o Adaptec AHA-39160 (for Quantum Superloader 3 tape drive)

So much for the hardware. On the software front:

o FreeBSD 8.x?
o amd64 architecture
o MBR+UFS2 for operating system partitions (hard partition in controller)
o GPT+ZFS for data partitions
o Multiple 8TB data partitions (separate 8TB controller partitions or one big partition divided with GPT?)

I looked at "Large data storage in FreeBSD", but that seems to be a stale page from 2005 or so: http://www.freebsd.org/projects/bigdisk/index.html

I'm pretty sure I need ZFS, since even with the 2TB partitions I have now, taking snapshots for dump or doing a fsck takes approximately forever 8-) I'll be using the hardware RAID 6 on the 3Ware controller, so I'd only be using ZFS to get filesystems larger than 2TB.

I've been following the ZFS discussions on -current and -stable, and I think that while it isn't quite ready yet, it probably will be ready in a few months, being available around the same time I get this hardware assembled. I recall reading that there will be an import of newer ZFS code in the near future.

Similarly, the ports collection seems to be moving along nicely with amd64 support.

I think this system may have the most storage ever configured on a FreeBSD system, and it is probably up near the top in terms of CPU and memory. Once I have it assembled I'd be glad to let any FreeBSD developers test and stress it if that would help improve FreeBSD on that type of configuration.

In the meantime, any suggestions regarding the hardware or software configuration would be welcomed.

Terry Kennedy             http://www.tmk.com
terry@tmk.com             New York, NY USA
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
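
A rough sanity check on the figures quoted in the reply above, as a minimal Python sketch. The constants (1GB/min, the 55, 0, 0, 0 burst pattern, 8 x 750GB drives) come from the message. The helper names are hypothetical, decimal units (1 GB = 1000 MB) are assumed, and the capacity estimate ignores ZFS metadata overhead, so treat the results as ballpark numbers only.

```python
# Back-of-the-envelope check on the throughput and capacity figures in the reply.
# Helper names are illustrative only; decimal units are an assumption.

GIGE_LINE_RATE_MB_S = 1000 / 8  # 1 Gbit/s as MB/s, ignoring protocol overhead


def gb_per_min_to_mb_per_s(gb_per_min):
    """Convert GB/min to MB/s (1 GB = 1000 MB)."""
    return gb_per_min * 1000 / 60


def mean(samples):
    """Arithmetic mean of a list of throughput samples."""
    return sum(samples) / len(samples)


def raidz_usable_gb(drives, drive_gb, parity):
    """Approximate usable space: data drives times drive size, no overhead."""
    return (drives - parity) * drive_gb


sustained = gb_per_min_to_mb_per_s(1.0)   # "a little over 1GB/min"
burst_cycle = mean([55, 0, 0, 0])         # one cycle of the "55, 0, 0, 0, 55" pattern

print(f"sustained write rate : {sustained:.1f} MB/s")
print(f"mean of burst cycle  : {burst_cycle:.2f} MB/s")
print(f"share of one GigE    : {sustained / GIGE_LINE_RATE_MB_S:.0%}")
print(f"raidz  usable (8x750): {raidz_usable_gb(8, 750, 1)} GB")
print(f"raidz2 usable (8x750): {raidz_usable_gb(8, 750, 2)} GB")
```

Under these assumptions the averaged burst pattern lands in the same range as the sustained 1GB/min figure, and moving the same eight drives from raidz to raidz2 would cost roughly one drive's worth of usable space.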