Date:      Tue, 13 Nov 2012 10:27:22 -0500
From:      Jason Keltz <jas@cse.yorku.ca>
To:        kpneal@pobox.com
Cc:        freebsd-fs@freebsd.org
Subject:   Re: RHEL to FreeBSD file server
Message-ID:  <50A266DA.2090605@cse.yorku.ca>
In-Reply-To: <20121113043409.GA70601@neutralgood.org>
References:  <50A130B7.4080604@cse.yorku.ca> <20121113043409.GA70601@neutralgood.org>

Thanks for your reply, Kevin.

On 11/12/2012 11:34 PM, kpneal@pobox.com wrote:
> I'll see your long post and raise you one .... Nevermind. :)
:) :)

> On Mon, Nov 12, 2012 at 12:24:07PM -0500, Jason Keltz wrote:
>> For the last few months, I've been working on and off learning about
>> FreeBSD.  The goal of my work is to swap out our current dual Red Hat
>> Enterprise Linux file servers with FreeBSD.  I'm ultimately hoping for
>> the most reliable, high performance NFS file server that I can get.  The
>> fact that, in addition, I get to take advantage of ZFS is what I see as
>> a major bonus.
> So, which is it? Do you want the most reliable server? OR, do you want
> the highest performance server? Be clear on what your requirements
> actually are.
Honestly, I don't think it's out of the question to want a server that
isn't prone to crashing or losing filesystems out of the blue for
unexplained reasons, yet still performs well.
Of course, who *doesn't* want a reliable server? :)  On the other hand,
as you've suggested below, a filesystem layout that gives up some
performance in exchange for *additional* reliability is certainly worth
considering.

>> I'm looking to replace the above hardware completely.  In terms of
>> hardware, I've recently been able to acquire a new 12th generation Dell
>> PowerEdge R720 server with 64 GB of memory and dual E5-2660 processors
>> (2.20 GHz).  It has an integrated Dell H310 controller (FreeBSD mfi
>> driver) - which is presently only used for a mirrored root configuration
>> (2 x 500 GB NL SAS drives).  I added 2 x LSI 9205-8e cards (LSISAS2308)
>> to the server.  The LSI cards were flashed to the latest LSI firmware.
>> I also have 1 Dell MD1220 array with 24 x 900 GB 10K SAS drives for
>> data.   The server has 4 x 1 GbE Intel NICs.
>
>> I'm working with FreeBSD 9.1RC3 because I understand that the 9.1 series
>> includes many important improvements, and a totally new driver for the
>> LSI SAS HBA cards. I suspect that by the time the file server is ready
>> to go live, 9.1 will be officially released.
> Yep, 9.1 will be the first release with the driver you'll need for the
> R720's H310, the H810, and all the other 12G cards.
>   
Actually, I had the H310 working with 9.0 (which is what I started on)
by backporting the code. ;)
>> In terms of ZFS, in my testing, I have been using a single ZFS pool
>> comprised of 11 mirrored vdevs - a total of 22 disks, with 2 spares (24
>> disks total). As I understand it, I should be able to get the optimal
>> performance this way.  I considered using multiple pools, but with
> Optimal? Hard to say.
>
> On one of the OpenSolaris ZFS lists a guy did performance testing of various
> configurations. He found that for writes the best performance came from
> having vdevs consisting of a single disk. Performance scaled pretty well
> with the number of disks, but it had zero redundancy. For reads the best
> performance came from a single vdev of an N-way mirror.  Writes were a
> little worse than a single disk, but the up side was that it had near-linear
> scaling of read performance plus _excellent_ redundancy.
>
> Neither of those configurations is a very good idea in most cases.
>
> With your setup of 11 mirrors you have a good mixture of read and write
> performance, but you've compromised on the safety. The reason that RAID 6
> (and thus raidz2) and up were invented was because drives that get used
> together tend to fail together. If you lose a drive in a mirror there is
> an elevated probability that the replacement drive will not be in place
> before the remaining leg of the mirror fails. If that happens then you've
> lost the pool. (Drive failures are _not_ independent.)
>
> Consider instead having raidz2 vdevs of four drives each. This will give
> you the same amount of overhead (so the same amount of usable space), but
> the pool will be able to survive two failures in each group of four instead
> of only one in each group of two. Both read and write performance will be
> less than with your mirror pairs, but you'll still have striping across
> all of your vdevs. If performance isn't up to snuff you can then try adding
> more disks, controllers, etc.
I did experiment with a similar option at one point: 4 x 6-disk raidz2
vdevs (as opposed to your suggested 6 x 4-disk vdevs)...
Presently, with the mirrors, I'm using 22 disks and leaving 2 hot spares.
If I used 6 x 4-disk vdevs, I'd use all 24 disks and have no hot spares
(though I'd have better redundancy).  I could put spares into the R720
head, which actually has 14 empty disk slots, but I'm hesitant to put a
spare for the MD1220 behind a different driver in the head of the R720...
If I use 5 x 4-disk vdevs, I lose a little space... and probably a bit
of performance as well... but I'd have better reliability, as you say...
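
Just so I'm sure I'm picturing the layout right, I'd expect the 6 x
4-disk version to be created with something along these lines (the da*
names are only placeholders for however the MD1220 disks enumerate
behind the two 9205-8e's here):

   # 6 x 4-disk raidz2 vdevs striped into one pool (uses all 24 disks)
   zpool create tank \
       raidz2 da0  da1  da2  da3  \
       raidz2 da4  da5  da6  da7  \
       raidz2 da8  da9  da10 da11 \
       raidz2 da12 da13 da14 da15 \
       raidz2 da16 da17 da18 da19 \
       raidz2 da20 da21 da22 da23

The 5 x 4-disk variant would just drop the last raidz2 group and add a
couple of those disks back as hot spares with "spare da20 da21" instead.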

Now, as I said, I haven't tested the 6 x 4-disk vdev layout (yet), but
the 4 x 6-disk raidz2 layout I did try showed significantly less
performance...

Here's a basic filebench number from fs for the 4 x 6-disk raidz2 layout:

41941: 81.258: IO Summary: 1067034 ops, 17669.393 ops/s, (1606/3213 
r/w), 426.7mb/s, 0us cpu/op, 10.0ms latency

... compared to the mirrored vdevs:

42115: 77.169: IO Summary: 3139018 ops, 52261.125 ops/s, (4751/9502 
r/w), 1265.8mb/s, 0us cpu/op, 3.3ms latency

... and that's already with the disks split across the two
controllers... but I can understand how raidz2 would improve the
redundancy and provide a more reliable configuration.
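
For reference, those IO Summary lines come out of filebench; a run of
the canned fileserver personality looks roughly like the following (the
personality and dataset path here are just an illustration, not
necessarily my exact workload file):

   # interactive filebench session against a test dataset on the pool
   filebench> load fileserver
   filebench> set $dir=/tank/fbtest
   filebench> run 60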

> With groups of four you could if you had the dough have four shelves and
> four controllers. You could lose a controller or a shelf and still have
> redundancy left over in case of a drive failure on one of the remaining
> working shelves. One guy on the OpenSolaris list had an array of 6
> controllers/shelves with 6-drive raidz2 vdevs and one drive per vdev per
> shelf.
>
> Of course, I have no idea what your budget is for hardware or for useless
> employees due to a fileserver being down. You'll have to balance those
> risks and costs yourself.
Absolutely .... understood...

>> multiple pools comes multiple ZIL, L2ARC, etc and reduction in the
>> performance numbers.   I've been told that people have far bigger ZFS
>> pools than my 22 disk zpool.  As I understand it, as storage
>> requirements increase, I could easily add another MD1220 with an
>> additional 11 x mirrored vdev pairs and "append" this to the original
>> pool, giving me lots more space with little hassle.
> Yes, with ZFS you can add more drives and the new data will be striped
> across all the drives with a distribution that I'm unclear on. The old
> data will not be rebalanced. So if you wait until your pool is almost full,
> and then you add only a handful of drives, then your new drives may not
> give you the performance you require.
Yes, I've read this.  I'm not sure why an "enterprise" filesystem like
ZFS doesn't support "rebalancing" data across the entire pool on demand
(without recopying all the data back onto itself)...  I understand it's
a costly operation, but if someone wants to do it, the option should be
there... (a "zpool rebalance")...
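
In the meantime, the expansion itself is the easy part, and the only
"rebalance" I know of is rewriting data so that new blocks get striped
over all of the vdevs.  Roughly (device and dataset names are just
placeholders):

   # append another MD1220's worth of mirrored pairs to the pool
   zpool add tank mirror da24 da25 mirror da26 da27

   # poor man's rebalance: rewrite a dataset so its blocks are re-striped
   zfs snapshot tank/home@move
   zfs send tank/home@move | zfs recv tank/home.new

... and then rename/destroy the datasets once the copy is verified.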

>> 1) The new file server is relatively powerful.  However, is one file
>> server enough to handle a load of approximately 2000 connections?
>> Should I be looking at getting another server, or getting another
>> server and another MD1220?  Is 64 GB of memory enough when I'm talking
>> about up to 2500-3000 ZFS filesystems on the box?  I'm not using dedup
>> and I'm using minimal compression.
> Sizing machines is _hard_. It's a fair bet that your new machine is faster
> than one of your old machines. But whether or not it can handle the
> consolidated load is a question I can't answer. It depends on how much
> load a typical (for you) client puts on the server.
>
Understood..
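
On the memory question, my rough plan is to watch ARC usage once real
load is on the box and cap it if the NFS side needs more headroom, e.g.
(the 48G figure below is only an example, not a recommendation):

   # current ARC size vs. the configured ceiling
   sysctl kstat.zfs.misc.arcstats.size
   sysctl vfs.zfs.arc_max

   # if needed, cap the ARC via /boot/loader.conf:
   #   vfs.zfs.arc_max="48G"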

Thanks for your feedback!  It's much appreciated.

Jason Keltz


