Date:      Mon, 12 Nov 2012 12:24:07 -0500
From:      Jason Keltz <jas@cse.yorku.ca>
To:        freebsd-fs@freebsd.org
Subject:   RHEL to FreeBSD file server
Message-ID:  <50A130B7.4080604@cse.yorku.ca>

For the last few months, I've been working on and off learning about 
FreeBSD.  The goal of my work is to replace our current pair of Red Hat 
Enterprise Linux file servers with FreeBSD.  I'm ultimately hoping for 
the most reliable, high-performance NFS file server that I can get.  The 
fact that I also get to take advantage of ZFS is, to me, a major bonus.

I only recently (thanks Rick!) became aware of this mailing list, and 
after reading a few months worth of postings, I'm a little "nervous" 
about stability of ZFS in FreeBSD, though I can see that many issues are 
tied to specific combinations of FreeBSD versions, driver versions, 
specific HBAs, etc.  I'm also a little concerned by what I've read 
about perceived performance issues with the NFS implementation, even 
though FreeBSD only recently gained a completely new NFS 
implementation.  That said, I've learned that people don't often post 
about experiences that work well, so I'm trying to stay positive and 
hoping that my plan is still for the best.  I'd like to share some 
information about what I've done with the old file servers and what I 
intend to do with the new one, and get some feedback from list members 
on whether I'm heading in the right direction.  Even if you can't 
comment on anything I'm about to write, just hearing about a positive 
experience you've had running FreeBSD as an NFS file server with ZFS 
would be great!!

My present (2008) file servers both contain LSI/3ware RAID controller 
cards, and several RAID units with disks arranged in RAID10 
configuration.  There are a total of about 1600 mounts across both 
servers.   Home directories are "split" between the servers, but only on 
two ext3 filesystems.  We are using NFSv3 at the moment, and because I 
don't use Kerberos, I run NFS over OpenVPN, mostly to protect the 
connection (though we use cipher none for performance).  For cost 
effectiveness, we have a "manual failover" solution: each file server 
has enough disk slots to take over for the other.  If one server is 
taken down, I can move its disks into the other server, power that 
server on, and, through scripting, have it take over the IP, name, and 
disks of the failed server, so all the NFS clients resume as if both 
servers were still running.  It's not ideal, but I'll tell you - it's 
cost effective!
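
For what it's worth, if I end up doing the same kind of manual 
takeover under FreeBSD/ZFS, I expect the takeover script to reduce to 
something like the sketch below (the interface, address, and pool name 
are placeholders I made up, not a real configuration):

    #!/bin/sh
    # Rough manual-takeover sketch for the surviving server.
    # em0, 10.0.0.10, and "tank" are placeholders.

    # Bring up the failed server's service address as an alias.
    ifconfig em0 inet 10.0.0.10 netmask 255.255.255.255 alias

    # Import the pool from the transplanted disks; -f is needed
    # because the pool was last active on the other host.
    zpool import -f tank

    # Re-read /etc/exports and make sure the NFS daemons are running.
    service mountd onerestart
    service nfsd onerestart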

Fast forward a few years...

I'm looking to replace the above hardware completely.  In terms of 
hardware, I've recently been able to acquire a new 12th generation Dell 
PowerEdge R720 server with 64 GB of memory and dual E5-2660 processors 
(2.20 GHz).  It has an integrated Dell H310 controller (FreeBSD mfi 
driver) - which is presently only used for a mirrored root configuration 
(2 x 500 GB NL SAS drives).  I added 2 x LSI 9205-8e cards (LSISAS2308) 
to the server.  The LSI cards were flashed to the latest LSI firmware.  
I also have 1 Dell MD1220 array with 24 x 900 GB 10K SAS drives for 
data.  The server has 4 x 1 GbE Intel NICs.

I'm working with FreeBSD 9.1-RC3 because I understand that the 9.1 series 
includes many important improvements, and a totally new driver for the 
LSI SAS HBA cards. I suspect that by the time the file server is ready 
to go live, 9.1 will be officially released.

In terms of ZFS, in my testing, I have been using a single ZFS pool 
made up of 11 mirrored vdevs - 22 disks in total - plus 2 spares (24 
disks total). As I understand it, I should be able to get the optimal 
performance this way.  I considered using multiple pools, but with 
multiple pools comes multiple ZIL, L2ARC, etc and reduction in the 
performance numbers.   I've been told that people have far bigger ZFS 
pools than my 22 disk zpool.  As I understand it, as storage 
requirements increase, I could easily add another MD1220 with 11 more 
mirrored vdevs and append them to the original pool, giving me lots 
more space with little hassle.
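
For reference, the pool in my testing is created more or less like 
this (the da numbers are just examples; the real device names depend 
on how the HBAs enumerate the MD1220 slots):

    # 11 two-way mirrors plus 2 hot spares (24 drives total).
    zpool create tank \
        mirror da0 da1 \
        mirror da2 da3 \
        mirror da4 da5 \
        mirror da6 da7 \
        mirror da8 da9 \
        mirror da10 da11 \
        mirror da12 da13 \
        mirror da14 da15 \
        mirror da16 da17 \
        mirror da18 da19 \
        mirror da20 da21 \
        spare da22 da23

    # A second MD1220 would later be appended as more mirrored vdevs
    # without touching the existing data, along the lines of:
    #   zpool add tank mirror da24 da25 mirror da26 da27 ...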

At the moment, I have each LSI 9205-8e serving half of the disks in the 
single MD1220 chassis in a split configuration - that is, 12 disks on 
each LSI HBA card.  It's a little overkill, I think, but the primary 
reason for buying the second LSI HBA card was to ensure that I had a 
spare card in the event that the first card ever failed.  I figured that 
I might as well use it to improve performance rather than have it sit 
on the shelf collecting dust.  Should I get funds to purchase an additional 
MD1220 (another 24 disks), I was thinking of configuring 1 x 9205-8e per 
MD1220, which I'm sure is also overkill.   However, in theory, if both 
sides of the mirrored vdevs were placed in separate MD1220s,  I would 
expect this to give me the ultimate in performance.  In addition, should 
I lose one 9205-8e or one MD1220, I would expect that I would be able to 
"temporarily" continue in operation (while biting my nails without 
redundancy!!!).
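
To make sure the two halves of each mirror really do sit behind 
different HBAs (and, later, different MD1220s), I've simply been 
mapping da devices back to mps instances by hand before pairing them 
up, e.g.:

    # Show each da device together with the scbus/mps instance it
    # is attached to.
    camcontrol devlist -v

    # Quick sanity check that both 9205-8e cards attached.
    dmesg | grep -i mps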

In addition, in my testing, I'm hoping to use NFSv4, which so far seems 
good.
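
For what it's worth, the NFSv4 side of the test box is just the stock 
9.x server bits, roughly as below (the pool name and network are 
placeholders, not my real configuration):

    # /etc/rc.conf additions:
    rpcbind_enable="YES"
    nfs_server_enable="YES"
    nfsv4_server_enable="YES"
    nfsuserd_enable="YES"
    mountd_enable="YES"

    # /etc/exports -- the V4: line only sets the NFSv4 root; regular
    # export lines still control what is actually shared:
    V4: /tank -sec=sys -network 10.0.0.0 -mask 255.255.0.0
    /tank -network 10.0.0.0 -mask 255.255.0.0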

I have many, oh so many questions...

1) The new file server is relatively powerful.  However, is one file 
server enough to handle a load of approximately 2000 connections?  
Should I be looking at getting another server, or another server plus 
another MD1220?  How does 64 GB of memory hold up when I'm talking 
about up to 2500-3000 ZFS filesystems on the box?  I'm not using 
dedup, and only minimal compression.
2) It is my intention to have 1 ZFS filesystem per user (so approx. 1800 
right now)... Is this the way to go? It sure makes quotas easier!
3) I understand that I should be adding an SSD-based ZIL.  I don't have 
one right now.  I've seen a lot of mixed information about what is the 
most cost effective solution that actually makes a difference.  I'm 
wondering if someone could recommend a cost effective ZIL that works.  
It has to be 2.5" because all the disk slots in my configuration are 
2.5".   I believe someone recently recommended one of the newer Intel 
SSDs?  As well, what size?  (I understand that what complicates 
comparing performance within any one brand of SSD is that different 
sizes perform differently.)  Is there a problem if I put the ZIL in the file server 
head that is being managed by the mfi driver, even though it is ZIL for 
the disks managed by mps in the MD1220?
4) Under Linux, to be able to have a second server take over the disks 
from the first server with my "manual failover", I had to hard-code 
fsids on exports.  Should I choose to do the same thing under FreeBSD, 
I'm told that the fsids on FreeBSD are generated from a unique number 
for the file system type plus a number generated by the file system 
-- but will that number remain the same for a filesystem if it's 
exported from one system and then imported on another?
5) What would be the best recommended way of testing the performance 
of this setup?  I've done some really basic testing using filebench:

local filebench fs:
  42115: 77.169: IO Summary: 3139018 ops, 52261.125 ops/s, (4751/9502 r/w), 1265.8mb/s, 0us cpu/op, 3.3ms latency

over NFS on a 100 Mbps client:
  27939: 182.854: IO Summary: 53254 ops, 887.492 ops/s, (81/162 r/w), 20.6mb/s, 876us cpu/op, 202.8ms latency

over NFS on a 1 Gbps client:
  4588: 84.732: IO Summary: 442488 ops, 7374.279 ops/s, (670/1341 r/w), 175.3mb/s, 491us cpu/op, 23.5ms latency

I don't have the resources to write my own test suite, customized to 
our day-to-day operations, so I have to stick with one of the existing 
solutions.  What would be the best way to do this?  Would simply 
connecting to the NFS server from several hundred clients and running 
filebench be an "optimal" solution?
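
In case it matters, I've just been driving filebench from a small 
workload file wrapped around one of its canned personalities, 
something like the following (the personality, mount point, and run 
length here are only an example, not necessarily what produced the 
numbers above):

    # Run a canned filebench personality against the NFS mount on a
    # client (path and duration are placeholders).
    cat > nfs-test.f <<'EOF'
    load fileserver
    set $dir=/mnt/nfs-test
    run 60
    EOF
    filebench -f nfs-test.f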

Anyway, my apologies for the length of this e-mail.  I've tried to 
"shorten" this as much as I could.  I have so many questions! :)    I'm 
hoping for any feedback that you might be able to provide, even if it's 
just one comment or two.  Thanks for taking the time to read!

Jason Keltz



