From: Jason Keltz <jas@cse.yorku.ca>
Date: Mon, 12 Nov 2012 12:24:07 -0500
To: freebsd-fs@freebsd.org
Subject: RHEL to FreeBSD file server
Message-ID: <50A130B7.4080604@cse.yorku.ca>

For the last few months, I've been working on and off learning about FreeBSD. The goal of my work is to swap out our current dual Red Hat Enterprise Linux file servers with FreeBSD. I'm ultimately hoping for the most reliable, high-performance NFS file server that I can get. The fact that, in addition, I get to take advantage of ZFS is what I see as a major bonus.

I only recently (thanks, Rick!) became aware of this mailing list, and after reading a few months' worth of postings, I'm a little "nervous" about the stability of ZFS on FreeBSD, though I can see that many issues are tied to specific combinations of FreeBSD versions, driver versions, specific HBAs, etc. I'm also a bit concerned by what I've read about perceived performance issues with the NFS implementation, even though FreeBSD only recently got a new NFS implementation. That being said, I've learned that in general people don't often post about experiences that work well, so I'm trying to stay positive, and hoping that my plan is still for the best.

I'm hoping to share some information about what I've done with the old file servers and what I intend to do with the new one, and to get some feedback from list members to see whether I'm heading in the right direction. Even if you can't comment on anything I'm about to write, but you can tell me about a positive experience running FreeBSD as an NFS file server with ZFS, that would be great!
My present (2008) file servers both contain LSI/3ware RAID controller cards and several RAID units with disks arranged in a RAID10 configuration. There are a total of about 1600 mounts across both servers. Home directories are "split" between the servers, but only across two ext3 filesystems. We are using NFSv3 at the moment, and because I don't use Kerberos, I run NFS over OpenVPN, mostly to protect the connection (though we use cipher "none" for performance).

For cost effectiveness, we have a "manual failover" solution: either file server has enough disk slots to take over for the other. If a server is taken down, I can move its disks into the other server, turn it on, and, through scripting, either server can take over the IP/name/disks from the other, and all the NFS clients resume as if both servers were running. It's not ideal, but I'll tell you - it's cost effective!

Fast forward a few years... I'm looking to replace the above hardware completely. I've recently been able to acquire a new 12th-generation Dell PowerEdge R720 server with 64 GB of memory and dual E5-2660 processors (2.20 GHz). It has an integrated Dell H310 controller (FreeBSD mfi driver), which is presently used only for a mirrored root configuration (2 x 500 GB NL-SAS drives). I added 2 x LSI 9205-8e cards (LSISAS2308) to the server and flashed them to the latest LSI firmware. I also have 1 Dell MD1220 array with 24 x 900 GB 10K SAS drives for data. The server has 4 x 1 Gb Intel NICs.

I'm working with FreeBSD 9.1-RC3 because I understand that the 9.1 series includes many important improvements, and a totally new driver for the LSI SAS HBA cards. I suspect that by the time the file server is ready to go live, 9.1 will be officially released.

In terms of ZFS, in my testing I have been using a single pool comprised of 11 mirrored vdevs - a total of 22 disks, plus 2 spares (24 disks total); a rough sketch of the layout I have in mind is below. As I understand it, I should be able to get the optimal performance this way. I considered using multiple pools, but multiple pools means multiple ZILs, L2ARCs, etc., and a reduction in the performance numbers. I've been told that people have far bigger ZFS pools than my 22-disk zpool. As I understand it, as storage requirements increase, I could easily add another MD1220 with an additional 11 mirrored vdev pairs and "append" this to the original pool, giving me lots more space with little hassle.

At the moment, I have each LSI 9205-8e serving half of the disks in the single MD1220 chassis in a split configuration - that is, 12 disks on each LSI HBA card. It's a little overkill, I think, but the primary reason for buying the second LSI HBA card was to ensure that I had a spare in the event that the first card ever failed, and I figured I might as well use it to improve performance rather than leave it on the shelf collecting dust. Should I get funds to purchase an additional MD1220 (another 24 disks), I was thinking of configuring 1 x 9205-8e per MD1220, which I'm sure is also overkill. However, in theory, if the two sides of each mirrored vdev were placed in separate MD1220s, I would expect this to give me the ultimate in performance. In addition, should I lose one 9205-8e or one MD1220, I would expect to be able to continue operating temporarily (while biting my nails without redundancy!!!).

In addition, in my testing I'm hoping to use NFSv4, which so far seems good. I have many, oh so many questions...
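To make the intended layout concrete, here is a rough sketch of the pool creation I have in mind. The pool name "tank" and the da0-da23 device names are only placeholders; the real names will depend on how the mps driver enumerates the MD1220 slots:

  # 11 mirrored vdevs (22 disks) plus 2 hot spares.
  # Device names below are placeholders.
  zpool create tank \
      mirror da0  da1  mirror da2  da3  mirror da4  da5  \
      mirror da6  da7  mirror da8  da9  mirror da10 da11 \
      mirror da12 da13 mirror da14 da15 mirror da16 da17 \
      mirror da18 da19 mirror da20 da21 \
      spare  da22 da23

  # A second MD1220 would later be appended as more mirrored pairs:
  # zpool add tank mirror da24 da25 mirror da26 da27 ...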
1) The new file server is relatively powerful. However, is one file server enough to handle a load of approximately 2000 connections? Should I be looking at getting another server, or getting another server and another MD1220? How is 64 GB of memory when I'm talking about up to 2500-3000 ZFS filesystems on the box? I'm not using dedup, and I'm using minimal compression.

2) It is my intention to have 1 ZFS filesystem per user (so approximately 1800 right now)... Is this the way to go? It sure makes quotas easier! (A rough sketch of what I mean is in the P.S. below.)

3) I understand that I should be adding an SSD-based ZIL. I don't have one right now. I've seen a lot of mixed information about what the most cost-effective solution is that actually makes a difference. I'm wondering if someone could recommend a cost-effective ZIL that works. It has to be 2.5" because all the disk slots in my configuration are 2.5". I believe someone recently recommended one of the newer Intel SSDs? As well, what size? (I understand that what complicates performance within any one brand of SSD is that the different sizes perform differently.) Is there a problem if I put the ZIL in the file server head, where it would be managed by the mfi driver, even though it is the ZIL for the disks managed by mps in the MD1220?

4) Under Linux, to be able to have the second server take over the disks from the first server with my "manual failover", I had to hard-code fsids on exports. Should I choose to do the same thing under FreeBSD, I'm told that fsids on FreeBSD are generated from a unique number for the file system type plus a number generated by the file system -- but will this number remain the same for the filesystem if it's exported from one system and imported into another?

5) What would be the best recommended way of testing the performance of the setup? I've done some really, really basic testing using filebench:

  local filebench fs:
    42115: 77.169: IO Summary: 3139018 ops, 52261.125 ops/s, (4751/9502 r/w), 1265.8mb/s, 0us cpu/op, 3.3ms latency

  over NFS on a 100 Mbps client:
    27939: 182.854: IO Summary: 53254 ops, 887.492 ops/s, (81/162 r/w), 20.6mb/s, 876us cpu/op, 202.8ms latency

  over NFS on a 1 gigabit client:
    4588: 84.732: IO Summary: 442488 ops, 7374.279 ops/s, (670/1341 r/w), 175.3mb/s, 491us cpu/op, 23.5ms latency

I don't have the resources to write my own test suite, custom to our day-to-day operations, so I have to stick with one of the existing solutions. What would be the best way to do this? Would simply connecting to the NFS server from several hundred clients and running filebench be an "optimal" solution?

Anyway, my apologies for the length of this e-mail. I've tried to "shorten" it as much as I could. I have so many questions! :) I'm hoping for any feedback that you might be able to provide, even if it's just a comment or two. Thanks for taking the time to read!

Jason Keltz
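P.S. To make question 2 a bit more concrete, here is a rough sketch of how I'm imagining the per-user filesystems would be created. The pool name "tank", the /home mountpoint, the usernames, and the 10G quota are all placeholder values:

  # parent dataset for home directories
  zfs create -o mountpoint=/home tank/home

  # one dataset per user, each with its own quota
  for u in jdoe asmith bkhan; do      # placeholder usernames
      zfs create -o quota=10G tank/home/$u
  done

  # adjusting a single user's quota later would then just be:
  # zfs set quota=15G tank/home/jdoe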