From: Jeremy Chadwick
To: Charles Sprickman
Cc: freebsd-stable@FreeBSD.org
Date: Sat, 27 Sep 2008 13:22:50 -0700
Subject: Re: Recommendations for servers running SATA drives
Message-ID: <20080927202250.GA60980@icarus.home.lan>

On Sat, Sep 27, 2008 at 03:16:11PM -0400, Charles Sprickman wrote:
> On Fri, 26 Sep 2008, Jeremy Chadwick wrote:
>> Let's be realistic. We're talking about ATA and SATA hard disks, hooked
>> up to on-board controllers -- these are the majority of users. Those
>> with ATA/SATA RAID controllers (not on-board RAID either; most/all of
>> those do not let you disable drive write caching) *might* have a RAID
>> BIOS menu item for disabling said feature.
>
> While I would love to deploy every server with SAS, that's not practical
> in many cases, especially for light-duty servers that are not being
> pushed very hard. I am taking my chances with multiple affordable drives
> and gmirror where I cannot throw in a 3Ware card. I imagine that many
> non-desktop FreeBSD users are doing the same considering you can fetch a
> decent 1U box with plenty of storage for not much more than $1K. I
> assume many here are in agreement on this point -- just making it clear
> that the bargain crowd is not some weird edge case in the userbase...

I'm in full agreement here. As much as I love SCSI (and I sincerely do),
it's (IMHO unjustifiably) overpriced, simply because "it can be". You'd
expect the price of SCSI to decrease over the years, but it hasn't; it's
become part of a niche market, primarily intended for large businesses
with cash to blow.

As I said, I love SCSI, the protocol is excellent, and it's very
well-supported all over the place -- and though I have no personal
experience with SAS, it appears to be equally excellent, yet the price
is comparable to SCSI.

Even at my place of work we use SATA disks in our filers.
I suppose this is justified in the sense that a disk failure there will
be less painful than it would be in a single or dual-disk server, so
saving money is legitimate since RAID-5 (or whatever) is in use.

But with regard to our server boxes, either single or dual SATA disks
are now being used, rather than SCSI. I haven't asked our datacenter and
engineering folks why we've switched, but gut feeling says "saving
money".

>> Regardless of all of this, end-users should, in no way shape or form,
>> be expected to go to great lengths to disable their disk's write cache.
>> They will not, I can assure you. Thus, we must assume: write caching
>> on a disk will be enabled, period. If a filesystem is engineered with
>> that fact ignored, then the filesystem is either 1) worthless, or 2)
>> serves a very niche purpose and should not be the default filesystem.
>
> Arguments about defaults aside, this is my first question. If I've got
> a server with multiple SATA drives mirrored with gmirror, is turning on
> write-caching a good idea? What kind of performance impact should I
> expect? What is the relationship between caching, soft-updates, and
> either NCQ or TCQ?
>
> Here's an example of a Seagate, trimmed for brevity:
>
> Protocol                       Serial ATA v1.0
> device model                   ST3160811AS
>
> Feature                        Support  Enable  Value    Vendor
> write cache                    yes      yes
> read ahead                     yes      yes
> Native Command Queuing (NCQ)   yes      -       31/0x1F
> Tagged Command Queuing (TCQ)   no       no      31/0x1F
>
> TCQ is clearly not supported, NCQ seems to be supported, but I don't know
> how to tell if it's actually enabled or not. Write-caching is currently
> on.

Actually, no -- FreeBSD ata(4) does not support NCQ. I believe there are
some unofficial patches (or even a PR) floating around which are for
testing, but out of the box, it lacks support. The hyphen you see under
the Enable column is supposed to signify that (I feel it's badly placed;
it should say "notsupp" or "unsupp" or something like that. A hyphen is
too vague).
The NCQ support patches might require AHCI as well, I forget. It's been
a while.

> The tradeoff is apparently performance vs. more reliable recovery should
> the machine lose power, smoke itself, etc., but all I've seen is
> anecdotal evidence of how bad performance gets.
>
> FWIW, this machine in particular had its mainboard go up in smoke last
> week. One drive was too far gone for gmirror to rebuild it without doing
> a "forget" and "insert". The remaining drive was too screwy for
> background fsck, but a manual check in single-user left me with no real
> surprises or problems.

As long as the array rebuilt fine, I believe small quirks are
acceptable. Scenarios where the array *doesn't* rebuild properly when a
new disk is added are of great concern (and in the case of some features
such as Intel MatrixRAID, the FreeBSD bugs are so severe that you are
liable to lose data in such scenarios. MatrixRAID != gmirror, of
course).

This also leads me a little off-topic -- when it comes to disk
replacements, administrators want to be able to do this without taking
the system down. There are problems with this, but it often depends
greatly on hardware and BIOS configuration.

I've successfully done a hot-swap (hardware: SATA hot-swap backplane,
AHCI in use, SATA2 disks), but it required me to issue "atacontrol
detach" first (I am very curious to know what would've happened had I
just yanked the disk). Upon inserting the new disk, one has to be *very*
careful about the order of atacontrol commands given -- there are cases
where "attach" will cause the system to panic or the SATA bus to lock
up, but it seems to depend upon what commands were executed previously
(such as "reinit").

Sorry if this is off-topic, but I wanted to mention it.

>> The system is already up and the filesystems mounted. If the error in
>> question is of such severity that it would impact a user's ability to
>> reliably use the filesystem, how do you expect constant screaming on
>> the console will help?
>> A user won't know what it means; there is
>> already evidence of this happening (re: mysterious ATA DMA errors which
>> still cannot be figured out[6]).
>>
>> IMHO, a dirty filesystem should not be mounted until it's been fully
>> analysed/scanned by fsck. So again, people are putting faith into
>> UFS2+SU despite actual evidence proving that it doesn't handle all
>> scenarios.
>
> I'll ask, but it seems like the consensus here is that background fsck,
> while the default, is best left disabled. The cases where it might make
> sense are:
>
> -desktop systems
> -servers that have incredibly huge filesystems (and even there being able
> to selectively background fsck filesystems might be helpful)
>
> The first example is obvious, people want a fast-booting desktop. The
> second is trading long fsck times in single-user for some uncertainty.

The first item I agree with, and I believe the benefits there easily
outweigh the risks/quirks.

The second item I can go either way on; for example, my home BSD box has
4x500GB disks in it (and about 1/3rd is used/filled). If that box
crashes, I *most definitely* want data integrity preserved as well as
possible. Of course, I'm using ZFS + raidz1 there, so maybe I'm arguing
to hear myself talk -- but at one time, I wasn't using ZFS.

I suppose it ultimately depends on what the administrator wants; I don't
think we'll find a default that will please everyone, and I accept that
reality.

>> Filesystems have to be reliable; data integrity is focus #1, and cannot
>> be sacrificed. Users and administrators *expect* a filesystem to be
>> reliable. No one is going to keep using a filesystem if it has
>> disadvantages which can result in data loss or "waste of administrative
>> time" (which I believe is what's occurring here).
>
> The softupdates question seems tied quite closely to the write-caching
> question. If write-caching "breaks" SU, that makes things tricky.
> So another big question:
>
> If write-caching is enabled, should SU be disabled?

This is an excellent question, one I too have been pondering.

If the answer is "yes", then there are two options (pick one):

a) Change the defaults during sysinstall; do NOT enable SU on all
   non-root filesystems,

b) Set hw.ata.wc=0 during the installation startup, and upon a completed
   FreeBSD installation, set hw.ata.wc=0 in sysctl.conf (because the
   user sure won't know or remember to do this).

(b) has risks involved, such as the scenario where someone has two or
more disks, and only one disk is dedicated to FreeBSD; hw.ata.wc=0
disables write caching for **all** disks, so they'd possibly see
degraded performance on the non-OS disks once mounted.

If the answer is "no", then I guess we're fine.

> And again, what kind of performance and/or reliability sacrifices are
> being made?
>
> I'd love to hear some input from both admins dealing with this stuff in
> production and from any developers who are making decisions about the
> future direction of all of this.

As would I. Good questions, Charles! (As usual! :-) )

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
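
A footnote on the feature table quoted earlier in this message: the
Support and Enable columns are easy to misread by eye, so here is a
minimal Python sketch that parses rows of that shape and spells out each
feature's state. The names parse_features and describe, the regular
expression, and the column spacing in SAMPLE are assumptions made for
illustration and are not part of atacontrol. The wording for the "-"
case only follows the reading given above (the drive advertises the
feature, but nothing reports it as enabled).

```python
import re

# One table row: feature name, Support (yes/no), Enable (yes/no/-),
# then an optional Value column. The layout is assumed from the quoted
# output, not taken from any atacontrol specification.
ROW = re.compile(
    r"^\s*(?P<feature>.+?)\s+(?P<support>yes|no)\s+(?P<enable>yes|no|-)"
    r"(?:\s+(?P<value>\S+))?\s*$"
)


def parse_features(text):
    """Return {feature name: (support, enable)} for every matching row."""
    features = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and blank lines do not match and are skipped
            features[match.group("feature")] = (
                match.group("support"),
                match.group("enable"),
            )
    return features


def describe(support, enable):
    """Turn the two columns into a plain-English status string."""
    if support == "no":
        return "not supported by the drive"
    if enable == "-":
        return "supported by the drive, not reported as enabled"
    return "enabled" if enable == "yes" else "supported but disabled"


# Rows copied from the quoted output; the spacing is reconstructed.
SAMPLE = """\
Feature                        Support  Enable  Value    Vendor
write cache                    yes      yes
read ahead                     yes      yes
Native Command Queuing (NCQ)   yes      -       31/0x1F
Tagged Command Queuing (TCQ)   no       no      31/0x1F
"""

if __name__ == "__main__":
    for name, (support, enable) in parse_features(SAMPLE).items():
        print(f"{name}: {describe(support, enable)}")
```

Feeding it the real output of "atacontrol cap" for a disk should work
the same way, provided the rows keep this column order.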