Date: Tue, 14 Feb 2012 11:52:55 -0800
From: Jeremy Chadwick
To: Oscar Prieto
Cc: Harald Schmalzbauer, freebsd-stable@freebsd.org, Martin Sugioarto, Claudius Herder
Subject: Re: problems with AHCI on FreeBSD 8.2
Message-ID: <20120214195255.GA5064@icarus.home.lan>

On Tue, Feb 14, 2012 at 08:31:23PM +0100, Oscar Prieto wrote:
> I used to have tons of AHCI errors on my 4-disk raidz1 of
> HD154UIs when the rig was built a year ago or so (with 8.0-RELEASE),
> but they disappeared after tuning ZFS.
>
> Sadly I also got a new timeout a few days ago, followed by smartctl
> errors I still haven't checked, but I guess they could be legit. I
> still have to test/swap cables and give it a try.

About your ada3 disk:

The SMART errors you posted indicate your disk does in fact have physical media problems: 1 confirmed bad sector, and 5 which are "suspect".

"Suspect" LBAs are unreadable until writes are issued to them. A write will induce the drive to re-analyse the sector at that LBA and determine whether it's truly bad or not. Analysing a single LBA can take quite a long time (it depends on what the problem is) and may result in 30+ seconds of delay.

You can either let the drive figure it out through normal usage patterns, or, time permitting, do it manually yourself. The SMART self-test log of the drive showing read failures gives you the LBA numbers; try reading from those LBAs first. I can explain this procedure in another thread/offline/whatever.

(Does anyone read what I write, re: don't hijack the thread? :-) )

About all of your disks:

All of your disks are undergoing regular/periodic SMART short and long tests. Please stop this; it really, truly does no good, and you will experience performance hits while the tests run.

About timeouts:

Timeouts at the controller and driver level can happen in this situation; this is universal. Features like Western Digital's TLER and Hitachi's and Samsung's CCTL can help alleviate this, but not fully solve it. To be quite honest, I think the ada(4) default timeout of 30 seconds is a decent value, but I'm not sure what the AHCI driver timeout is. mav@ would need to clue me in, or I'd need to go look at the source.
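Re: reading the failing LBAs by hand, here is a minimal, read-only sketch of that first step in Python. It assumes 512-byte logical sectors and the usual layout of `smartctl -l selftest` output, where the last column (LBA_of_first_error) is "-" when a test found no error. The function names are hypothetical and nothing here is part of smartmontools. It deliberately never writes, since forcing a write to a suspect LBA overwrites whatever data lives in that sector.

```python
import errno
import os

# Assumption: 512-byte logical sectors. Check the sector size your drive
# reports before reusing this; a 4K-native drive would need 4096 here.
SECTOR_SIZE = 512


def failed_lbas(selftest_log: str) -> list[int]:
    """Collect LBA_of_first_error values from `smartctl -l selftest` output.

    Log entries start with "#" ("# 1", "#10", ...); the last column is the
    LBA of the first error, or "-" when the test found none.
    """
    lbas = set()
    for line in selftest_log.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("#") and fields[-1].isdigit():
            lbas.add(int(fields[-1]))
    return sorted(lbas)


def probe_lba(device: str, lba: int, sector_size: int = SECTOR_SIZE) -> bool:
    """Read exactly one sector at `lba`; return False if the drive reports EIO.

    Read-only: the device is opened O_RDONLY and nothing is ever written.
    A short read (e.g. an LBA past the end of the device) also counts as a failure.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        data = os.pread(fd, sector_size, lba * sector_size)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return False  # unreadable sector
        raise  # permission problems etc. are not media errors
    finally:
        os.close(fd)
    return len(data) == sector_size


def probe_from_log(device: str, selftest_log: str) -> dict[int, bool]:
    """Map each LBA named in the self-test log to whether it currently reads."""
    return {lba: probe_lba(device, lba) for lba in failed_lbas(selftest_log)}
```

For example, save the output of `smartctl -l selftest /dev/ada3` and pass it, together with "/dev/ada3", to probe_from_log() as root. Expect a read of a genuinely bad sector to stall for a while (the 30+ seconds mentioned above) before the error comes back.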
(Right now is not a good time in my life for me to be reviewing source code or looking at commits, sadly. Too much on my mind recently.)

I can discuss the TLER/CCTL stuff at more length if needed, but to be blatantly honest, I would rather not, and here's why: people begin to rely on these features to try and circumvent actual problems with their drives.

Phrased differently: people on the Internet become incredibly focused on all of these timeout durations (TLER/CCTL vs. controller vs. driver vs. storage subsystem timeouts) and try to find some bizarre "perfect harmony" between them all. Instead, just leave them all alone and watch your drives for problems.

Further details which pertain to Samsung drives:

In your case, you run smartd(8), which periodically hits the drive with SMART requests, pulling attribute data down and parsing it. I believe your model is fine in this respect, but for similar Samsung models I must strongly advise against this. There are well-documented problems with Samsung firmware and SMART behaviour which can result in data loss (yes, you read that right). Please see the smartmontools wiki page on the matter for full details, and make sure you're running a fixed firmware:

http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

Regarding throughput of the drives being slow (30-40MBytes/sec across a gigE link):

This sounds more like a Samba tuning problem, though ZFS raidz isn't known for "amazing speed" per se. Please see a post of mine from a while back on how to tune Samba; many people followed up to it with appreciation, stating their throughput increased dramatically:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061642.html

I should follow up that post with the following entry, because I've since updated my own smb.conf to tune things a bit better, and it includes comments explaining the justifications:

    #
    # The below options increase throughput substantially.  Be aware
    # that AIO support requires the aio.ko kernel module to be loaded,
    # and Samba to be built with AIO enabled.  Important notes:
    #
    # 1) We explicitly disable sendfile(2) because it has known
    # problems on ZFS, including resulting in 2x the amount of memory
    # used on the machine (VM cache + ZFS cache).  For further details,
    # see the freebsd-fs or freebsd-stable thread with the subject
    # "8.1-STABLE: zfs and sendfile: problem still exists".
    #
    # 2) (2011/10/03) socket options SO_SNDBUF and SO_RCVBUF do not
    # appear to matter on FreeBSD, or our sysctls somehow take care of
    # this (or maybe AIO?).  The performance is the same with or without
    # these two socket options on 8.2-STABLE.
    #
    # 3) (2011/10/03) My previously-mentioned "aio write behind" option
    # is incorrect; see the official smb.conf(5) man page for the syntax.
    # It's not a yes/no toggle, thus it serves no purpose.
    #
    socket options = TCP_NODELAY
    use sendfile = no
    min receivefile size = 16384
    aio read size = 16384
    aio write size = 16384

The rest is in the thread I linked.

Hope this helps.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |