Date: Sun, 25 Nov 2012 14:55:09 -0500
From: Trent Nelson <trent@snakebite.org>
To: freebsd-fs@freebsd.org
Subject: Solved persistent stalls in 'zio->io_cv)'
Message-ID: <20121125195509.GB52355@snakebite.org>

[ For the archives... ]

Over the course of about a month I noticed progressively worse performance
on my main FreeBSD ZFS box. It's a simple box: 1U, root-on-zfs, 2TB WD EARS
zpool mirror. Almost everything started stalling towards the end; a simple
ls would take 15-20 seconds to return, with ^T always indicating the wait
state was 'zio->io_cv)'.

smartctl/smartd had detected two offline uncorrectable sectors on one of
the drives, but subsequent full scans revealed no problems.

I vaguely recall reading an old mailing list post that mentioned zio->io_cv
is essentially the ZFS equivalent of biord, i.e. "the system is waiting for
the disk". Coupled with the knowledge that a zpool mirror splits disk load
evenly between physical devices regardless of underlying device performance
(i.e. it doesn't grok "disk 1 takes 600% longer to return my reads() than
disk 0"), I decided to run `diskinfo -tv` on each disk, not being able to
think of anything else that would be diagnostically useful.

The output was pretty telling (pasted in full at the end of this e-mail):

ada1:
    Seek Times:
        Full stroke:      250 iter in  38.386551 sec =  153.546 msec
        Half stroke:      250 iter in  32.733690 sec =  130.935 msec
        ...
    Transfer Rates:
        outside:       102400 kbytes in  10.554800 sec =   9702 kbytes/sec
        middle:        102400 kbytes in   2.299292 sec =  44535 kbytes/sec

ada0:
    Seek Times:
        Full stroke:      250 iter in  13.585292 sec =   54.341 msec
        Half stroke:      250 iter in   6.156059 sec =   24.624 msec
        ...
    Transfer Rates:
        outside:       102400 kbytes in   2.234584 sec =  45825 kbytes/sec
        middle:        102400 kbytes in   4.405934 sec =  23241 kbytes/sec

So, I concluded (or rather, hoped) that the abysmal performance was due to
ada1 responding several times slower to requests than its zpool mirror
counterpart ada0. I didn't have a replacement available, but performance
was so bad I decided to break the zpool mirror and simply limp along on a
single disk until a replacement arrived (the drive is still under WD
warranty, so I sent it in for an RMA replacement).

Performance has been fine ever since (it's been about a week now on the
single disk). Just posting for the benefit of the archives.
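In case it helps anyone searching later, the mirror split and the obvious
follow-up checks boil down to something like the following (the pool name
'tank' and the ada1 device name here are illustrative placeholders, not
necessarily my exact layout):

    # Confirm which mirror member is the suspect device, then detach it:
    zpool status tank
    zpool detach tank ada1

    # Watch per-device latency (ms/r, ms/w) while the pool is under load;
    # a struggling mirror member stands out immediately:
    gstat -p

    # Sanity-check SMART state on the suspect drive before the RMA:
    smartctl -a /dev/ada1

zpool detach is the actual mechanism for breaking a mirror; gstat and
smartctl are just the usual ways of confirming which disk is at fault.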
There are lots of hits for 'zio->io_cv)' in the archives, but nothing
related to the particulars of this case.

    Trent.

Full diskinfo output just prior to splitting the mirror:

[root@hydrogen/ttypts/3(~)#] diskinfo -tv ada1
ada1
        512             # sectorsize
        2000398934016   # mediasize in bytes (1.8T)
        3907029168      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        3876021         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WD-WMAZA5463028 # Disk ident.

Seek times:
        Full stroke:      250 iter in  38.386551 sec =  153.546 msec
        Half stroke:      250 iter in  32.733690 sec =  130.935 msec
        Quarter stroke:   500 iter in  30.632886 sec =   61.266 msec
        Short forward:    400 iter in  53.067122 sec =  132.668 msec
        Short backward:   400 iter in   6.971919 sec =   17.430 msec
        Seq outer:       2048 iter in   0.332570 sec =    0.162 msec
        Seq inner:       2048 iter in   0.921388 sec =    0.450 msec
Transfer rates:
        outside:       102400 kbytes in  10.554800 sec =   9702 kbytes/sec
        middle:        102400 kbytes in   2.299292 sec =  44535 kbytes/sec
        inside:        102400 kbytes in   7.301463 sec =  14025 kbytes/sec

[root@hydrogen/ttypts/3(~)#] diskinfo -tv ada0
ada0
        512             # sectorsize
        2000398934016   # mediasize in bytes (1.8T)
        3907029168      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        3876021         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WD-WMAZA5467587 # Disk ident.

Seek times:
        Full stroke:      250 iter in  13.585292 sec =   54.341 msec
        Half stroke:      250 iter in   6.156059 sec =   24.624 msec
        Quarter stroke:   500 iter in   8.901942 sec =   17.804 msec
        Short forward:    400 iter in   5.421932 sec =   13.555 msec
        Short backward:   400 iter in   3.237963 sec =    8.095 msec
        Seq outer:       2048 iter in   0.636089 sec =    0.311 msec
        Seq inner:       2048 iter in   0.231709 sec =    0.113 msec
Transfer rates:
        outside:       102400 kbytes in   2.234584 sec =  45825 kbytes/sec
        middle:        102400 kbytes in   4.405934 sec =  23241 kbytes/sec
        inside:        102400 kbytes in   3.050623 sec =  33567 kbytes/sec

Interestingly enough, I just re-ran diskinfo on the remaining disk, and the
results are significantly better than those above:

[root@hydrogen/ttypts/0(~)#] diskinfo -tv ada0
ada0
        512             # sectorsize
        2000398934016   # mediasize in bytes (1.8T)
        3907029168      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        3876021         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WD-WMAZA5467587 # Disk ident.

Seek times:
        Full stroke:      250 iter in   7.037776 sec =   28.151 msec
        Half stroke:      250 iter in   5.805128 sec =   23.221 msec
        Quarter stroke:   500 iter in   7.452755 sec =   14.906 msec
        Short forward:    400 iter in   3.966540 sec =    9.916 msec
        Short backward:   400 iter in   1.255183 sec =    3.138 msec
        Seq outer:       2048 iter in   0.245077 sec =    0.120 msec
        Seq inner:       2048 iter in   0.212632 sec =    0.104 msec
Transfer rates:
        outside:       102400 kbytes in   1.286970 sec =  79567 kbytes/sec
        middle:        102400 kbytes in   1.062096 sec =  96413 kbytes/sec
        inside:        102400 kbytes in   1.996499 sec =  51290 kbytes/sec

--
http://www.snakebite.net | @trentnelson