From: Benjamin Close <Benjamin.Close@clearchain.com>
Date: Fri, 18 Dec 2009 00:07:32 +1030
To: Alexander Zagrebin
Cc: freebsd-current@freebsd.org
Subject: Re: 8.0-RELEASE: disk IO temporarily hangs up (ZFS or ATA related problem)

On 17/12/09 19:30, Alexander Zagrebin wrote:
> Hi!
>
> I use an onboard ICH7 SATA controller with two disks attached:
>
> atapci1: port 0x30c8-0x30cf,0x30ec-0x30ef,0x30c0-0x30c7,0x30e8-0x30eb,0x30a0-0x30af
>   irq 19 at device 31.2 on pci0
> atapci1: [ITHREAD]
> ata2: on atapci1
> ata2: [ITHREAD]
> ata3: on atapci1
> ata3: [ITHREAD]
> ad4: 1430799MB at ata2-master SATA150
> ad6: 1430799MB at ata3-master SATA150
>
> The disks are used for a mirrored ZFS pool.
> I have noticed that the system periodically locks up on disk operations.
> After approx. 10 min of very slow disk I/O (several KB/s) the speed of
> disk operations returns to normal.
> gstat has shown that the problem is in ad6.
> For example, here is a filtered output of "iostat -x 1":
>
>                      extended device statistics
> device     r/s    w/s     kr/s     kw/s  wait   svc_t  %b
> ad6      818.6    0.0  10840.2      0.0     0     0.4  34
> ad6      300.6  642.0   3518.5  24830.3    50    24.8  72
> ad6        1.0  639.3     63.7  17118.3     0    62.1  98
> ad6      404.5    4.0   6837.7      4.0     0     0.5  18
> ad6      504.5    0.0  13667.2      0.0     1     0.7  32
> ad6      633.3    0.0  13190.3      0.0     1     0.7  38
> ad6      416.3  384.5   8134.7  24606.2     0    16.3  57
> ad6      538.9   76.7   9772.8   2982.2    55     2.9  40
> ad6       31.9  929.5    801.0  37498.6     0    27.2  82
> ad6      635.5    0.0  13087.1      0.0     1     0.6  35
> ad6      579.6    0.0  16669.8      0.0     0     0.8  43
> ad6      603.6    0.0  11697.4      0.0     1     0.7  40
> ad6      538.0    0.0  10438.7      0.0     0     0.9  47
> ad6       30.9  898.4    868.6  40585.4     0    36.6  78
> ad6      653.3   86.6   8566.6    202.7     1     0.8  40
> ad6      737.1    0.0   6429.4      0.0     1     0.6  42
> ad6      717.1    0.0   3958.7      0.0     0     0.5  36
> ad6     1179.5    0.0   2058.9      0.0     0     0.1  15
> ad6     1191.2    0.0   1079.6      0.0     1     0.1  15
> ad6      985.1    0.0   5093.9      0.0     0     0.2  23
> ad6      761.8    0.0   9801.3      0.0     1     0.4  31
> ad6      698.7    0.0   9215.1      0.0     0     0.4  30
> ad6      434.2  513.9   5903.1  13658.3    48    10.2  55
> ad6        3.0  762.8    191.2  28732.3     0    57.6  99
> ad6       10.0    4.0    163.9      4.0     1     1.6   2
>
> Up to this point the operations were normal.
> Then the behaviour of ad6 changes (note the high average access time
> and the "busy" percentage significantly greater than 100):
>
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6        1.0    0.0      0.5      0.0     1  1798.3 179
> ad6        1.0    0.0      1.5      0.0     1  1775.4 177
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6       10.0    0.0     75.2      0.0     1   180.3 180
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6       83.7    0.0    862.9      0.0     1    21.4 179
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6        1.0    0.0     63.7      0.0     1  1707.4 170
> ad6        1.0    0.0      9.0      0.0     0  1791.0 178
> ad6       10.9    0.0    172.2      0.0     2     0.2   0
> ad6       24.9    0.0    553.7      0.0     1   143.3 179
> ad6        0.0    0.0      0.0      0.0     7     0.0   0
> ad6        2.0   23.9     32.4   1529.9     1   336.3 177
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6       68.7    0.0   1322.8      0.0     1    26.3 181
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6       27.9    0.0    193.7      0.0     1    61.6 172
> ad6        1.0    0.0      2.5      0.0     1  1777.4 177
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6        1.0    0.0      2.0      0.0     1  1786.9 178
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6        2.0    0.0      6.5      0.0     1   899.4 179
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
> ad6        1.0    0.0      2.0      0.0     1  1786.7 178
> ad6        0.0    0.0      0.0      0.0     1     0.0   0
>
> And so on for about 10 minutes.
> Then disk I/O reverts to normal:
>
> ad6      139.4    0.0   8860.5      0.0     1     4.4  61
> ad6      167.3    0.0  10528.7      0.0     1     3.3  55
> ad6       60.8  411.5   3707.6   8574.8     1    19.6  87
> ad6      163.4    0.0  10334.9      0.0     1     4.4  72
> ad6      157.4    0.0   9770.7      0.0     1     5.0  78
> ad6      108.5    0.0   6886.8      0.0     0     3.9  43
> ad6      101.6    0.0   6381.4      0.0     0     2.6  27
> ad6      109.6    0.0   7013.9      0.0     0     2.0  22
> ad6      121.4    0.0   7769.7      0.0     0     2.4  29
> ad6       92.5    0.0   5922.6      0.0     1     3.4  31
> ad6      122.4   19.9   7833.0   1273.7     0     3.9  54
> ad6       83.6    0.0   5349.5      0.0     0     3.9  33
> ad6        5.0    0.0    318.4      0.0     0     8.1   4
>
> There are no ATA error messages either in the system log or on the
> console.
> The manufacturer's diagnostic test passes on ad6 without any errors.
> ad6 also contains a swap partition.
> I have tried to run several (10..20) instances of dd, reading and
> writing data from and to the swap partition simultaneously, but this
> did not trigger the lockup.
> So there is a probability that this problem is ZFS related.
>
> I have been forced to switch ad6 to the offline state... :(
>
> Any suggestions on this problem?
>

I have also been experiencing the same problem, with a different
disk/controller (via mpt on a VMware machine). During these periods I
notice that system CPU usage hits 80+% and top shows the zfskern
process as the main culprit. At the same time I've seen the
kstat.zfs.misc.arcstats.memory_throttle_count sysctl rising.
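
For anyone who wants to watch that counter while reproducing the hang,
here is a minimal sketch of polling it from C with sysctlbyname(3). I'm
assuming the arcstat is exported as a 64-bit integer, which is how the
other kstat.zfs.misc.arcstats.* values appear; a plain
"sysctl kstat.zfs.misc.arcstats.memory_throttle_count" from the shell
does the same job.

	#include <sys/types.h>
	#include <sys/sysctl.h>

	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		const char *name =
		    "kstat.zfs.misc.arcstats.memory_throttle_count";

		for (;;) {
			uint64_t count;
			size_t len = sizeof(count);

			/* Fetch the current value of the arcstat counter. */
			if (sysctlbyname(name, &count, &len, NULL, 0) == -1) {
				perror("sysctlbyname");
				return (1);
			}
			printf("memory_throttle_count: %ju\n",
			    (uintmax_t)count);
			sleep(1);
		}
	}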
The ARC is also normally close to the arc_max limit.

Looking at the code, I'm wondering if EAGAIN really should have been
ERESTART in the case when a reclaim is needed:

contrib/opensolaris/uts/common/fs/zfs/arc.c, line 3314:

	/*
	 * If we are in pageout, we know that memory is already tight,
	 * the arc is already going to be evicting, so we just want to
	 * continue to let page writes occur as quickly as possible.
	 */
	if (curproc == pageproc) {
		if (page_load > available_memory / 4)
			return (ERESTART);
		/* Note: reserve is inflated, so we deflate */
		page_load += reserve / 8;
		return (0);
	} else if (page_load > 0 && arc_reclaim_needed()) {
		/* memory is low, delay before restarting */
		ARCSTAT_INCR(arcstat_memory_throttle_count, 1);
		return (EAGAIN);
	}
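
For context on why the distinction matters: the dmu_tx_assign() callers
in the write path only wait and retry on ERESTART; any other error,
including an EAGAIN bubbling up from arc_memory_throttle() via
arc_tempreserve_space(), is passed straight back to the caller.
Roughly, the assign/retry pattern looks like this (a sketch from memory
of the 8.0-era zfs_write() loop, with zfsvfs/zp/woff/nbytes standing in
for the surrounding context, so treat the details as approximate):

	for (;;) {
		tx = dmu_tx_create(zfsvfs->z_os);
		dmu_tx_hold_write(tx, zp->z_id, woff, nbytes);
		error = dmu_tx_assign(tx, TXG_NOWAIT);
		if (error == 0)
			break;
		if (error == ERESTART) {
			/* Back off until the next txg, then retry. */
			dmu_tx_wait(tx);
			dmu_tx_abort(tx);
			continue;
		}
		/* EAGAIN (and anything else) aborts and is returned. */
		dmu_tx_abort(tx);
		return (error);
	}

If that reading is right, the EAGAIN return would surface as an error
on the write rather than as a throttled retry, which might explain the
odd behaviour while the ARC is being squeezed.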