From: Phil Murray
To: Phil Murray
Cc: "freebsd-current@freebsd.org", Alexander Zagrebin
Date: Fri, 18 Dec 2009 22:11:52 +1300
Subject: Re: 8.0-RELEASE: disk IO temporarily hangs up (ZFS or ATA related problem)
Message-Id: <6FAA390A-1E40-4D7A-AAD5-DC72578CE974@nevada.net.nz>
In-Reply-To: <4C1C2598-4157-4B04-8DB8-C84F353AB8B8@nevada.net.nz>
References: <39309F560B98453EBB9AEA0F29D9D80E@vosz.local> <4B2A341C.5000802@clearchain.com> <6D3B0162A2134CAEA9F4DF5BC03707AA@vosz.local> <4C1C2598-4157-4B04-8DB8-C84F353AB8B8@nevada.net.nz>
List-Id: Discussions about the use of FreeBSD-current

On 18/12/2009, at 9:39 PM, Phil Murray wrote:

>
> On 18/12/2009, at 9:15 PM, "Alexander Zagrebin" wrote:
>
>> Big thanks for your reply!
>>
>>>> I use the onboard ICH7 SATA controller with two disks attached:
>>>>
>>>> atapci1: port 0x30c8-0x30cf,0x30ec-0x30ef,0x30c0-0x30c7,0x30e8-0x30eb,0x30a0-0x30af irq 19 at device 31.2 on pci0
>>>> atapci1: [ITHREAD]
>>>> ata2: on atapci1
>>>> ata2: [ITHREAD]
>>>> ata3: on atapci1
>>>> ata3: [ITHREAD]
>>>> ad4: 1430799MB at ata2-master SATA150
>>>> ad6: 1430799MB at ata3-master SATA150
>>>>
>>>> The disks are used for a mirrored ZFS pool.
>>>> I have noticed that the system periodically locks up on disk operations.
>>>> After approx. 10 min of very slow disk i/o (several KB/s) the speed of
>>>> disk operations returns to normal.
>>>> gstat has shown that the problem is in ad6.
>>>> For example, here is the filtered output of iostat -x 1:
>>>>
>>>>                   extended device statistics
>>>> device    r/s    w/s     kr/s     kw/s  wait   svc_t   %b
>>>> ad6     985.1    0.0   5093.9      0.0     0     0.2   23
>>>> ad6     761.8    0.0   9801.3      0.0     1     0.4   31
>>>> ad6     698.7    0.0   9215.1      0.0     0     0.4   30
>>>> ad6     434.2  513.9   5903.1  13658.3    48    10.2   55
>>>> ad6       3.0  762.8    191.2  28732.3     0    57.6   99
>>>> ad6      10.0    4.0    163.9      4.0     1     1.6    2
>>>>
>>>> Up to this line, operation is normal.
>>>> Then the behaviour of ad6 changes (pay attention to the high average
>>>> access time and the percent of "busy" significantly greater than 100):
>>>>
>>>> ad6       0.0    0.0      0.0      0.0     1     0.0    0
>>>> ad6       1.0    0.0      0.5      0.0     1  1798.3  179
>>>> ad6       1.0    0.0      1.5      0.0     1  1775.4  177
>>>> ad6       0.0    0.0      0.0      0.0     1     0.0    0
>>>> ad6      10.0    0.0     75.2      0.0     1   180.3  180
>>>> ad6       0.0    0.0      0.0      0.0     1     0.0    0
>>>> ad6       1.0    0.0      2.0      0.0     1  1786.7  178
>>>> ad6       0.0    0.0      0.0      0.0     1     0.0    0
>>>>
>>>> And so on for about 10 minutes.
>>>> Then the disk i/o returns to normal:
>>>>
>>>> ad6     139.4    0.0   8860.5      0.0     1     4.4   61
>>>> ad6     167.3    0.0  10528.7      0.0     1     3.3   55
>>>> ad6      60.8  411.5   3707.6   8574.8     1    19.6   87
>>>> ad6     163.4    0.0  10334.9      0.0     1     4.4   72
>>>> ad6     157.4    0.0   9770.7      0.0     1     5.0   78
>>>> ad6     108.5    0.0   6886.8      0.0     0     3.9   43
>>>>
>>>> There are no ata error messages in either the system log or on the
>>>> console.
>>>> The manufacturer's diagnostic test passes on ad6 without any errors.
>>>> ad6 also contains a swap partition.
>>>> I have tried running several (10..20) instances of dd, reading from and
>>>> writing to the swap partition simultaneously, but this did not trigger
>>>> the lockup.
>>>> So the problem may well be ZFS related.
>>>>
>>>> I have been forced to switch ad6 to the offline state... :(
>>>>
>>>> Any suggestions on this problem?
>>>>
>>> I have also been experiencing the same problem with a different
>>> disk/controller (via mpt on a vmware machine). During the same period I
>>> notice that system cpu usage hits 80+% and top shows the zfskern process
>>> being the main culprit. At the same time I've discovered the
>>> kstat.zfs.misc.arcstats.memory_throttle_count sysctl rising. Arc is also
>>> normally close to the arc_max limit.
>>
>> My case has differences.
>> 1. CPU usage is near 0%
>> 2. zfs's sysctls don't change significantly during the
>>    "normal operation" -> "lockup" -> "normal" transition
>> 3. ARC size is far from its limits,
>>    kstat.zfs.misc.arcstats.memory_throttle_count: 0
>>
>> Here are my actions, observations and conclusions:
>> 1. I have tried changing the placement of the disks on the sata channels.
>>    Nothing has changed - the problem is still on the WD15EADS, although
>>    it became ad4.
>>    So the issue isn't in the south bridge, sata cables and so on.
>> 2. I have tried to detach ad6 from the pool, zero its system area, and
>>    reattach it again.
>>    Of course, resilvering was started.
>>    During resilvering, 250 GB was copied without lockups or delays.
>>    While resilvering, I periodically loaded the drive with read
>>    operations (dd if=/dev/ad6 of=/dev/null ...).
>>    But after resilvering and several minutes of normal mirror operation,
>>    the lockups appeared again.
>>    So the drive seems to be ok and we have a software problem?
>> 3. I have noticed that lockups often happen during postgresql activity.
>>    postgresql often uses sync, so I have tried to disable the ZIL.
>>    No success.
>> 4. The "IDE LED" is constantly on during lockups.
>>    So these really are read/write delays.
>> 5. I see two variants of zfskern's state:
>>    a) it is constantly in vgeom:io
>>    b) it is in either the zio->io_ state (when active), or in tx->tx_s
>>       (when idle).
>>    During lockups it is mostly in zio->io_.
>>    What is the difference between vgeom:io and zio->io_/tx->tx_s?
>>
>> Maybe the problem is in ata? The WD15EADS is from the "green" series of
>> drives.
>
> The WD green drives have a feature called Time Limited Error Recovery
> where the disk can spend several minutes trying to read a bad block etc.
>
> It plays havoc with RAID arrays which is why WD recommend you don't use
> the green drives in arrays. They have more info about the "feature" in
> the WD FAQ/knowledgebase
>

Sorry, I had that backwards: TLER is the feature that 'fixes' the problem, see:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397&p_created=1131638613&p_sid=vfyE1KPj&p_accessibility=0&p_redirect=&p_srch=1&p_lva=&p_sp=cF9zcmNoPTEmcF9zb3J0X2J5PSZwX2dyaWRzb3J0PSZwX3Jvd19jbnQ9MTcsMTcmcF9wcm9kcz0yMjcsMjk0JnBfY2F0cz0mcF9wdj0yLjI5NCZwX2N2PSZwX3BhZ2U9MSZwX3NlYXJjaF90ZXh0PXJhaWQ!&p_li=&p_topview=1

Sounds like your drive is going into the recovery procedure...

>
>> Maybe I have a problem with its power management?
>> Is there a method to completely reset the sata channel and drive?
>> Will atacontrol reinit do it?
>
>>
>> Any help is welcome.
>>
>> -- 
>> Alexander Zagrebin
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
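
The stall pattern in the quoted iostat output (very high svc_t, or %b above 100) could be picked out automatically rather than by eye. Below is a minimal sketch, assuming the eight-column layout shown in the quote (device, r/s, w/s, kr/s, kw/s, wait, svc_t, %b). The function name find_stalls and the two thresholds of 100 are hypothetical choices for illustration only; they are not part of iostat, and the sample rows are copied from the quoted output.

```python
# Hypothetical helper: flag samples in `iostat -x` output that look stalled.
# Assumed column order (taken from the quoted output, not from any API):
#   device r/s w/s kr/s kw/s wait svc_t %b


def find_stalls(lines, device="ad6", svc_t_limit=100.0, busy_limit=100.0):
    """Return (r/s, w/s, svc_t, %b) tuples for samples exceeding either limit."""
    stalls = []
    for line in lines:
        fields = line.split()
        # Skip banners, headers and other devices.
        if len(fields) != 8 or fields[0] != device:
            continue
        try:
            r_s, w_s, _kr, _kw, _wait, svc_t, busy = map(float, fields[1:])
        except ValueError:
            continue  # non-numeric row, e.g. a repeated header
        if svc_t > svc_t_limit or busy > busy_limit:
            stalls.append((r_s, w_s, svc_t, busy))
    return stalls


# A few rows copied from the quoted iostat output.
SAMPLE = """\
device r/s w/s kr/s kw/s wait svc_t %b
ad6 985.1 0.0 5093.9 0.0 0 0.2 23
ad6 3.0 762.8 191.2 28732.3 0 57.6 99
ad6 1.0 0.0 0.5 0.0 1 1798.3 179
ad6 0.0 0.0 0.0 0.0 1 0.0 0
ad6 10.0 0.0 75.2 0.0 1 180.3 180
ad6 139.4 0.0 8860.5 0.0 1 4.4 61
"""

if __name__ == "__main__":
    for r_s, w_s, svc_t, busy in find_stalls(SAMPLE.splitlines()):
        print(f"stall: r/s={r_s} w/s={w_s} svc_t={svc_t} %b={busy}")
```

Note that the idle rows between stalls (all zeros) are not flagged, so a real monitor would probably want to treat a run of flagged and idle samples as one stall period. To watch a live system, the same function could be fed lines read from the standard output of iostat -x 1 instead of SAMPLE.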