From: Torfinn Ingolfsen
To: freebsd-stable@freebsd.org
Date: Sat, 06 Mar 2010 14:19:44 +0100
Subject: Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
Message-id: <20100306141944.95ec8cb6.torfinn.ingolfsen@broadpark.no>
In-reply-to: <20100131144217.ca08e965.torfinn.ingolfsen@broadpark.no>

Ok, a new development in this story.
Note that as of yet, I haven't changed SATA cables or done anything else
with the hardware. However, I did upgrade to the latest FreeBSD 8.0-stable /
amd64 yesterday.
The machine is still up (it hasn't crashed yet), and today I found this
in /var/log/messages:

Mar 6 06:25:34 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 0000007f
Mar 6 06:25:34 kg-f2 kernel: ata5: hardware reset timeout
Mar 6 06:25:45 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 0000007f
Mar 6 06:25:45 kg-f2 kernel: ata6: hardware reset timeout
Mar 6 06:25:45 kg-f2 root: ZFS: vdev failure, zpool=storage type=vdev.no_replicas
Mar 6 06:25:56 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 00000080
Mar 6 06:25:56 kg-f2 kernel: ata5: hardware reset timeout
Mar 6 06:26:06 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 0000007f
Mar 6 06:26:06 kg-f2 kernel: ata6: hardware reset timeout
Mar 6 06:26:08 kg-f2 root: ZFS: zpool I/O failure, zpool=storage error=28
Mar 6 06:26:08 kg-f2 last message repeated 2 times
Mar 6 06:26:08 kg-f2 root: ZFS: vdev I/O failure, zpool=storage path= offset= size= error=
Mar 6 06:26:16 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 0000007f
Mar 6 06:26:16 kg-f2 kernel: ata5: hardware reset timeout
Mar 6 06:26:27 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 0000007f
Mar 6 06:26:27 kg-f2 kernel: ata6: hardware reset timeout
Mar 6 06:26:37 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 00000080
Mar 6 06:26:37 kg-f2 kernel: ata5: hardware reset timeout
Mar 6 06:26:47 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 0000007f
Mar 6 06:26:47 kg-f2 kernel: ata6: hardware reset timeout
Mar 6 06:26:58 kg-f2 kernel: ata5: port is not ready (timeout 10000ms) tfd = 0000007f
Mar 6 06:26:58 kg-f2 kernel: ata5: hardware reset timeout
Mar 6 06:27:08 kg-f2 kernel: ata6: port is not ready (timeout 10000ms) tfd = 00000080
Mar 6 06:27:08 kg-f2 kernel: ata6: hardware reset timeout

Before the upgrade, messages such as these would (AFAICT) result in a
panic and reboot.

Uptime:
root@kg-f2# uptime
 2:11PM  up 19:38, 3 users, load averages: 0.00, 0.00, 0.00

The boot / root mirror pool is okay:
root@kg-f2# zpool status zroot
  pool: zroot
 state: ONLINE
 scrub: scrub completed after 0h8m with 0 errors on Fri Mar 5 18:45:24 2010
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0

errors: No known data errors

However, the storage pool is not:
root@kg-f2# zpool status storage
  pool: storage
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub completed after 0h0m with 0 errors on Fri Mar 5 18:36:17 2010
config:

        NAME        STATE     READ WRITE CKSUM
        storage     UNAVAIL      0     3     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            ad8     ONLINE       0     0     0
            ad10    REMOVED      0     0     0
            ad12    REMOVED      0     0     0
            ad14    ONLINE       0     0     0
            ada0    ONLINE       0     0     0

errors: 2 data errors, use '-v' for a list

Currently, this pool isn't in use, so I am not concerned about data loss
(luckily).
Note that before this upgrade, with all the panics and reboots, both zfs pools
have always been clean and trouble-free after a reboot.

atacontrol confirms that ad10 and ad12 are "gone" (i.e. disconnected):
root@kg-f2# atacontrol list
ATA channel 0:
    Master: no device present
    Slave: no device present
ATA channel 2:
    Master: ad4 SATA revision 2.x
    Slave: no device present
ATA channel 3:
    Master: ad6 SATA revision 2.x
    Slave: no device present
ATA channel 4:
    Master: ad8 SATA revision 2.x
    Slave: no device present
ATA channel 5:
    Master: no device present
    Slave: no device present
ATA channel 6:
    Master: no device present
    Slave: no device present
ATA channel 7:
    Master: ad14 SATA revision 2.x
    Slave: no device present

What happens if I just reboot the server now? I think that ad10 and ad12
will be detected and connected, but what will zfs do with the 'storage' pool?

As always, more info (including verbose dmesgs etc.) is on the FreeBSD
page[1] for this machine.

References:
1) FreeBSD on this machine: http://sites.google.com/site/tingox/ga-ma74gm-s2h_freebsd
--
Torfinn
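
For anyone who wants to compare notes on a similar machine: the reset timeouts quoted above alternate between ata5 and ata6, and a per-channel tally makes that easier to see in a longer log. The following is only a minimal sketch, not something taken from the original report. It assumes the kernel messages keep the exact format shown in the excerpt and that the log lives in /var/log/messages; the function name count_reset_timeouts is made up for illustration.

```shell
#!/bin/sh
# Tally "hardware reset timeout" kernel messages per ATA channel.
# Assumption: LOG defaults to /var/log/messages; pass another path as $1.
LOG="${1:-/var/log/messages}"

# Hypothetical helper: prints "<channel> <count>" for each channel seen.
count_reset_timeouts() {
    # Line format assumed from the excerpt in this message:
    #   Mar 6 06:25:34 kg-f2 kernel: ata5: hardware reset timeout
    awk '/kernel: ata[0-9]+: hardware reset timeout/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^ata[0-9]+:$/) {
                     sub(/:$/, "", $i)   # strip the trailing colon
                     n[$i]++
                     break
                 }
         }
         END { for (c in n) print c, n[c] }' "$1" | sort
}

# Only run against the log if it is actually readable on this system.
if [ -r "$LOG" ]; then
    count_reset_timeouts "$LOG"
fi
```

If only one or two channels ever show up in the output, that would point at those particular ports (or whatever is attached to them) rather than at the controller as a whole, but that is an inference, not something the tally proves.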