From: Quartz <quartz@sneakertech.com>
To: Tom Evans
Cc: FreeBSD FS <freebsd-fs@freebsd.org>
Date: Tue, 09 Apr 2013 12:19:29 -0400
Subject: Re: ZFS: Failed pool causes system to hang
Message-ID: <51643F91.30704@sneakertech.com>
List-Id: Filesystems <freebsd-fs@freebsd.org>

> Sorry, but you've not tested this. Your root is hanging off a
> different controller to the others, but it is still using the same
> ahci/cam stack. Is ahci/cam getting wedged, causing your root to get
> wedged - irrespective of running on a different controller - or is ZFS
> causing a deadlock.

If I simulate failures by yanking the SATA cable to various drives in the pool, I can disconnect any two (raidz2) at random and everything hums along just fine.
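(For what it's worth, the same degraded state can be provoked without pulling cables; a rough sketch below, where the pool name "tank" and the daN device names are just placeholders for whatever your setup uses:)

```
# Take two devices offline -- a raidz2 pool should go DEGRADED but stay usable.
zpool offline tank da1
zpool offline tank da2
zpool status tank

# Bring them back; the pool resilvers.
zpool online tank da1 da2

# Offlining a third device simultaneously exceeds raidz2's redundancy,
# which is the point where the hang shows up.
```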
Status tells me the pool is degraded, and if I reconnect them I can resilver and whatnot with no problems. However, it's when I have three drives yanked simultaneously that everything goes to shit. I don't know the ahci/cam stack from a hole in the wall, but it seems to me that if it can gracefully handle two drives dropping out and coming back at random, it ought to be able to handle three. I suppose it's possible that ZFS itself is not the root cause of the problem, but one way or another there's some kind of interaction here, as I only see the hang when the pool is no longer solvent.

______________________________________
it has a certain smooth-brained appeal