From: "Daniel Eriksson"
To: "'Pawel Jakub Dawidek'"
cc: freebsd-current@freebsd.org
Date: Mon, 18 Oct 2004 09:14:21 +0200
Subject: RE: Current crash on today's kernel
In-Reply-To: <20041018055448.GE73767@darkness.comp.waw.pl>

Pawel Jakub Dawidek wrote:
>> The machine is using both ataraid and gvinum. (But cannot use geom_stripe
>> since it doesn't want to play nice with ataraid.)
>
> What are the problems exactly? I'm happy to help.

I'm not sure whether it is limited to my particular setup. Unfortunately I
don't have enough hardware to test it on anything other than a production
machine, which limits my willingness to try things.

I had a nice log of this with extra geom debugging enabled, but it seems I
have misplaced it. I don't think anything stood out in the log, though.

When geom_stripe starts to taste providers it messes up the ataraid arrays,
making the discs in the arrays time out. This of course results in the
arrays being marked as broken. And because of bugs somewhere else, having a
disc/array disappear from under a live filesystem usually results in a
system panic.

Again, if this is a local problem it will be hard to debug, given that the
machine needs to stay up. However, checking whether it is a local or a
general problem is easy: just hook two or more discs up as an ataraid RAID0
array and then try to create a geom_stripe array using some other discs. If
that generates a bunch of ATA timeouts which eventually tear down the
ataraid array, then it's a general problem.

I also remember waiting for all the ataraid arrays to fail (I have 4 in the
machine; it takes 15-30 sec for all of them to fail). Once they had all
failed I tried to access the newly created geom_stripe array, and it worked
just fine. I then ran 'atacontrol delete' to remove one of the failed arrays
and tried to rebuild it with 'atacontrol create'. As soon as the arX device
was created, geom wanted to taste it, which again generated ATA timeouts
that tore the array down.

Sorry about the lack of details.

/Daniel Eriksson
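
For anyone who wants to try the reproduction described above on a spare
machine, here is a rough dry-run sketch of the steps. It only prints the
commands; nothing is executed. The disc names (ad4, ad6, ad8, ad10), the
stripe name st0, the interleave of 64 sectors and the array name ar0 are
all placeholders I made up for illustration, not values from my machine.
Creating arrays destroys data on the discs involved, so check every name
before running anything by hand.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps: prints commands, runs nothing.
# All device and array names passed in below are hypothetical placeholders.

repro_commands() {
    ata_a=$1; ata_b=$2   # discs for the ataraid RAID0 array
    gs_a=$3;  gs_b=$4    # other discs for the geom_stripe array

    # 1. Build an ataraid RAID0 array (interleave value is an arbitrary choice).
    echo "atacontrol create RAID0 64 $ata_a $ata_b"

    # 2. Create a geom_stripe array on the other discs; this triggers tasting.
    echo "gstripe label -v st0 /dev/$gs_a /dev/$gs_b"

    # 3. Check whether the ataraid array survived (assumes it came up as ar0).
    echo "atacontrol status ar0"
}

repro_commands ad4 ad6 ad8 ad10
```

If step 2 is followed by ATA timeouts on the console and step 3 reports the
array as broken, that would match what I am seeing here.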