From: pmc@citylink.dinoex.sub.org (Peter Much)
Date: Tue, 11 Aug 2009 20:56:46 GMT
References: <4A325E9F.2080802@icyb.net.ua> <3c1674c90906121354s6d6ae7ben5082708b1586e94f@mail.gmail.com>
To: freebsd-stable@FreeBSD.ORG
Subject: Re: zfs/panic: short after rollback

Kip Macy wrote on Fri, 12 Jun 2009 13:54:40 -0700 in m2n.fbsd.stable:

|show sleepchain
|show thread 100263
|
|On Fri, Jun 12, 2009 at 6:56 AM, Andriy Gapon wrote:
|>
|> I did zfs rollback xxx@yyy
|> And then did ls on a directory in the rolled-back fs.
|> panic: sleeping thread

This is quite likely the same problem that I am experiencing. It may also be the same problem as in kern/137037 and kern/129148.

It seems to show up in a few different flavours, but the bottom line is this: do a rollback, and soon after (usually at the next filesystem-related action) the kernel has gone fishing.

I first experienced it when doing a rollback of a mounted filesystem. It crashed right after the first try, and it did so reproducibly. (Well, more or less reproducibly - another day, under similar circumstances, it did not crash.)

Then I started thinking, and came to the conclusion that a rollback of a mounted filesystem (with possibly open files) could easily bring a lot of things into an undefined state, and is not something one normally wants to do. So maybe it is not supposed to work at all.

Anyway, when trying this, I get either the "sleeping thread" message (as above) or a panic from _sx_xlock() (as shown in my addendum to kern/137037, and in the addendum to kern/129148).

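
For anyone who wants to try the mounted case described above, the steps can be sketched roughly as follows. This is only an illustration, not the exact sequence from any of the reports: the dataset name tank/test, the snapshot name "before", the scratch file, and the default mountpoint /tank/test are all placeholders. The script only prints the commands unless DRY_RUN=0 is set, since a real run may panic the machine.

```sh
#!/bin/sh
# Sketch of the "rollback a mounted filesystem, then touch it" sequence.
# ASSUMPTIONS (not from the original report): dataset tank/test, snapshot
# name "before", and the default mountpoint /tank/test.
DATASET="${DATASET:-tank/test}"
SNAP="${SNAP:-before}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

repro_mounted() {
    run zfs snapshot "$DATASET@$SNAP"    # take a snapshot to roll back to
    run touch "/$DATASET/scratch"        # change something after the snapshot
    run zfs rollback "$DATASET@$SNAP"    # roll back while still mounted
    run ls "/$DATASET"                   # first filesystem access afterwards
}

repro_mounted
```

Run it on a scratch pool only, with a debugging kernel and a console you can watch, e.g. `DRY_RUN=0 DATASET=tank/test sh repro.sh` (repro.sh being whatever name the script was saved under).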
So I started to do rollbacks on unmounted filesystems (quite an excessive number of them), and while this seemed to work at first, later on the system failures reappeared. These failures took various shapes - I experienced immediate resets without a dump, and system hangs.

When deliberately trying to reproduce this (after installing a kernel with debugging info and watching the console), I also captured a panic coming from _sx_xlock() - so it seems to be the same problem as without unmounting, except that it takes a number of rollbacks (a dozen or more) to hit.

Overall, there was never any data loss or persistent damage. So I consider rollback still functional and safe to use, but I no longer consider a system production-stable after doing a rollback.

rgds,
PMc