From owner-freebsd-bugs@FreeBSD.ORG Wed Nov 29 23:30:22 2006
Date: Wed, 29 Nov 2006 23:30:21 GMT
Message-Id: <200611292330.kATNUL65085357@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: mjacob@freebsd.org
Subject: Re: kern/106030: panic while rebooting with a dead disk
Reply-To: mjacob@freebsd.org

The following reply was made to PR kern/106030; it has been noted by GNATS.

From: mjacob@freebsd.org
To: Robert Watson
Cc: bug-follouwp@freebsd.org
Subject: Re: kern/106030: panic while rebooting with a dead disk
Date: Wed, 29 Nov 2006 15:08:54 -0800 (PST)

> This is a panic on shutdown in the file system.  All user processes have
> exited, and UFS is unable to sync cached data to disk, so there is no way to
> report the error to a user process.

Yes, but it is also true that this would happen at times other than reboot.
In fact, I rebooted rather than try to run with a dead disk mounted, and much
to my annoyance I *still* couldn't avoid a panic. My only other choice would
have been to do a 'reboot -n'. Bad in either case.

>
> There are certainly situations where FreeBSD panics rather than tolerating
> invalid file system data, but I believe those problems are entirely at the
> file system layer.  There is a kernel printf from GEOM, but the panic occurs
> in the buffer cache code, presumably when UFS discovers life sucks more than
> it thought.  I'd like to see UFS grow more tolerant of this sort of thing,
> and simply lose the data rather than panicking.

Yes.

> That said, I think the more pressing issue is actually with FAT, since
> reliable server configurations frequently run UFS over RAID, but most FAT
> devices are not only not reliable, but also removable, which we currently
> fail to tolerate at all when the FAT file system is mounted.  A practice run
> on tolerating device removal for FAT would probably prepare us to address the
> UFS issues more competently, as well as shake out issues in VM, etc, that
> might arise.  For example, I believe we currently fail rather poorly when
> paging in data from a failing swap device.  Certainly there's no good way to
> get out of the situation, but I think we perform one of the less good bad
> ways.

Uhh, this conversation just took a rather bizarre twist.

It's not just a question of making UFS more fault tolerant. UFS is sort of a
dead horse by now, and RAID may not help when it's a channel failure (e.g.,
Fibre Channel or iSCSI). I'd rather see efforts put into ZFS (and into fixing
the XFS port to actually work), but that is beside the point.

It's more a case of making sure that we don't panic when we don't have to.
Right now we panic too much.

But these are very good points. Thanks for the review of my somewhat botched
bug report.