From owner-freebsd-bugs@FreeBSD.ORG  Wed Nov 29 23:30:22 2006
Return-Path: <owner-freebsd-bugs@FreeBSD.ORG>
X-Original-To: freebsd-bugs@hub.freebsd.org
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 03A0116A40F
	for <freebsd-bugs@hub.freebsd.org>;
	Wed, 29 Nov 2006 23:30:22 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [69.147.83.40])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 6C4A143CA8
	for <freebsd-bugs@hub.freebsd.org>;
	Wed, 29 Nov 2006 23:30:17 +0000 (GMT)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.13.4/8.13.4) with ESMTP id kATNULqU085361
	for <freebsd-bugs@freefall.freebsd.org>; Wed, 29 Nov 2006 23:30:21 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.13.4/8.13.4/Submit) id kATNUL65085357;
	Wed, 29 Nov 2006 23:30:21 GMT (envelope-from gnats)
Date: Wed, 29 Nov 2006 23:30:21 GMT
Message-Id: <200611292330.kATNUL65085357@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: mjacob@freebsd.org
Cc: 
Subject: Re: kern/106030: panic while rebooting with a dead disk
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: mjacob@freebsd.org
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs>
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
	<mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Nov 2006 23:30:22 -0000

The following reply was made to PR kern/106030; it has been noted by GNATS.

From: mjacob@freebsd.org
To: Robert Watson <rwatson@freebsd.org>
Cc: bug-follouwp@freebsd.org
Subject: Re: kern/106030: panic while rebooting with a dead disk
Date: Wed, 29 Nov 2006 15:08:54 -0800 (PST)

 > This is a panic on shutdown in the file system.  All user processes have 
 > exited, and UFS is unable to sync cached data to disk, so there is no way to 
 > report the error to a user process.
 
 Yes, but this would also happen at times other than reboot. In fact, I 
 rebooted rather than try to run with a dead disk mounted, and much to my 
 annoyance I *still* couldn't avoid a panic. My only other choice would 
 have been to do a 'reboot -n'. Bad in either case.
 
 >
 > There are certainly situations where FreeBSD panics rather than tolerating 
 > invalid file system data, but I believe those problems are entirely at the 
 > file system layer.  There is a kernel printf from GEOM, but the panic occurs 
 > in the buffer cache code, presumably when UFS discovers life sucks more than 
 > it thought.  I'd like to see UFS grow more tolerant of this sort of thing, 
 > and simply lose the data rather than panicking.
 
 Yes.
 
 > That said, I think the more pressing issue is actually with FAT, since 
 > reliable server configurations frequently run UFS over RAID, but most FAT 
 > devices are not only not reliable, but also removable, which we currently 
 > fail to tolerate at all when the FAT file system is mounted.  A practice run 
 > on tolerating device removal for FAT would probably prepare us to address the 
 > UFS issues more competently, as well as shake out issues in VM, etc, that 
 > might arise.  For example, I believe we currently fail rather poorly when 
 > paging in data from a failing swap device.  Certainly there's no good way to 
 > get out of the situation, but I think we perform one of the less good bad 
 > ways.
 
 Uhh, this conversation just took a rather bizarre twist. It's not just a 
 question of making UFS more fault tolerant. UFS is sort of a dead horse 
 by now, and RAID may not help when it's a channel failure (e.g., Fibre 
 Channel or iSCSI). I'd rather see effort put into ZFS (and into fixing 
 the XFS port so that it actually works), but that is beside the point. 
 The point is to make sure that we don't panic when we don't have to. 
 Right now we panic too often.
 
 But these are very good points. Thanks for the review of my somewhat 
 botched bug report.