Date: Wed, 29 Nov 2006 23:30:21 GMT
From: mjacob@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: Re: kern/106030: panic while rebooting with a dead disk
Message-ID: <200611292330.kATNUL65085357@freefall.freebsd.org>
The following reply was made to PR kern/106030; it has been noted by GNATS.

From: mjacob@freebsd.org
To: Robert Watson <rwatson@freebsd.org>
Cc: bug-follouwp@freebsd.org
Subject: Re: kern/106030: panic while rebooting with a dead disk
Date: Wed, 29 Nov 2006 15:08:54 -0800 (PST)

> This is a panic on shutdown in the file system. All user processes have
> exited, and UFS is unable to sync cached data to disk, so there is no way to
> report the error to a user process.

Yes- but it is also true that this would happen at a time other than reboot. In fact, I rebooted rather than try and run with a dead disk mounted and much to my annoyance I *still* couldn't avoid a panic. My only other choice would have been to do a 'reboot -n'. Bad in either case.

>
> There are certainly situations where FreeBSD panics rather than tolerating
> invalid file system data, but I believe those problems are entirely at the
> file system layer. There is a kernel printf from GEOM, but the panic occurs
> in the buffer cache code, presumably when UFS discovers life sucks more than
> it thought. I'd like to see UFS grow more tolerant of this sort of thing,
> and simply lose the data rather than panicking.

Yes.

> That said, I think the more pressing issue is actually with FAT, since
> reliable server configurations frequently run UFS over RAID, but most FAT
> devices are not only not reliable, but also removeable, which we currently
> fail to tolerate at all when the FAT file system is mounted. A practice run
> on tolerating device removal for FAT would probably prepare us to address the
> UFS issues more competently, as well as shake out issues in VM, etc, that
> might arise. For example, I believe we currently fail rather poorly when
> paging in data from a failing swap device. Certainly there's no good way to
> get out of the situation, but I think we perform one of the less good bad
> ways.

Uhh- this conversation just took a rather bizarre twist.
It's not just a question of making UFS more fault tolerant- UFS is sort of a dead horse by now and RAID may not help when it's a channel failure (e.g., fibre channel or iSCSI). I'd rather see efforts put into ZFS (and fixing the XFS port to actually work)- but that is beside the point. It's more of a case to make sure that we don't panic when we don't have to. Right now we panic too much. But these are very good points- thanks for the review of my somewhat botched bug report.