Date: Tue, 10 Apr 2007 12:42:29 -0500
From: Eric Anderson <anderson@freebsd.org>
To: rick-freebsd@kiwi-computer.com
Cc: freebsd-geom@freebsd.org
Subject: Re: volume management
Message-ID: <461BCC85.2080900@freebsd.org>
In-Reply-To: <20070410172604.GA21036@keira.kiwi-computer.com>
References: <20070409152401.GG76673@garage.freebsd.pl>
 <20070409153203.GA88082@harmless.hu>
 <461A5EC6.8010000@freebsd.org>
 <20070409154407.GA88621@harmless.hu>
 <evfqtt$n23$1@sea.gmane.org>
 <20070410111957.GA85578@garage.freebsd.pl>
 <461B75B2.40201@fer.hr>
 <20070410114115.GB85578@garage.freebsd.pl>
 <20070410161445.GA18858@keira.kiwi-computer.com>
 <20070410162129.GI85578@garage.freebsd.pl>
 <20070410172604.GA21036@keira.kiwi-computer.com>

On 04/10/07 12:26, Rick C. Petty wrote:
> On Tue, Apr 10, 2007 at 06:21:29PM +0200, Pawel Jakub Dawidek wrote:
>> The choice you have currently is to panic and lost few last seconds of
>> your data, but keep file system in a consistent state, or to return
>
> How can you guarantee the FS is consistent at that point? Are you looking
> through the list of blocks to be written? Granted, with soft updates this
> is less risky, because presumably the metadata blocks haven't been written
> until the data blocks are.
>
>> ENOSPC which nobody is going to handle and which may at the end corrupt
>> your file system to a state that fsck won't be able to fix it.
>
> Is a file system thread waiting on the block to be written, or because it's
> in a write cache is the caller lost forever? I thought the UFS soft
> updates code was blocking on the write, even though the userland caller had
> a successful return. If so, the FS should handle the error and avoid
> inconsistencies.
>
> I certainly see this type of behavior in gvinum when a disk is lost and a
> write to a slice cannot finish successfully. I'm very glad the box doesn't
> panic as often because I can sometimes go in and bring the drive back up.
>
>> This is not about simple write operation to the disk. Those operations
>> are delayed anyway, your userland process will see the write operation
>> succeeded. This is about kernel and file system consistency.
>
> I'm aware of that, but what's the call stack leading up to the GEOM
> failure? I was under the impression that UFS was blocked waiting for a
> write operation, which is all done in the kernel anyway.

I think the issue is that UFS doesn't expect to see ENOSPC from the
storage, since it believes it's on a provider that should be big enough.
Is the right thing to teach UFS to recognize ENOSPC, and pass that on to
the userland?

>> It will be
>> great to just fix everything in the kernel to handle errors properly,
>> but good luck with that.
>
> That's a worthy goal and something we should be pursuing. After all,
> FreeBSD used to be noted for its stability. I wouldn't call panics a sign
> of stability.. You're better off invalidating all the geom consumers and
> leaving the rest of the system up so an admin can try to recover critical
> data, or so the remaining geom providers can continue to function.

There's been talk in the past about making the mount read-only instead
of a panic in some situations, but I know nothing more than that.

Eric
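
For readers following the thread, the point that a delayed write looks successful to the userland caller can be sketched from the application side. This is only an illustration under stated assumptions, not anything proposed in the thread: `durable_write` is a hypothetical helper name, and the sketch assumes a POSIX-like system where a buffered write() can return success and a later fsync() is the first place an error such as ENOSPC is reported. It says nothing about how UFS or GEOM behave internally.

```python
import errno
import os


def durable_write(path: str, data: bytes) -> str:
    """Hypothetical helper: write data, then force it to stable storage.

    Returns "ok", or a short string naming the step that reported ENOSPC.
    Any other OSError propagates unchanged.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    step = "write"
    try:
        view = memoryview(data)
        while view:
            # A successful write() may only mean the data reached the
            # buffer cache; it can be a short write, so loop until done.
            written = os.write(fd, view)
            view = view[written:]
        step = "fsync"
        # fsync() asks the kernel to push the data to the device, so a
        # failure that was deferred past write() can surface here.
        os.fsync(fd)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return f"ENOSPC during {step}"
        raise
    finally:
        os.close(fd)
    return "ok"
```

An application that never calls fsync() and ignores the result of close() has no opportunity to notice such a failure at all, which is the situation described in the quoted text.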