Date:      Tue, 10 Apr 2007 12:26:04 -0500
From:      "Rick C. Petty" <rick-freebsd@kiwi-computer.com>
To:        freebsd-geom@FreeBSD.org
Subject:   Re: volume management
Message-ID:  <20070410172604.GA21036@keira.kiwi-computer.com>
In-Reply-To: <20070410162129.GI85578@garage.freebsd.pl>
References:  <20070409152401.GG76673@garage.freebsd.pl> <20070409153203.GA88082@harmless.hu> <461A5EC6.8010000@freebsd.org> <20070409154407.GA88621@harmless.hu> <evfqtt$n23$1@sea.gmane.org> <20070410111957.GA85578@garage.freebsd.pl> <461B75B2.40201@fer.hr> <20070410114115.GB85578@garage.freebsd.pl> <20070410161445.GA18858@keira.kiwi-computer.com> <20070410162129.GI85578@garage.freebsd.pl>

On Tue, Apr 10, 2007 at 06:21:29PM +0200, Pawel Jakub Dawidek wrote:
> 
> The choice you have currently is to panic and lose the last few seconds of
> your data, but keep file system in a consistent state, or to return

How can you guarantee the FS is consistent at that point?  Are you looking
through the list of blocks to be written?  Granted, with soft updates this
is less risky, because presumably the metadata blocks haven't been written
until the data blocks are.
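
To make that ordering argument concrete, here is a toy model, not UFS or
soft updates code.  ToyDisk and ordered_append are hypothetical names
invented for this sketch, and it assumes the only failure is ENOSPC on a
newly allocated block.  It only shows why a failed data write leaves the
old metadata intact when the data goes out first:

```python
import errno


class ToyDisk:
    """Hypothetical block store that refuses new blocks once it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}

    def write(self, blkno, payload):
        # Overwriting an existing block needs no new space; a new block does.
        if blkno not in self.blocks and len(self.blocks) >= self.capacity:
            raise OSError(errno.ENOSPC, "no space left on provider")
        self.blocks[blkno] = payload


def ordered_append(disk, data_blkno, data, inode_blkno, inode):
    """Write the data block first; update metadata only if that succeeded."""
    try:
        disk.write(data_blkno, data)
    except OSError as err:
        # Metadata was never touched, so nothing points at a missing block.
        return err.errno
    # An error on this second write is not handled here and would propagate.
    disk.write(inode_blkno, inode)
    return 0
```

With a one-block disk that already holds the inode, the data write fails,
ordered_append returns ENOSPC, and the old inode contents are still there.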

> ENOSPC which nobody is going to handle and which may at the end corrupt
> your file system to a state that fsck won't be able to fix.

Is a file system thread waiting on the block to be written, or, because the
block sits in a write cache, is the caller lost forever?  I thought the UFS soft
updates code was blocking on the write, even though the userland caller had
a successful return.  If so, the FS should handle the error and avoid
inconsistencies.

I certainly see this type of behavior in gvinum when a disk is lost and a
write to a slice cannot finish successfully.  I'm very glad the box doesn't
panic as often because I can sometimes go in and bring the drive back up.

> This is not about simple write operation to the disk. Those operations
> are delayed anyway, your userland process will see the write operation
> succeeded. This is about kernel and file system consistency.

I'm aware of that, but what's the call stack leading up to the GEOM
failure?  I was under the impression that UFS was blocked waiting for a
write operation, which is all done in the kernel anyway.

> It will be
> great to just fix everything in the kernel to handle errors properly,
> but good luck with that.

That's a worthy goal and something we should be pursuing.  After all,
FreeBSD used to be noted for its stability.  I wouldn't call panics a sign
of stability.  You're better off invalidating all the GEOM consumers and
leaving the rest of the system up so an admin can try to recover critical
data, or so the remaining GEOM providers can continue to function.
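
As a rough sketch of the policy I mean, here is a toy model.  It is not
GEOM code: ToyProvider and its fail flag are hypothetical, and the choice
of EIO and ENXIO as error codes is an assumption for illustration.  A
failed provider is marked dead and keeps returning errors, while the other
providers stay usable:

```python
import errno


class ToyProvider:
    """Hypothetical provider that is invalidated after its first failure."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}

    def write(self, blkno, payload, fail=False):
        if not self.alive:
            # Already invalidated: refuse I/O instead of taking the box down.
            return errno.ENXIO
        if fail:
            # Simulated device error: invalidate only this provider.
            self.alive = False
            return errno.EIO
        self.data[blkno] = payload
        return 0


def write_all(providers, blkno, payload, failing=()):
    """Attempt the write on every provider and report each result by name."""
    return {
        p.name: p.write(blkno, payload, fail=(p.name in failing))
        for p in providers
    }
```

If da0 fails during one call to write_all, later calls report ENXIO for
da0 and 0 for da1, so whatever sits on da1 keeps working and can still be
read back by an admin.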

-- Rick C. Petty