Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 10 May 2012 18:45:19 +0200
From:      Mateusz Guzik <mjg@semihalf.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-projects@freebsd.org, Grzegorz Bernacki <gber@freebsd.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r233072 - projects/nand/sys/kern
Message-ID:  <20120510164519.GA13258@pcbsd-2342.semihalf.com>
In-Reply-To: <20120510115857.GH2358@deviant.kiev.zoral.com.ua>
References:  <201203170318.q2H3ITdI047893@svn.freebsd.org> <20120317085116.GC1340@garage.freebsd.pl> <20120317161050.GI75778@deviant.kiev.zoral.com.ua> <4FA8FFB9.7090002@freebsd.org> <20120508095631.GV2358@deviant.kiev.zoral.com.ua> <4FA94609.3060306@freebsd.org> <20120510103105.GG2358@deviant.kiev.zoral.com.ua> <4FABC64F.3060502@freebsd.org> <20120510115857.GH2358@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, May 10, 2012 at 02:58:57PM +0300, Konstantin Belousov wrote:
> On Thu, May 10, 2012 at 03:44:47PM +0200, Grzegorz Bernacki wrote:
> > On 05/10/12 12:31, Konstantin Belousov wrote:
> > >On Tue, May 08, 2012 at 06:12:57PM +0200, Grzegorz Bernacki wrote:
> > >>On 05/08/12 11:56, Konstantin Belousov wrote:
> > >>>On Tue, May 08, 2012 at 01:12:57PM +0200, Grzegorz Bernacki wrote:
> > >>>>On 03/17/12 17:10, Konstantin Belousov wrote:
> > >>>>>On Sat, Mar 17, 2012 at 09:51:16AM +0100, Pawel Jakub Dawidek wrote:
> > >>>>>>On Sat, Mar 17, 2012 at 03:18:29AM +0000, Grzegorz Bernacki wrote:
> > >>>>>>>Author: gber
> > >>>>>>>Date: Sat Mar 17 03:18:28 2012
> > >>>>>>>New Revision: 233072
> > >>>>>>>URL: http://svn.freebsd.org/changeset/base/233072
> > >>>>>>>
> > >>>>>>>Log:
> > >>>>>>>   Add VFS changes necessary for NANDFS to work.
> > >>>>>>>
> > >>>>>>>   Ignore B_MANAGED buffer by syncer and ignore signal when msleep as
> > >>>>>>>   it
> > >>>>>>>   can cause file system inconsistency.
> > >>>>>>
> > >>>>>>I'd suggest running these changes through kib@. Especially
> > >>>>>>vn_start_write()
> > >>>>>>change below looks ugly, but maybe it is only temporary?
> > >>>>>It is not only ugly (and  object against it).
> > >>>>>
> > >>>>>If the change makes any difference for the filesystem, then I just 
> > >>>>>argue
> > >>>>>that the filesystem is broken. The vn_start_write() is done on the
> > >>>>>VFS entry peripheral, long before filesystem code is hit.
> > >>>>>
> > >>>>>I did not looked at the managed changes, you would need to describe
> > >>>>>what is wrong with current code and what is the purpose of the changes.
> > >>>>>B_MANAGED came from xfs, it seems, or at least xfs is the only current
> > >>>>>consumer of B_MANAGED buffers.
> > >>>>
> > >>>>Hi Kostik,
> > >>>>
> > >>>>Without our change in getblk() whenewer we allocate new block we get
> > >>>>panic:
> > >>>>
> > >>>>panic: bremfree: buffer 0xffffff807bf86080 not on a queue.
> > >>>>
> > >>>>It is because blocks with B_MANAGED flag are not queued on any queue in
> > >>>>brelse() function. Could you look at it and give us approval to merge
> > >>>>this change into HEAD?
> > >>>
> > >>>Right, but this is in fact the only function of the B_MANAGED flag.
> > >>>So the question is, what are you trying to accomplish.
> > >>
> > >>Hi Kostik,
> > >>
> > >>There are two separate issues.
> > >>First is that if we have B_MANAGED flag it should not cause the panic
> > >>when used, so in my opinion fix should go into HEAD regardless of NANDFS.
> > >>Second thing is the reason of having B_MANAGED flag. We use it because
> > >>we want to decide when and where to save buffers. Our FS is log
> > >>filesystem and we write buffers every given amount of time or if number
> > >>of dirty buffer exceed given threshold. We write buffers in large groups
> > >>along with metadata related to this group, so we cannot afford writing
> > >>single buffer.  As a result we cannot allow buf deamon to write buffers
> > >>in an ad hoc manner.
> > >>I hope I answered your question. Please let me know if you have more
> > >>concerns. If you have some ideas how we can avoid using B_MANAGED flags
> > >>please let us know.
> > >
> > >I looked at the whole B_MANAGED business over the two days. Finally I
> > >think that I agree to some extent with the idea of the patch, but I
> > >still do not like details. I put below the change that I think we are
> > >discussing.
> > >
> > >Index: vfs_bio.c
> > >===================================================================
> > >--- vfs_bio.c	(.../head/sys/kern/vfs_bio.c)	(revision 235215)
> > >+++ vfs_bio.c	(.../projects/nand/sys/kern/vfs_bio.c)	(revision 235215)
> > >@@ -2672,7 +2672,8 @@ loop:
> > >  		else if ((bp->b_flags&  (B_VMIO | B_INVAL)) == 0)
> > >  			bp->b_flags |= B_CACHE;
> > >  		BO_LOCK(bo);
> > >-		bremfree(bp);
> > >+		if (!(bp->b_flags&  B_MANAGED))
> > >+			bremfree(bp);
> > >  		BO_UNLOCK(bo);
> > >
> > >  		/*
> > >
> > >First note, you should not take BO_LOCK if you are not going to call
> > >bremfree().Second, I think you need to assert that b_qindex is already
> > >QUEUE_NONE if B_MANAGED is set.
> > >
> > >Note the comment right after the if (bp != NULL) line in getblk(). I think
> > >it needs to be adjusted to say 'if not managed' or like.
> > 
> > Yes, you are right. How about this patch:
> > 
> > http://people.freebsd.org/~gber/patches/vfs_bio.diff
> Please revert the test condition and exchange the then/else blocks.
> It is easier to read this way.
> 

Is this ok:
http://people.freebsd.org/~gber/patches/vfs_bio2.diff

?

> Still the question how xfs will react to the change stands. Xfs is the only
> current in-tree consumer of B_MANAGED, might be it is too rotten.
> 

AFAIR XFS is currently broken and not mpsafe, i.e. it needs work anyway,
so I don't think this is something we should be concerned about at the
moment.

> > >
> > >As a general note, explaining my limited disagreement with the whole idea
> > >of using managed buffers, is the question how do you react to low free
> > >buffer condition at all ? Do you get the notification on the condition at
> > >all ?
> > 
> > I agree that B_MANAGED should be used carefully, but this feature might 
> > be very useful and we should not get rid of it.
> > 
> > Indeed we do not get notifications from bufdaemon, however we have our 
> > own syncer thread which start writes when number of dirty buffer exceeds 
> > given threshold. I believe that our threshold of dirty buffers which 
> > triggers writes is much lower than the one used by buf deamon. I think 
> > that this will prevent the system from having too many dirty buffers. 
> > But see below.
> > 
> > >
> > >Current scheme involves activation of the bufdaemon when getnewbuf() cannot
> > >find a suitable buffer or KVA. With a non-trivial count of buffers not
> > >presented on any queue, system has high risk of deadlocking.
> > 
> > As noted earlier I don't think that this will be a problem with nandfs. 
> > However in case such notifications are really needed, I think we can add 
> > an EVENTHANDLER that would fire when buf daemon decides to flush 
> > buffers. Does this sound reasonable?
> 
> Note that bufdaemon cannot flush some buffers on its own, mostly the
> buffers belonging to the locked vnode. In this case, getnewbuf() caller
> threads are used as bufdaemon helpers.
> 
> Might be eventhandler calls from bufdaemon are not bad, but I want to see
> a code before making the judgement.

http://people.freebsd.org/~raj/patches/misc/vfs_highdirtybuf.diff

callbacks are expected to increase flushed counter if they happend to
flush some buffers.

Example proof-of-concept (will be cleaned up) change for nandfs:
http://people.freebsd.org/~raj/patches/misc/nandfs_vfs_highdirtybuf.diff

Does this look reasonable?

-- 
Mateusz Guzik <mjg semihalf.com>



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120510164519.GA13258>