Date: Thu, 10 May 2012 18:45:19 +0200 From: Mateusz Guzik <mjg@semihalf.com> To: Konstantin Belousov <kostikbel@gmail.com> Cc: svn-src-projects@freebsd.org, Grzegorz Bernacki <gber@freebsd.org>, src-committers@freebsd.org Subject: Re: svn commit: r233072 - projects/nand/sys/kern Message-ID: <20120510164519.GA13258@pcbsd-2342.semihalf.com> In-Reply-To: <20120510115857.GH2358@deviant.kiev.zoral.com.ua> References: <201203170318.q2H3ITdI047893@svn.freebsd.org> <20120317085116.GC1340@garage.freebsd.pl> <20120317161050.GI75778@deviant.kiev.zoral.com.ua> <4FA8FFB9.7090002@freebsd.org> <20120508095631.GV2358@deviant.kiev.zoral.com.ua> <4FA94609.3060306@freebsd.org> <20120510103105.GG2358@deviant.kiev.zoral.com.ua> <4FABC64F.3060502@freebsd.org> <20120510115857.GH2358@deviant.kiev.zoral.com.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, May 10, 2012 at 02:58:57PM +0300, Konstantin Belousov wrote: > On Thu, May 10, 2012 at 03:44:47PM +0200, Grzegorz Bernacki wrote: > > On 05/10/12 12:31, Konstantin Belousov wrote: > > >On Tue, May 08, 2012 at 06:12:57PM +0200, Grzegorz Bernacki wrote: > > >>On 05/08/12 11:56, Konstantin Belousov wrote: > > >>>On Tue, May 08, 2012 at 01:12:57PM +0200, Grzegorz Bernacki wrote: > > >>>>On 03/17/12 17:10, Konstantin Belousov wrote: > > >>>>>On Sat, Mar 17, 2012 at 09:51:16AM +0100, Pawel Jakub Dawidek wrote: > > >>>>>>On Sat, Mar 17, 2012 at 03:18:29AM +0000, Grzegorz Bernacki wrote: > > >>>>>>>Author: gber > > >>>>>>>Date: Sat Mar 17 03:18:28 2012 > > >>>>>>>New Revision: 233072 > > >>>>>>>URL: http://svn.freebsd.org/changeset/base/233072 > > >>>>>>> > > >>>>>>>Log: > > >>>>>>> Add VFS changes necessary for NANDFS to work. > > >>>>>>> > > >>>>>>> Ignore B_MANAGED buffer by syncer and ignore signal when msleep as > > >>>>>>> it > > >>>>>>> can cause file system inconsistency. > > >>>>>> > > >>>>>>I'd suggest running these changes through kib@. Especially > > >>>>>>vn_start_write() > > >>>>>>change below looks ugly, but maybe it is only temporary? > > >>>>>It is not only ugly (and object against it). > > >>>>> > > >>>>>If the change makes any difference for the filesystem, then I just > > >>>>>argue > > >>>>>that the filesystem is broken. The vn_start_write() is done on the > > >>>>>VFS entry peripheral, long before filesystem code is hit. > > >>>>> > > >>>>>I did not looked at the managed changes, you would need to describe > > >>>>>what is wrong with current code and what is the purpose of the changes. > > >>>>>B_MANAGED came from xfs, it seems, or at least xfs is the only current > > >>>>>consumer of B_MANAGED buffers. > > >>>> > > >>>>Hi Kostik, > > >>>> > > >>>>Without our change in getblk() whenewer we allocate new block we get > > >>>>panic: > > >>>> > > >>>>panic: bremfree: buffer 0xffffff807bf86080 not on a queue. > > >>>> > > >>>>It is because blocks with B_MANAGED flag are not queued on any queue in > > >>>>brelse() function. Could you look at it and give us approval to merge > > >>>>this change into HEAD? > > >>> > > >>>Right, but this is in fact the only function of the B_MANAGED flag. > > >>>So the question is, what are you trying to accomplish. > > >> > > >>Hi Kostik, > > >> > > >>There are two separate issues. > > >>First is that if we have B_MANAGED flag it should not cause the panic > > >>when used, so in my opinion fix should go into HEAD regardless of NANDFS. > > >>Second thing is the reason of having B_MANAGED flag. We use it because > > >>we want to decide when and where to save buffers. Our FS is log > > >>filesystem and we write buffers every given amount of time or if number > > >>of dirty buffer exceed given threshold. We write buffers in large groups > > >>along with metadata related to this group, so we cannot afford writing > > >>single buffer. As a result we cannot allow buf deamon to write buffers > > >>in an ad hoc manner. > > >>I hope I answered your question. Please let me know if you have more > > >>concerns. If you have some ideas how we can avoid using B_MANAGED flags > > >>please let us know. > > > > > >I looked at the whole B_MANAGED business over the two days. Finally I > > >think that I agree to some extent with the idea of the patch, but I > > >still do not like details. I put below the change that I think we are > > >discussing. > > > > > >Index: vfs_bio.c > > >=================================================================== > > >--- vfs_bio.c (.../head/sys/kern/vfs_bio.c) (revision 235215) > > >+++ vfs_bio.c (.../projects/nand/sys/kern/vfs_bio.c) (revision 235215) > > >@@ -2672,7 +2672,8 @@ loop: > > > else if ((bp->b_flags& (B_VMIO | B_INVAL)) == 0) > > > bp->b_flags |= B_CACHE; > > > BO_LOCK(bo); > > >- bremfree(bp); > > >+ if (!(bp->b_flags& B_MANAGED)) > > >+ bremfree(bp); > > > BO_UNLOCK(bo); > > > > > > /* > > > > > >First note, you should not take BO_LOCK if you are not going to call > > >bremfree().Second, I think you need to assert that b_qindex is already > > >QUEUE_NONE if B_MANAGED is set. > > > > > >Note the comment right after the if (bp != NULL) line in getblk(). I think > > >it needs to be adjusted to say 'if not managed' or like. > > > > Yes, you are right. How about this patch: > > > > http://people.freebsd.org/~gber/patches/vfs_bio.diff > Please revert the test condition and exchange the then/else blocks. > It is easier to read this way. > Is this ok: http://people.freebsd.org/~gber/patches/vfs_bio2.diff ? > Still the question how xfs will react to the change stands. Xfs is the only > current in-tree consumer of B_MANAGED, might be it is too rotten. > AFAIR XFS is currently broken and not mpsafe, i.e. it needs work anyway, so I don't think this is something we should be concerned about at the moment. > > > > > >As a general note, explaining my limited disagreement with the whole idea > > >of using managed buffers, is the question how do you react to low free > > >buffer condition at all ? Do you get the notification on the condition at > > >all ? > > > > I agree that B_MANAGED should be used carefully, but this feature might > > be very useful and we should not get rid of it. > > > > Indeed we do not get notifications from bufdaemon, however we have our > > own syncer thread which start writes when number of dirty buffer exceeds > > given threshold. I believe that our threshold of dirty buffers which > > triggers writes is much lower than the one used by buf deamon. I think > > that this will prevent the system from having too many dirty buffers. > > But see below. > > > > > > > >Current scheme involves activation of the bufdaemon when getnewbuf() cannot > > >find a suitable buffer or KVA. With a non-trivial count of buffers not > > >presented on any queue, system has high risk of deadlocking. > > > > As noted earlier I don't think that this will be a problem with nandfs. > > However in case such notifications are really needed, I think we can add > > an EVENTHANDLER that would fire when buf daemon decides to flush > > buffers. Does this sound reasonable? > > Note that bufdaemon cannot flush some buffers on its own, mostly the > buffers belonging to the locked vnode. In this case, getnewbuf() caller > threads are used as bufdaemon helpers. > > Might be eventhandler calls from bufdaemon are not bad, but I want to see > a code before making the judgement. http://people.freebsd.org/~raj/patches/misc/vfs_highdirtybuf.diff callbacks are expected to increase flushed counter if they happend to flush some buffers. Example proof-of-concept (will be cleaned up) change for nandfs: http://people.freebsd.org/~raj/patches/misc/nandfs_vfs_highdirtybuf.diff Does this look reasonable? -- Mateusz Guzik <mjg semihalf.com>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120510164519.GA13258>