Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 21 Nov 2012 01:57:45 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r243311 - in head/sys: fs/ext2fs fs/msdosfs fs/nfsclient fs/nullfs fs/unionfs gnu/fs/reiserfs nfsclient ufs/ffs
Message-ID:  <20121121012935.W6568@besplex.bde.org>
In-Reply-To: <CAJ-FndCtwOMFjZSXD4Q_4FNjZqpJcZq1gx6VSXr=ALSfZ=ksmg@mail.gmail.com>
References:  <201211192243.qAJMhjFF055708@svn.freebsd.org> <20121120162708.I924@besplex.bde.org> <CAJ-FndBrxipxRDKZuPww7j1xdz_uh7Qym6GBycP2kMtBGptoFg@mail.gmail.com> <20121120230708.G6016@besplex.bde.org> <CAJ-FndDhUEHNFv5RTBUACPqOeLBAnVZ4EKZhzdjx-FK84nVLhw@mail.gmail.com> <CAJ-FndAOc4ctzSMHH7dUCX90eGwYrAZnJGXpghmv-Mm=CzkS3Q@mail.gmail.com> <20121120234822.J6178@besplex.bde.org> <CAJ-FndABr3oRv9DtPnBmbwn-UCXFrf7ZfxauWF0XGOwqiMgwxA@mail.gmail.com> <20121120141137.GX73505@kib.kiev.ua> <CAJ-FndCtwOMFjZSXD4Q_4FNjZqpJcZq1gx6VSXr=ALSfZ=ksmg@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 20 Nov 2012, Attilio Rao wrote:

> On 11/20/12, Konstantin Belousov <kostikbel@gmail.com> wrote:
>> On Tue, Nov 20, 2012 at 01:27:07PM +0000, Attilio Rao wrote:
>>> On 11/20/12, Bruce Evans <brde@optusnet.com.au> wrote:
>>>> ...
>>>> I think I am right that the code makes no sense.  It is ordered like
>>>> it is because placing the allocation of ip before the allocation of
>>>> vp used to be enough to prevent v_data being dereferenced.  This makes
>>>> no sense when it isn't enough.
>>>
>>> In the past, before VFS got locking and kernel was single-threaded,
>>> the comment and code arranged in this way were sensitive and
>>> effective.
>>> As now this is not true anymore, there is no strict relationship
>>> between the getnewvnode() and sleeping points.
>>> It is important to remove stale comments because they confuse people,
>>> the porters and as you can see the code/comment has been cut&paste
>>> quite a bit around.
>>
>> Putting the issue of the comment making no sense at all aside, which is
>> true but tangential to what I want to note.
>>
>> Although malloc(9) is not ordered with the vnode locks in the lock
>> order, it is still good practice to not perform allocations under the
>> vnode lock. The reason is that pagedaemon might need to clean and write
>> pages, in particular, belonging to the vnodes. Generally, if you own
>> vnode lock and do something which might kick pagedaemon, you are creating
>> troubles for pagedaemon, which would attempt to lock the vnode with
>> timeout, spending the precious time waiting for lock it cannot obtain.
>>
>> This is less an issue for the new vnode instantiation locations, because
>> vnode cannot have resident pages yet. But it is good to follow the same
>> pattern anywhere.

The comment could have been changed to one about this.  Except it doesn't
really apply here.  Neither of the allocations is onder the vnode lock.

> Yes, I completely agree. Infact the code is not changed also to avoid
> pagedaemon hosing.
> However this is a different situation respect a "if you allocate while
> you hold the lock you will get a deadlock/race/corruption".

The code is still nonsense since its order is unrelated to this too :-).

The natural code and order is:

     o // no lock held here
     o allocate vp
     o allocate ip and any other stuff needed to complete vp
     o aquire exclusive lock
     o check that vp, ip and other stuff didn't go away
     o wire ip and other stuff into vp // shouldn't do more allocations here
     o release lock on return (?)

but the current order for at least ffs is:

     o // no lock held here
     o // oops, actually we always (?) hold at least a shared lock here.  Often
       // we need to promote to an exclusive lock.
     o allocate ip
     o allocate vp
     o aquire exclusive lock
        // no check that nothing went away??  A comment earlier still says that
        // we intentionally allow races earlier.  Is that correct when we hold
        // at leasr a shared lock throughout?
     o wire ip into vp.  Includes futher initializations of both.  Hopefully
        these can't block.
     bread() the inode.  Can block now.  No worse than blocking for write(2) (?).
     o further zalloc()s for dinodes.  Breaks kib's rule.  It would be painful
       to have to release all the zalloc()ated data on earlier failures.  At
       this point, we can still fail, but have initialized enough of the vnode
       to just fail without cleaning it up -- ufs_reclaim() will clean it.
     o further initializations, mostly not involving operations that can block.
     release lock on return (?)

The allocation of ip is clearly special -- it contains state info that
hopefully records the state of the initialization, and it must be wired into
vp first so that desrtructors can see this info.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121121012935.W6568>