Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 20 Nov 2012 14:13:57 +0000
From:      Attilio Rao <attilio@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r243311 - in head/sys: fs/ext2fs fs/msdosfs fs/nfsclient fs/nullfs fs/unionfs gnu/fs/reiserfs nfsclient ufs/ffs
Message-ID:  <CAJ-FndCtwOMFjZSXD4Q_4FNjZqpJcZq1gx6VSXr=ALSfZ=ksmg@mail.gmail.com>
In-Reply-To: <20121120141137.GX73505@kib.kiev.ua>
References:  <201211192243.qAJMhjFF055708@svn.freebsd.org> <20121120162708.I924@besplex.bde.org> <CAJ-FndBrxipxRDKZuPww7j1xdz_uh7Qym6GBycP2kMtBGptoFg@mail.gmail.com> <20121120230708.G6016@besplex.bde.org> <CAJ-FndDhUEHNFv5RTBUACPqOeLBAnVZ4EKZhzdjx-FK84nVLhw@mail.gmail.com> <CAJ-FndAOc4ctzSMHH7dUCX90eGwYrAZnJGXpghmv-Mm=CzkS3Q@mail.gmail.com> <20121120234822.J6178@besplex.bde.org> <CAJ-FndABr3oRv9DtPnBmbwn-UCXFrf7ZfxauWF0XGOwqiMgwxA@mail.gmail.com> <20121120141137.GX73505@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 11/20/12, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Tue, Nov 20, 2012 at 01:27:07PM +0000, Attilio Rao wrote:
>> On 11/20/12, Bruce Evans <brde@optusnet.com.au> wrote:
>> > On Tue, 20 Nov 2012, Attilio Rao wrote:
>> >
>> >> On 11/20/12, Attilio Rao <attilio@freebsd.org> wrote:
>> >>> On 11/20/12, Bruce Evans <brde@optusnet.com.au> wrote:
>> >>>> On Tue, 20 Nov 2012, Attilio Rao wrote:
>> >>>>
>> >>>>> On Tue, Nov 20, 2012 at 6:20 AM, Bruce Evans <brde@optusnet.com.au>
>> >>>>> wrote:
>> >>>>>> On Mon, 19 Nov 2012, Attilio Rao wrote:
>> >>>>>>
>> >>>>>>> Log:
>> >>>>>>>  r16312 is not any longer real since many years (likely since
>> >>>>>>> when
>> >>>>>>> VFS
>> >>>>>>>  received granular locking) but the comment present in UFS has
>> >>>>>>> been
>> >>>>>>>  copied all over other filesystems code incorrectly for several
>> >>>>>>> times.
>> >>>>>>>
>> >>>>>>>  Removes comments that makes no sense now.
>> >>>>>>
>> >>>>>>
>> >>>>>> It still made sense (except for bitrot in the function name), but
>> >>>>>> might
>> >>>>>> not
>> >>>>>> be true).  The code made sense with it.  Now the code makes no
>> >>>>>> sense.
>> >>>>>>
>> >>>>>>
>> >>>>>>> Modified: head/sys/ufs/ffs/ffs_vfsops.c
>> >>>>>>>
>> >>>>>>> ==============================================================================
>> >>>>>>> --- head/sys/ufs/ffs/ffs_vfsops.c       Mon Nov 19 21:58:14 2012
>> >>>>>>> (r243310)
>> >>>>>>> +++ head/sys/ufs/ffs/ffs_vfsops.c       Mon Nov 19 22:43:45 2012
>> >>>>>>> (r243311)
>> >>>>>>> @@ -1676,14 +1676,6 @@ ffs_vgetf(mp, ino, flags, vpp, ffs_flags
>> >>>>>>>         ump = VFSTOUFS(mp);
>> >>>>>>>         dev = ump->um_dev;
>> >>>>>>>         fs = ump->um_fs;
>> >>>>>>> -
>> >>>>>>> -       /*
>> >>>>>>> -        * If this malloc() is performed after the getnewvnode()
>> >>>>>>
>> >>>>>>
>> >>>>>> This malloc() didn't match the code, which uses uma_zalloc().  Old
>> >>>>>> versions used MALLOC() in both the comment and the code.  ffs's
>> >>>>>> comment
>> >>>>>> was updated to say malloc() when the code was changed to use
>> >>>>>> malloc(),
>> >>>>>> then rotted when the code was changed to use uma_zalloc().  In
>> >>>>>> some
>> >>>>>> other file systems, the comment still said MALLOC().
>> >>>>>>
>> >>>>>>
>> >>>>>>> -        * it might block, leaving a vnode with a NULL v_data to
>> >>>>>>> be
>> >>>>>>> -        * found by ffs_sync() if a sync happens to fire right
>> >>>>>>> then,
>> >>>>>>> -        * which will cause a panic because ffs_sync() blindly
>> >>>>>>> -        * dereferences vp->v_data (as well it should).
>> >>>>>>> -        */
>> >>>>>>>         ip = uma_zalloc(uma_inode, M_WAITOK | M_ZERO);
>> >>>>>>>
>> >>>>>>>         /* Allocate a new vnode/inode. */
>> >>>>>>>
>> >>>>>>
>> >>>>>> The code makes no sense now.  The comment explains why ip is
>> >>>>>> allocated
>> >>>>>> before vp, instead of in the natural, opposite order like it used
>> >>>>>> to
>> >>>>>> be.  Allocating things in an unnatural  order requires extra code
>> >>>>>> to
>> >>>>>> free ip when the allocation of vp fails.
>> >>>>>
>> >>>>> "Used to be" is very arguably. The code has been like its current
>> >>>>> form
>> >>>>> many more years than the opposite (16 against 3 I think).
>> >>>>> And the code makes perfectly sense if you know the history. So I
>> >>>>> don't
>> >>>>> agree with you.
>> >>>>
>> >>>> But it shouldn't be necessary to know the history of the code to
>> >>>> understand it.  The code only makes sense if its comment is not
>> >>>> removed,
>> >>>> or if you know the history of the code so that you can restore the
>> >>>> removed comment.  However, if the comment makes no sense as you
>> >>>> claim,
>> >>>> then the code that it it describes makes no sense.
>> >>>
>> >>> The "code that makes no sense" is basically the justification to have
>> >>> the allocation before the getnewvnode(). It makes no sense because
>> >>> the
>> >>> order makes no sense (you can allocate before or after getnewvnode(),
>> >>> you won't have v_data corruption as the comment claims).
>> >>>
>> >>> Hence the code makes no sense.
>> >>
>> >> Herm, s/code/comment.
>> >
>> > I think you are right that the comment makes no sense.  A preemptible
>> > kernel may be preempted without it calling malloc() where it may block.
>> > Thus ffs_fsync() and anything else that looks at the inode must be
>> > doing something to avoid dereferencing v_data if the vnode is not fully
>> > constructed.  This seems to be done by iterating over vnodes using
>> > MNT_VNODE_FOREACH_ACTIVE*() and not making incomplete vnodes active.
>> > ffs_fsync() still just blindly dereferences the inode.
>> >
>> > I think I am right that the code makes no sense.  It is ordered like
>> > it is because placing the allocation of ip before the allocation of
>> > vp used to be enough to prevent v_data being dereferenced.  This makes
>> > no sense when it isn't enough.
>>
>> In the past, before VFS got locking and kernel was single-threaded,
>> the comment and code arranged in this way were sensitive and
>> effective.
>> As now this is not true anymore, there is no strict relationship
>> between the getnewvnode() and sleeping points.
>> It is important to remove stale comments because they confuse people,
>> the porters and as you can see the code/comment has been cut&paste
>> quite a bit around.
>>
>
> Putting the issue of the comment making no sense at all aside, which is
> true but tangential to what I want to note.
>
> Although malloc(9) is not ordered with the vnode locks in the lock
> order, it is still good practice to not perform allocations under the
> vnode lock. The reason is that pagedaemon might need to clean and write
> pages, in particular, belonging to the vnodes. Generally, if you own
> vnode lock and do something which might kick pagedaemon, you are creating
> troubles for pagedaemon, which would attempt to lock the vnode with
> timeout, spending the precious time waiting for lock it cannot obtain.
>
> This is less an issue for the new vnode instantiation locations, because
> vnode cannot have resident pages yet. But it is good to follow the same
> pattern anywhere.

Yes, I completely agree. Infact the code is not changed also to avoid
pagedaemon hosing.
However this is a different situation respect a "if you allocate while
you hold the lock you will get a deadlock/race/corruption".

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndCtwOMFjZSXD4Q_4FNjZqpJcZq1gx6VSXr=ALSfZ=ksmg>