Date: Tue, 20 Nov 2012 14:13:57 +0000 From: Attilio Rao <attilio@freebsd.org> To: Konstantin Belousov <kostikbel@gmail.com> Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au> Subject: Re: svn commit: r243311 - in head/sys: fs/ext2fs fs/msdosfs fs/nfsclient fs/nullfs fs/unionfs gnu/fs/reiserfs nfsclient ufs/ffs Message-ID: <CAJ-FndCtwOMFjZSXD4Q_4FNjZqpJcZq1gx6VSXr=ALSfZ=ksmg@mail.gmail.com> In-Reply-To: <20121120141137.GX73505@kib.kiev.ua> References: <201211192243.qAJMhjFF055708@svn.freebsd.org> <20121120162708.I924@besplex.bde.org> <CAJ-FndBrxipxRDKZuPww7j1xdz_uh7Qym6GBycP2kMtBGptoFg@mail.gmail.com> <20121120230708.G6016@besplex.bde.org> <CAJ-FndDhUEHNFv5RTBUACPqOeLBAnVZ4EKZhzdjx-FK84nVLhw@mail.gmail.com> <CAJ-FndAOc4ctzSMHH7dUCX90eGwYrAZnJGXpghmv-Mm=CzkS3Q@mail.gmail.com> <20121120234822.J6178@besplex.bde.org> <CAJ-FndABr3oRv9DtPnBmbwn-UCXFrf7ZfxauWF0XGOwqiMgwxA@mail.gmail.com> <20121120141137.GX73505@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On 11/20/12, Konstantin Belousov <kostikbel@gmail.com> wrote: > On Tue, Nov 20, 2012 at 01:27:07PM +0000, Attilio Rao wrote: >> On 11/20/12, Bruce Evans <brde@optusnet.com.au> wrote: >> > On Tue, 20 Nov 2012, Attilio Rao wrote: >> > >> >> On 11/20/12, Attilio Rao <attilio@freebsd.org> wrote: >> >>> On 11/20/12, Bruce Evans <brde@optusnet.com.au> wrote: >> >>>> On Tue, 20 Nov 2012, Attilio Rao wrote: >> >>>> >> >>>>> On Tue, Nov 20, 2012 at 6:20 AM, Bruce Evans <brde@optusnet.com.au> >> >>>>> wrote: >> >>>>>> On Mon, 19 Nov 2012, Attilio Rao wrote: >> >>>>>> >> >>>>>>> Log: >> >>>>>>> r16312 is not any longer real since many years (likely since >> >>>>>>> when >> >>>>>>> VFS >> >>>>>>> received granular locking) but the comment present in UFS has >> >>>>>>> been >> >>>>>>> copied all over other filesystems code incorrectly for several >> >>>>>>> times. >> >>>>>>> >> >>>>>>> Removes comments that makes no sense now. >> >>>>>> >> >>>>>> >> >>>>>> It still made sense (except for bitrot in the function name), but >> >>>>>> might >> >>>>>> not >> >>>>>> be true). The code made sense with it. Now the code makes no >> >>>>>> sense. >> >>>>>> >> >>>>>> >> >>>>>>> Modified: head/sys/ufs/ffs/ffs_vfsops.c >> >>>>>>> >> >>>>>>> ============================================================================== >> >>>>>>> --- head/sys/ufs/ffs/ffs_vfsops.c Mon Nov 19 21:58:14 2012 >> >>>>>>> (r243310) >> >>>>>>> +++ head/sys/ufs/ffs/ffs_vfsops.c Mon Nov 19 22:43:45 2012 >> >>>>>>> (r243311) >> >>>>>>> @@ -1676,14 +1676,6 @@ ffs_vgetf(mp, ino, flags, vpp, ffs_flags >> >>>>>>> ump = VFSTOUFS(mp); >> >>>>>>> dev = ump->um_dev; >> >>>>>>> fs = ump->um_fs; >> >>>>>>> - >> >>>>>>> - /* >> >>>>>>> - * If this malloc() is performed after the getnewvnode() >> >>>>>> >> >>>>>> >> >>>>>> This malloc() didn't match the code, which uses uma_zalloc(). Old >> >>>>>> versions used MALLOC() in both the comment and the code. ffs's >> >>>>>> comment >> >>>>>> was updated to say malloc() when the code was changed to use >> >>>>>> malloc(), >> >>>>>> then rotted when the code was changed to use uma_zalloc(). In >> >>>>>> some >> >>>>>> other file systems, the comment still said MALLOC(). >> >>>>>> >> >>>>>> >> >>>>>>> - * it might block, leaving a vnode with a NULL v_data to >> >>>>>>> be >> >>>>>>> - * found by ffs_sync() if a sync happens to fire right >> >>>>>>> then, >> >>>>>>> - * which will cause a panic because ffs_sync() blindly >> >>>>>>> - * dereferences vp->v_data (as well it should). >> >>>>>>> - */ >> >>>>>>> ip = uma_zalloc(uma_inode, M_WAITOK | M_ZERO); >> >>>>>>> >> >>>>>>> /* Allocate a new vnode/inode. */ >> >>>>>>> >> >>>>>> >> >>>>>> The code makes no sense now. The comment explains why ip is >> >>>>>> allocated >> >>>>>> before vp, instead of in the natural, opposite order like it used >> >>>>>> to >> >>>>>> be. Allocating things in an unnatural order requires extra code >> >>>>>> to >> >>>>>> free ip when the allocation of vp fails. >> >>>>> >> >>>>> "Used to be" is very arguably. The code has been like its current >> >>>>> form >> >>>>> many more years than the opposite (16 against 3 I think). >> >>>>> And the code makes perfectly sense if you know the history. So I >> >>>>> don't >> >>>>> agree with you. >> >>>> >> >>>> But it shouldn't be necessary to know the history of the code to >> >>>> understand it. The code only makes sense if its comment is not >> >>>> removed, >> >>>> or if you know the history of the code so that you can restore the >> >>>> removed comment. However, if the comment makes no sense as you >> >>>> claim, >> >>>> then the code that it it describes makes no sense. >> >>> >> >>> The "code that makes no sense" is basically the justification to have >> >>> the allocation before the getnewvnode(). It makes no sense because >> >>> the >> >>> order makes no sense (you can allocate before or after getnewvnode(), >> >>> you won't have v_data corruption as the comment claims). >> >>> >> >>> Hence the code makes no sense. >> >> >> >> Herm, s/code/comment. >> > >> > I think you are right that the comment makes no sense. A preemptible >> > kernel may be preempted without it calling malloc() where it may block. >> > Thus ffs_fsync() and anything else that looks at the inode must be >> > doing something to avoid dereferencing v_data if the vnode is not fully >> > constructed. This seems to be done by iterating over vnodes using >> > MNT_VNODE_FOREACH_ACTIVE*() and not making incomplete vnodes active. >> > ffs_fsync() still just blindly dereferences the inode. >> > >> > I think I am right that the code makes no sense. It is ordered like >> > it is because placing the allocation of ip before the allocation of >> > vp used to be enough to prevent v_data being dereferenced. This makes >> > no sense when it isn't enough. >> >> In the past, before VFS got locking and kernel was single-threaded, >> the comment and code arranged in this way were sensitive and >> effective. >> As now this is not true anymore, there is no strict relationship >> between the getnewvnode() and sleeping points. >> It is important to remove stale comments because they confuse people, >> the porters and as you can see the code/comment has been cut&paste >> quite a bit around. >> > > Putting the issue of the comment making no sense at all aside, which is > true but tangential to what I want to note. > > Although malloc(9) is not ordered with the vnode locks in the lock > order, it is still good practice to not perform allocations under the > vnode lock. The reason is that pagedaemon might need to clean and write > pages, in particular, belonging to the vnodes. Generally, if you own > vnode lock and do something which might kick pagedaemon, you are creating > troubles for pagedaemon, which would attempt to lock the vnode with > timeout, spending the precious time waiting for lock it cannot obtain. > > This is less an issue for the new vnode instantiation locations, because > vnode cannot have resident pages yet. But it is good to follow the same > pattern anywhere. Yes, I completely agree. Infact the code is not changed also to avoid pagedaemon hosing. However this is a different situation respect a "if you allocate while you hold the lock you will get a deadlock/race/corruption". Attilio -- Peace can only be achieved by understanding - A. Einstein
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndCtwOMFjZSXD4Q_4FNjZqpJcZq1gx6VSXr=ALSfZ=ksmg>