Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 13 Jul 2011 22:11:15 +0900 (JST)
From:      Kohji Okuno <okuno.kohji@jp.panasonic.com>
To:        kostikbel@gmail.com
Cc:        attilio@freebsd.org, freebsd-current@freebsd.org, okuno.kohji@jp.panasonic.com, emaste@sandvine.com
Subject:   Re: Bug about devfs?
Message-ID:  <20110713.221115.864172011004492659.okuno.kohji@jp.panasonic.com>
In-Reply-To: <20110713.131514.2295692683054433320.okuno.kohji@jp.panasonic.com>
References:  <CAJ-FndDz31pzH2dzrqdWWAFpMm%2BOOTFCj44MBpBoCS81-S4CVg@mail.gmail.com> <20110712145753.GI43872@deviant.kiev.zoral.com.ua> <20110713.131514.2295692683054433320.okuno.kohji@jp.panasonic.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Hello,

> From: Kostik Belousov <kostikbel@gmail.com>
> Date: Tue, 12 Jul 2011 17:57:53 +0300
>> On Tue, Jul 12, 2011 at 03:02:44PM +0200, Attilio Rao wrote:
>>> 2011/7/12 Kostik Belousov <kostikbel@gmail.com>:
>>> > On Tue, Jul 12, 2011 at 07:10:28PM +0900, Kohji Okuno wrote:
>>> >> Hello,
>>> >>
>>> >> I think that devfs has a problem.
>>> >> I encountered the problem that open("/dev/AAA") returned ENOENT.=

>>> >> Of course, /dev/AAA exists.
>>> >>
>>> >> ENOENT was created by the point(***) in devfs_allocv().
>>> >> I think that the race condition had occurred between process A a=
nd
>>> >> vnlru kernel thread.
>>> >>
>>> >> Please check the following.
>>> >>
>>> >> If vnlru set VI_DOOMED to vp->v_iflag but vnlru didn't still exe=
cute
>>> >> VOP_RECLAIM(), process A cat get valid vp from de->de_vnode.
>>> >> But, vget() will return ENOENT, because vp->v_iflag has VI_DOOME=
D.
>>> >>
>>> >> When I set the break point to (***), I checked that de->de_vnode=
 and
>>> >> vp->v_data were NULL.
>>> >>
>>> >>
>>> >> process A: =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
vnlru:
>>> >>
>>> >> devfs_allocv() {
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 vgonel(vp) {
>>> >> =A0 =A0... =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ...
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 vp->v_iflag |=3D VI_DOOMED;
>>> >> =A0 mtx_lock(&devfs_de_interlock); =A0 =A0 =A0 =A0 ...
>>> >> =A0 vp =3D de->de_vnode;
>>> >> =A0 if (vp !=3D NULL) { =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
VI_UNLOCK(vp);
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 ____________=
_/ ...
>>> >> =A0 VI_LOCK(vp); ____________/ =A0 =A0 =A0 =A0 =A0 =A0 =A0if (VO=
P_RECLAIM(vp, td))
>>> >> =A0 mtx_unlock(&devfs_de_interlock); =A0 =A0 =A0 ...
>>> >> =A0 =A0... =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0\ =A0 =A0 =A0 =A0 devfs_reclaim(ap) {
>>> >> =A0 error =3D vget(vp,...); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0\
>>> >> =A0 =A0... =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 =A0\______ =A0 mtx_lock(&devfs_de_interlock);
>>> >> =A0 if (devfs_allocv_drop_refs(...)) { =A0 =A0 =A0 =A0de =3D vp-=
>v_data;
>>> >> =A0 =A0 ... =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (de !=3D NULL) {
>>> >> =A0 } =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 =A0 =A0 =A0 de->de_vnode =3D NULL;
>>> >> =A0 else if (error) { =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 vp->v_data =3D NULL;
>>> >> =A0 =A0 ... =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>>> >> =A0 =A0 rturn (error); (***) =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
mtx_unlock(&devfs_de_interlock);
>>> >> =A0 } =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 =A0 =A0 =A0 ...
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 }
>>> >>
>>> >>
>>> >>
>>> >> I think that devfs_allocv() should be fixed as below.
>>> >> How do you think?
>>> >>
>>> >> devfs_allocv(struct devfs_dirent *de, struct mount *mp, struct v=
node **vpp)
>>> >> {
>>> >> =A0 =A0 =A0 =A0 int error;
>>> >> =A0 =A0 =A0 struct vnode *vp;
>>> >> =A0 =A0 =A0 struct cdev *dev;
>>> >> =A0 =A0 =A0 struct devfs_mount *dmp;
>>> >>
>>> >> =A0 =A0 =A0 dmp =3D VFSTODEVFS(mp);
>>> >> +#if 1
>>> >> + retry:
>>> >> +#endif
>>> >> =A0 =A0 =A0 =A0 if (de->de_flags & DE_DOOMED) {
>>> >>
>>> >> =A0 =A0 =A0 =A0 =A0 =A0...
>>> >>
>>> >> =A0 =A0 =A0 =A0 mtx_lock(&devfs_de_interlock);
>>> >> =A0 =A0 =A0 =A0 vp =3D de->de_vnode;
>>> >> =A0 =A0 =A0 =A0 if (vp !=3D NULL) {
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 VI_LOCK(vp);
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 mtx_unlock(&devfs_de_interlock);=

>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 sx_xunlock(&dmp->dm_lock);
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 error =3D vget(vp, LK_EXCLUSIVE =
| LK_INTERLOCK, curthread);
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 sx_xlock(&dmp->dm_lock);
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (devfs_allocv_drop_refs(0, dm=
p, de)) {
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (error =3D=3D=
 0)
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
vput(vp);
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (ENOENT);=

>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 else if (error) {
>>> >> +#if 1
>>> >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (error =3D=3D ENOEN=
T)
>>> >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 goto r=
etry;
>>> >> +#endif
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 sx_xunlock(&dmp->dm_=
lock);
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return (error);
>>> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>>> >>
>>> > Thank you for the report.
>>> >
>>> > The proposed change would revert r179247, which also caused some =
issues.
>>> > Are you able to reproduce the problem ?
>>> >
>>> > Could you try the following patch ? I cannot reproduce your situa=
tion,
>>> > so the patch is untested by me.
>>> >
>>> > diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnop=
s.c
>>> > index bf6dab8..bbbfff4 100644
>>> > --- a/sys/fs/devfs/devfs_vnops.c
>>> > +++ b/sys/fs/devfs/devfs_vnops.c
>>> > @@ -397,6 +397,7 @@ devfs_allocv(struct devfs_dirent *de, struct =
mount *mp, int lockmode,
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0sx_xunlock(&dmp->dm_lock);
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return (ENOENT);
>>> > =A0 =A0 =A0 =A0}
>>> > +loop:
>>> > =A0 =A0 =A0 =A0DEVFS_DE_HOLD(de);
>>> > =A0 =A0 =A0 =A0DEVFS_DMP_HOLD(dmp);
>>> > =A0 =A0 =A0 =A0mtx_lock(&devfs_de_interlock);
>>> > @@ -412,7 +413,16 @@ devfs_allocv(struct devfs_dirent *de, struct=
 mount *mp, int lockmode,
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vp=
ut(vp);
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return (ENOENT);
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0}
>>> > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 else if (error) {
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 else if (error !=3D 0) {
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (error =3D=3D EN=
OENT) {
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 mtx=
_lock(&devfs_de_interlock);
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 whi=
le (de->de_vnode !=3D NULL) {
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 =A0 msleep(&de->de_vnode,
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 =A0 =A0 =A0 =A0 &devfs_de_interlock, 0, "dvall", 0);
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 mtx=
_unlock(&devfs_de_interlock);
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 got=
o loop;
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 }
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0sx_xunlock(&dmp->d=
m_lock);
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return (error);
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0}
>>> > @@ -1259,6 +1269,7 @@ devfs_reclaim(struct vop_reclaim_args *ap)
>>> > =A0 =A0 =A0 =A0if (de !=3D NULL) {
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0de->de_vnode =3D NULL;
>>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0vp->v_data =3D NULL;
>>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 wakeup(&de->de_vnode);
>>> > =A0 =A0 =A0 =A0}
>>> > =A0 =A0 =A0 =A0mtx_unlock(&devfs_de_interlock);
>>> =

>>> I think that this approach may starve the thread for a while.
>>> As I told you privately I was able to see on field a livelock becau=
se
>>> of this check. In my case, it was a thread running for 63seconds (a=
t
>>> least, at that point the watchdog was tripping over).
>> Feasible explanation was not found at the time, AFAIR. I could belie=
ve
>> that this is possible with r179247 and driver stuck in the close cde=
vsw
>> method.
>> =

>> More risky change would be to clear de_vnode early. All devfs code s=
hall
>> be already safe by checking for VI_DOOMED, and if VI_DOOMED, v_data =
may
>> be NULL.
>> =

>> Again, I am unable to test.
>> =

>> diff --git a/sys/fs/devfs/devfs_vnops.c b/sys/fs/devfs/devfs_vnops.c=

>> index bf6dab8..955bd8b 100644
>> --- a/sys/fs/devfs/devfs_vnops.c
>> +++ b/sys/fs/devfs/devfs_vnops.c
>> @@ -397,6 +397,7 @@ devfs_allocv(struct devfs_dirent *de, struct mou=
nt *mp, int lockmode,
>>  		sx_xunlock(&dmp->dm_lock);
>>  		return (ENOENT);
>>  	}
>> +loop:
>>  	DEVFS_DE_HOLD(de);
>>  	DEVFS_DMP_HOLD(dmp);
>>  	mtx_lock(&devfs_de_interlock);
>> @@ -405,16 +406,21 @@ devfs_allocv(struct devfs_dirent *de, struct m=
ount *mp, int lockmode,
>>  		VI_LOCK(vp);
>>  		mtx_unlock(&devfs_de_interlock);
>>  		sx_xunlock(&dmp->dm_lock);
>> -		error =3D vget(vp, lockmode | LK_INTERLOCK, curthread);
>> +		vget(vp, lockmode | LK_INTERLOCK | LK_RETRY, curthread);
>>  		sx_xlock(&dmp->dm_lock);
>>  		if (devfs_allocv_drop_refs(0, dmp, de)) {
>> -			if (error =3D=3D 0)
>> -				vput(vp);
>> +			vput(vp);
>>  			return (ENOENT);
>>  		}
>> -		else if (error) {
>> -			sx_xunlock(&dmp->dm_lock);
>> -			return (error);
>> +		else if ((vp->v_iflag & VI_DOOMED) !=3D 0) {
>> +			mtx_lock(&devfs_de_interlock);
>> +			if (de->de_vnode =3D=3D vp) {
>> +				de->de_vnode =3D NULL;
>> +				vp->v_data =3D NULL;
>> +			}
>> +			mtx_unlock(&devfs_de_interlock);
>> +			vput(vp);
>> +			goto loop;
>>  		}
>>  		sx_xunlock(&dmp->dm_lock);
>>  		*vpp =3D vp;
> =

> I tried Kostik's last patch, and I checked that open() successed twic=
e
> by retry. It is difficult for me to reproduce this situation.
> =

> At least, my problem is solved by this patch.

I checked ten times or more. And, I didn't encounter all problems.

Thanks,
 Kohji Okuno.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110713.221115.864172011004492659.okuno.kohji>