Date: Mon, 25 Jun 2018 02:04:32 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Alexander Motin <mav@FreeBSD.org>, Doug Rabson <dfr@rabson.org>
Subject: Re: nfsd kernel threads won't die via SIGKILL
Message-ID: <YTOPR0101MB095352561BFA6428ACBB19FDDD4A0@YTOPR0101MB0953.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <20180624093330.GX2430@kib.kiev.ua>
References: <YTXPR0101MB0959B4E960B85ACAC07B819DDD740@YTXPR0101MB0959.CANPRD01.PROD.OUTLOOK.COM> <20180624093330.GX2430@kib.kiev.ua>
Konstantin Belousov wrote:
>On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
>> During testing of the pNFS server I have been frequently killing/restarting the nfsd.
>> Once in a while, the "slave" nfsd process doesn't terminate and a "ps axHl" shows:
>>   0 48889     1   0  20  0  5884  812 svcexit  D  -  0:00.01 nfsd: server
>>   0 48889     1   0  40  0  5884  812 rpcsvc   I  -  0:00.00 nfsd: server
>> ... more of the same
>>   0 48889     1   0  40  0  5884  812 rpcsvc   I  -  0:00.00 nfsd: server
>>   0 48889     1   0  -8  0  5884  812 rpcsvc   I  -  1:51.78 nfsd: server
>>   0 48889     1   0  -8  0  5884  812 rpcsvc   I  -  2:27.75 nfsd: server
>>
>> You can see that the top thread (the one that was created with the process) is
>> stuck in "D" on "svcexit".
>> The rest of the threads are still servicing NFS RPCs. If you still have an NFS mount on
>> the server, the mount continues to work and the CPU time for the last two threads
>> slowly climbs, due to NFS RPC activity. A SIGKILL was posted for the process and
>> these threads (created by kthread_add) are here, but
>> cv_wait_sig()/cv_timedwait_sig() never seems to return EINTR for these other
>> threads:
>>
>>                         if (ismaster || (!ismaster &&
>> 1207                        grp->sg_threadcount > grp->sg_minthreads))
>> 1208                            error = cv_timedwait_sig(&st->st_cond,
>> 1209                                &grp->sg_lock, 5 * hz);
>> 1210                    else
>> 1211                            error = cv_wait_sig(&st->st_cond,
>> 1212                                &grp->sg_lock);
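For context, a minimal sketch of the pattern around these interruptible sleeps
(this is not the actual svc_run_internal() loop; the "stopping" flag is
invented here for illustration). Only a thread that actually has the signal
pending gets EINTR back from the sleep:

    /*
     * Sketch only: cv_timedwait_sig() returns EINTR or ERESTART when a
     * signal is pending on the sleeping thread, and EWOULDBLOCK when the
     * timeout expires.  "stopping" is a hypothetical local flag.
     */
    int error;
    bool stopping = false;

    mtx_lock(&grp->sg_lock);
    while (!stopping) {
            error = cv_timedwait_sig(&st->st_cond, &grp->sg_lock, 5 * hz);
            if (error == EINTR || error == ERESTART) {
                    /* A signal aborted this thread's sleep; start exiting. */
                    stopping = true;
            }
            /* On EWOULDBLOCK the 5 second timeout expired; loop and re-check. */
    }
    mtx_unlock(&grp->sg_lock);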
>>
>> The top thread (referred to in svc.c as "ismaster") did return from here with EINTR
>> and has now done an msleep() here, waiting for the other threads to terminate:
>>
>>              /* Waiting for threads to stop. */
>> 1387            for (g = 0; g < pool->sp_groupcount; g++) {
>> 1388                    grp = &pool->sp_groups[g];
>> 1389                    mtx_lock(&grp->sg_lock);
>> 1390                    while (grp->sg_threadcount > 0)
>> 1391                            msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> 1392                    mtx_unlock(&grp->sg_lock);
>> 1393            }
>>
>> Although I can't be sure if this patch has fixed the problem because it happens
>> intermittently, I have not seen the problem since applying this patch:
>> --- rpc/svc.c.sav	2018-06-21 22:52:11.623955000 -0400
>> +++ rpc/svc.c	2018-06-22 09:01:40.271803000 -0400
>> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
>>  		grp = &pool->sp_groups[g];
>>  		mtx_lock(&grp->sg_lock);
>>  		while (grp->sg_threadcount > 0)
>> -			msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> +			msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
>>  		mtx_unlock(&grp->sg_lock);
>>  	}
>>  }
>>
>> As you can see, all it does is add a timeout to the msleep().
>> I am not familiar with the signal delivery code in sleepqueue, so it probably
>> isn't correct, but my theory is along the lines of...
>>
>> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR,
>> and if that happens before the other threads return EINTR from cv_wait_sig(),
>> they no longer do so?
>> And I thought that waking up from the msleep() via timeouts would maybe allow
>> the other threads to return EINTR from cv_wait_sig()?
>>
>> Does this make sense? rick
>> ps: I'll post if I see the problem again with the patch applied.
>> pss: This is a single core i386 system, just in case that might affect this.
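For reference, the msleep(9) semantics in play here: only a sleep with PCATCH
in its priority argument is signal-interruptible (and marks the thread with
TDF_SINTR); a priority of 0 is not. A minimal illustration, not a proposed
fix:

    /* Uninterruptible: only a wakeup() on "grp" (or a timeout) ends it. */
    msleep(grp, &grp->sg_lock, 0, "svcexit", 0);

    /*
     * Interruptible: PCATCH makes a pending signal abort the sleep, and
     * msleep() then returns EINTR or ERESTART.
     */
    error = msleep(grp, &grp->sg_lock, PCATCH, "svcexit", 5 * hz);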
>
>No, the patch does not make sense. I think it was just coincidental that
>with the patch you did not get the hang.
>
>Signals are delivered to a thread, which should take the appropriate
>actions. For a kernel process like the rpc pool, the signals are never
>delivered; they are queued in a randomly selected thread's signal queue
>and sit there. The interruptible sleeps are aborted in the context
>of that thread, but nothing else happens. So if you need to make svc
>pools properly killable, all threads must check at least for EINTR and
>instruct other threads to exit as well.
I'm not sure I understand what the "randomly selected thread's signal queue" means,
but it seems strange that this usually works. (The code is at least 10 years old.
Originally committed by dfr@. I've added him to the cc list in case he understands
this?)
Is it that, usually, the threads will all return EINTR before the master one gets
to where the msleep() happens if the count is > 0?
>Your description at the start of the message of the behaviour after
>SIGKILL, where other threads continued to serve RPCs, exactly matches the
>above explanation. You need to add some global 'stop' flag, if it is not
>yet present, and recheck it after each RPC handled. Any thread which
>notes EINTR or does a direct check for the pending signal should set
>the flag and wake up every other thread in the pool.
Ok, I'll code up a patch with a global "stop" flag and test it for a while.
If it seems ok, I'll put it up in phabricator and ask you to review it.
Thanks, rick
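
A minimal sketch of what such a patch might look like (the sp_stopping field
and the svc_pool_stop() name are invented here for illustration, and the
sg_idlethreads/st_ilink linkage is assumed from svc.h; this is not the
committed fix):

    /*
     * Hypothetical sketch: assumes a new "bool sp_stopping" field in
     * SVCPOOL.  Whichever thread sees EINTR/ERESTART calls this; every
     * worker then rechecks pool->sp_stopping after each RPC and exits
     * if it is set.
     */
    static void
    svc_pool_stop(SVCPOOL *pool)
    {
            SVCGROUP *grp;
            SVCTHREAD *st;
            int g;

            pool->sp_stopping = true;
            for (g = 0; g < pool->sp_groupcount; g++) {
                    grp = &pool->sp_groups[g];
                    mtx_lock(&grp->sg_lock);
                    /* Abort the idle threads' interruptible sleeps. */
                    LIST_FOREACH(st, &grp->sg_idlethreads, st_ilink)
                            cv_signal(&st->st_cond);
                    mtx_unlock(&grp->sg_lock);
            }
    }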
