Date: Mon, 25 Jun 2018 02:04:32 +0000
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Alexander Motin <mav@FreeBSD.org>, Doug Rabson <dfr@rabson.org>
Subject: Re: nfsd kernel threads won't die via SIGKILL
Message-ID: <YTOPR0101MB095352561BFA6428ACBB19FDDD4A0@YTOPR0101MB0953.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <20180624093330.GX2430@kib.kiev.ua>
References: <YTXPR0101MB0959B4E960B85ACAC07B819DDD740@YTXPR0101MB0959.CANPRD01.PROD.OUTLOOK.COM>, <20180624093330.GX2430@kib.kiev.ua>
Konstantin Belousov wrote:
>On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
>> During testing of the pNFS server I have been frequently
>> killing/restarting the nfsd. Once in a while, the "slave" nfsd process
>> doesn't terminate and a "ps axHl" shows:
>>   0 48889 1 0  20 0 5884 812 svcexit D - 0:00.01 nfsd: server
>>   0 48889 1 0  40 0 5884 812 rpcsvc  I - 0:00.00 nfsd: server
>>   ... more of the same
>>   0 48889 1 0  40 0 5884 812 rpcsvc  I - 0:00.00 nfsd: server
>>   0 48889 1 0  -8 0 5884 812 rpcsvc  I - 1:51.78 nfsd: server
>>   0 48889 1 0  -8 0 5884 812 rpcsvc  I - 2:27.75 nfsd: server
>>
>> You can see that the top thread (the one that was created with the
>> process) is stuck in "D" on "svcexit".
>> The rest of the threads are still servicing NFS RPCs. If you still have
>> an NFS mount on the server, the mount continues to work and the CPU time
>> for the last two threads slowly climbs, due to NFS RPC activity. A
>> SIGKILL was posted for the process and these threads (created by
>> kthread_add) are here, but cv_wait_sig/cv_timedwait_sig never seems to
>> return EINTR for these other threads:
>>
>>         if (ismaster || (!ismaster &&
>>             grp->sg_threadcount > grp->sg_minthreads))
>>                 error = cv_timedwait_sig(&st->st_cond,
>>                     &grp->sg_lock, 5 * hz);
>>         else
>>                 error = cv_wait_sig(&st->st_cond,
>>                     &grp->sg_lock);
>>
>> The top thread (referred to in svc.c as "ismaster") did return from here
>> with EINTR and has now done an msleep() here, waiting for the other
>> threads to terminate:
>>
>>         /* Waiting for threads to stop. */
>>         for (g = 0; g < pool->sp_groupcount; g++) {
>>                 grp = &pool->sp_groups[g];
>>                 mtx_lock(&grp->sg_lock);
>>                 while (grp->sg_threadcount > 0)
>>                         msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>>                 mtx_unlock(&grp->sg_lock);
>>         }
>>
>> Although I can't be sure if this patch has fixed the problem, because it
>> happens intermittently, I have not seen the problem since applying it:
>>
>> --- rpc/svc.c.sav	2018-06-21 22:52:11.623955000 -0400
>> +++ rpc/svc.c	2018-06-22 09:01:40.271803000 -0400
>> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
>>  		grp = &pool->sp_groups[g];
>>  		mtx_lock(&grp->sg_lock);
>>  		while (grp->sg_threadcount > 0)
>> -			msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> +			msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
>>  		mtx_unlock(&grp->sg_lock);
>>  	}
>>  }
>>
>> As you can see, all it does is add a timeout to the msleep().
>> I am not familiar with the signal delivery code in sleepqueue, so it
>> probably isn't correct, but my theory is along the lines of...
>>
>> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR,
>> and if that happens before the other threads return EINTR from
>> cv_wait_sig(), they no longer do so?
>> And I thought that waking up from the msleep() via timeouts would maybe
>> allow the other threads to return EINTR from cv_wait_sig()?
>>
>> Does this make sense? rick
>> ps: I'll post if I see the problem again with the patch applied.
>> pps: This is a single core i386 system, just in case that might affect
>> this.
>
>No, the patch does not make sense. I think it was just coincidental that
>with the patch you did not get the hang.
>
>Signals are delivered to a thread, which should take the appropriate
>actions. For a kernel process like the rpc pool, the signals are never
>delivered; they are queued in a randomly selected thread's signal queue
>and sit there.
>The interruptible sleeps are aborted in the context of that thread,
>but nothing else happens. So if you need to make svc pools properly
>killable, all threads must check at least for EINTR and instruct other
>threads to exit as well.

I'm not sure I understand what the "randomly selected thread's signal
queue" means, but it seems strange that this usually works. (The code is
at least 10 years old, originally committed by dfr@. I've added him to the
cc list in case he understands this.)
Is it that, usually, the other threads all return EINTR before the master
one gets to the msleep() (which it only does while the thread count is
still > 0)?

>Your description at the start of the message of the behaviour after
>SIGKILL, where other threads continued to serve RPCs, exactly matches
>the above explanation. You need to add some global 'stop' flag, if it is
>not yet present, and recheck it after each RPC handled. Any thread which
>notes EINTR or does a direct check for the pending signal should set
>the flag and wake up every other thread in the pool.

Ok, I'll code up a patch with a global "stop" flag and test it for a
while. If it seems ok, I'll put it up in Phabricator and ask you to
review it. (A rough sketch of what I have in mind is below.)

Thanks, rick
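ps: Just so I'm sure I've understood the suggestion, the flavour of change
I'm thinking of is sketched below. This is untested and only a sketch
against the structures in sys/rpc/svc.h: "sp_exiting" and
"svcpool_signal_exit()" are names I've made up, and I'm assuming the idle
threads can be found via the group's sg_idlethreads list the way svc.c
keeps them now.

	/*
	 * Hypothetical helper, called by whichever thread sees EINTR
	 * first.  It sets a pool-wide exit flag and pokes every idle
	 * thread so that it re-checks the flag instead of going back
	 * to sleep.  Locking is hand-waved here; in svc_run_internal()
	 * the sleep happens with sg_lock already held, so the real
	 * patch would have to avoid recursing on that lock.
	 */
	static void
	svcpool_signal_exit(SVCPOOL *pool)
	{
		SVCGROUP *grp;
		SVCTHREAD *st;
		int g;

		pool->sp_exiting = 1;		/* hypothetical flag */
		for (g = 0; g < pool->sp_groupcount; g++) {
			grp = &pool->sp_groups[g];
			mtx_lock(&grp->sg_lock);
			/* Wake threads blocked in cv_wait_sig()/cv_timedwait_sig(). */
			LIST_FOREACH(st, &grp->sg_idlethreads, st_ilink)
				cv_signal(&st->st_cond);
			mtx_unlock(&grp->sg_lock);
		}
	}

and then, in the thread loop in svc_run_internal(), something like:

		error = cv_timedwait_sig(&st->st_cond, &grp->sg_lock,
		    5 * hz);
		if (error == EINTR || error == ERESTART) {
			/* Tell the rest of the pool to shut down too. */
			svcpool_signal_exit(pool);
			break;
		}
		if (pool->sp_exiting)	/* another thread saw the signal */
			break;

The real patch would also need to handle threads that are busy with an
RPC rather than idle (the "recheck it after each RPC handled" part of
your suggestion), but hopefully this is the idea you meant.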