From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov
CC: "freebsd-current@freebsd.org", Alexander Motin, Doug Rabson
Subject: Re: nfsd kernel threads won't die via SIGKILL
Date: Mon, 25 Jun 2018 02:04:32 +0000
Konstantin Belousov wrote:
>On Sat, Jun 23, 2018 at 09:03:02PM +0000, Rick Macklem wrote:
>> During testing of the pNFS server I have been frequently killing/restarting the nfsd.
>> Once in a while, the "slave" nfsd process doesn't terminate and a "ps axHl" shows:
>>   0 48889     1   0  20  0  5884   812 svcexit  D  -   0:00.01 nfsd: server
>>   0 48889     1   0  40  0  5884   812 rpcsvc   I  -   0:00.00 nfsd: server
>> ... more of the same
>>   0 48889     1   0  40  0  5884   812 rpcsvc   I  -   0:00.00 nfsd: server
>>   0 48889     1   0  -8  0  5884   812 rpcsvc   I  -   1:51.78 nfsd: server
>>   0 48889     1   0  -8  0  5884   812 rpcsvc   I  -   2:27.75 nfsd: server
>>
>> You can see that the top thread (the one that was created with the process) is
>> stuck in "D" on "svcexit".
>> The rest of the threads are still servicing NFS RPCs. If you still have an NFS mount on
>> the server, the mount continues to work and the CPU time for the last two threads
>> slowly climbs, due to NFS RPC activity. A SIGKILL was posted for the process and
>> these threads (created by kthread_add) are here, but
>> cv_wait_sig()/cv_timedwait_sig() never seems to return EINTR for these other
>> threads.
>>
>>              if (ismaster || (!ismaster &&
>> 1207             grp->sg_threadcount > grp->sg_minthreads))
>> 1208                 error = cv_timedwait_sig(&st->st_cond,
>> 1209                     &grp->sg_lock, 5 * hz);
>> 1210         else
>> 1211                 error = cv_wait_sig(&st->st_cond,
>> 1212                     &grp->sg_lock);
>>
>> The top thread (referred to in svc.c as "ismaster") did return from here with EINTR
>> and has now done an msleep() here, waiting for the other threads to terminate.
>>
>>      /* Waiting for threads to stop. */
>> 1387 for (g = 0; g < pool->sp_groupcount; g++) {
>> 1388         grp = &pool->sp_groups[g];
>> 1389         mtx_lock(&grp->sg_lock);
>> 1390         while (grp->sg_threadcount > 0)
>> 1391                 msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> 1392         mtx_unlock(&grp->sg_lock);
>> 1393 }
>>
>> Although I can't be sure if this patch has fixed the problem because it happens
>> intermittently, I have not seen the problem since applying this patch:
>> --- rpc/svc.c.sav	2018-06-21 22:52:11.623955000 -0400
>> +++ rpc/svc.c	2018-06-22 09:01:40.271803000 -0400
>> @@ -1388,7 +1388,7 @@ svc_run(SVCPOOL *pool)
>>  		grp = &pool->sp_groups[g];
>>  		mtx_lock(&grp->sg_lock);
>>  		while (grp->sg_threadcount > 0)
>> -			msleep(grp, &grp->sg_lock, 0, "svcexit", 0);
>> +			msleep(grp, &grp->sg_lock, 0, "svcexit", 1);
>>  		mtx_unlock(&grp->sg_lock);
>>  	}
>>  }
>>
>> As you can see, all it does is add a timeout to the msleep().
>> I am not familiar with the signal delivery code in sleepqueue, so it probably
>> isn't correct, but my theory is along the lines of...
>>
>> Since the msleep() doesn't have PCATCH, it does not set TDF_SINTR,
>> and if that happens before the other threads return EINTR from cv_wait_sig(),
>> they no longer do so?
>> And I thought that waking up from the msleep() via timeouts would maybe allow
>> the other threads to return EINTR from cv_wait_sig()?
>>
>> Does this make sense? rick
>> ps: I'll post if I see the problem again with the patch applied.
>> pss: This is a single core i386 system, just in case that might affect this.
>
>No, the patch does not make sense. I think it was just coincidental that
>with the patch you did not get the hang.
>
>Signals are delivered to a thread, which should take the appropriate
>actions. For the kernel process like rpc pool, the signals are never
>delivered, they are queued in the randomly selected thread's signal queue
>and sit there. The interruptible sleeps are aborted in the context
>of that thread, but nothing else happens. So if you need to make svc
>pools properly killable, all threads must check at least for EINTR and
>instruct other threads to exit as well.
I'm not sure I understand what the "randomly selected thread's signal queue" means,
but it seems strange that this usually works. (The code is at least 10 years old,
originally committed by dfr@. I've added him to the cc list in case he understands this.)
Is it that, usually, the other threads will all return EINTR before the master one
gets to the msleep() (which only happens if the count is > 0)?

>Your description at the start of the message of the behaviour after
>SIGKILL, where other threads continued to serve RPCs, exactly matches
>above explanation. You need to add some global 'stop' flag, if it is not
>yet present, and recheck it after each RPC handled. Any thread which
>notes EINTR or does a direct check for the pending signal, should set
>the flag and wake up every other thread in the pool.
Ok, I'll code up a patch with a global "stop" flag and test it for a while.
If it seems ok, I'll put it up in phabricator and ask you to review it.

Thanks, rick
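ps: Just to make sure I have the shape of what you are suggesting right, here is a
quick userland pthreads analogue of the idea (all names below are made up for the
sketch, not the actual fields/functions in sys/rpc/svc.c): every worker re-checks a
shared "stop" flag after each RPC, the first worker that sees the EINTR sets the
flag and wakes everyone, and the master just waits for the thread count to reach
zero. The real patch would presumably live in svc_run_internal()/svc_run() and use
sg_lock and the condvars, but the control flow would be the same:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NWORKERS 4

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_cv = PTHREAD_COND_INITIALIZER;
static bool pool_stop = false;          /* the global "stop" flag */
static int  pool_threadcount = 0;

/*
 * Stand-in for handling one RPC.  Returns true when this thread is the one
 * that "notices the signal" (the analogue of cv_wait_sig() returning EINTR).
 */
static bool
handle_one_rpc(long id)
{
	usleep(100 * 1000);
	return (id == 0);	/* pretend only thread 0 sees the EINTR */
}

static void *
svc_worker(void *arg)
{
	long id = (long)arg;

	for (;;) {
		/* Re-check the stop flag after every RPC. */
		pthread_mutex_lock(&pool_lock);
		if (pool_stop) {
			pool_threadcount--;
			/* Wake the master waiting in its exit loop. */
			pthread_cond_broadcast(&pool_cv);
			pthread_mutex_unlock(&pool_lock);
			return (NULL);
		}
		pthread_mutex_unlock(&pool_lock);

		if (handle_one_rpc(id)) {
			/*
			 * This thread saw the "EINTR": set the global flag and
			 * wake up every other thread in the pool (the analogue
			 * of a cv_broadcast() on each group's condvar).
			 */
			pthread_mutex_lock(&pool_lock);
			pool_stop = true;
			pthread_cond_broadcast(&pool_cv);
			pthread_mutex_unlock(&pool_lock);
		}
	}
}

int
main(void)
{
	pthread_t workers[NWORKERS];
	long i;

	pool_threadcount = NWORKERS;
	for (i = 0; i < NWORKERS; i++)
		pthread_create(&workers[i], NULL, svc_worker, (void *)i);

	/* The master's exit loop: the analogue of the "svcexit" msleep(). */
	pthread_mutex_lock(&pool_lock);
	while (pool_threadcount > 0)
		pthread_cond_wait(&pool_cv, &pool_lock);
	pthread_mutex_unlock(&pool_lock);

	for (i = 0; i < NWORKERS; i++)
		pthread_join(workers[i], NULL);
	printf("all %d workers stopped\n", NWORKERS);
	return (0);
}

(Builds with "cc -o poolstop poolstop.c -lpthread"; obviously just a toy to check
the logic, not the actual fix.)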