Message-ID: <59319917.1050301@omnilan.de>
Date: Fri, 02 Jun 2017 18:57:59 +0200
From: Harry Schmalzbauer
Organization: OmniLAN
To: "Kenneth D. Merry"
CC: Stephen Mcconnell, freebsd-scsi@FreeBSD.ORG, Scott Long
Subject: Re: mps(4) blocks panic-reboot
References: <592FDE8C.1090609@omnilan.de> <59303484.1040609@omnilan.de> <59306503.4010007@omnilan.de> <59315A74.9050506@omnilan.de> <20170602153705.GA56018@mithlond.kdm.org> <593198C3.2080902@omnilan.de>
In-Reply-To: <593198C3.2080902@omnilan.de>

Regarding Harry Schmalzbauer's message of 02.06.2017 18:56 (localtime):
> Regarding Kenneth D. Merry's message of 02.06.2017 17:37 (localtime):
>> On Fri, Jun 02, 2017 at 14:30:44 +0200, Harry Schmalzbauer wrote:
> …
>>> KDB: stack backtrace:
>>> #0 0xffffffff805df4f7 at kdb_backtrace+0x67
>>> #1 0xffffffff8059df96 at vpanic+0x186
>>> #2 0xffffffff8059de03 at panic+0x43
>>> #3 0xffffffff808a1892 at trap_fatal+0x322
>>> #4 0xffffffff808a18e9 at trap_pfault+0x49
>>> #5 0xffffffff808a1126 at trap+0x286
>>> #6 0xffffffff80887401 at calltrap+0x8
>>> #7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72
>>> #8 0xffffffff8029a7dc at xpt_polled_action+0x31c
>>> #9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b
>>> #10 0xffffffff8059db9a at kern_reboot+0x49a
>>> #11 0xffffffff8059d6f8 at sys_reboot+0x458
>>> #12 0xffffffff808a23f4 at amd64_syscall+0x6c4
>>> #13 0xffffffff808876eb at Xfast_syscall+0xfb
>>>
>>> (kgdb) list *0xffffffff805f43ec
>>> 0xffffffff805f43ec is in turnstile_broadcast
>>> (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837).
>>> 832
>>> 833		/*
>>> 834		 * Transfer the blocked list to the pending list.
>>> 835		 */
>>> 836		mtx_lock_spin(&td_contested_lock);
>>> 837		TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq);
>>> 838		mtx_unlock_spin(&td_contested_lock);
>>> 839
>>> 840		/*
>>> 841		 * Give a turnstile to each thread.  The last thread gets
>>>
>>> I haven't looked at the code at all and only very briefly looked at the
>>> diff, just out of curiosity, like pigs staring at clockworks ;-)
>>>
>>> But at least I hope this report does help.
>>
>> Thanks for testing it!
>>
>> My guess is that the problem is that xpt_polled_action() releases the
>> device mutex, but mpssas_SSU_to_SATA_devices() isn't acquiring the mutex.
>>
>> You could try putting the following around the call to xpt_polled_action():
>>
>> 	mtx_lock(xpt_path_mtx(ccb->ccb_h.path));
>> 	xpt_polled_action(ccb);
>> 	mtx_unlock(xpt_path_mtx(ccb->ccb_h.path));
>>
>> See if that fixes things.
>>
>> One other thing to put in there -- after the
>> if (target->stop_at_shutdown) { } statement, but still inside the for
>> loop, add these two lines:
>>
>> 	xpt_free_path(ccb->ccb_h.path);
>> 	xpt_free_ccb(ccb);
>
> Hope I didn't mess anything up while editing the text; please check
> whether the attached hunk corresponds to the (additional) changes to
> Stephen's diff.

Sorry, now really with the attachment...

[Attachment: mps_sas_lsi.c.kdmdiffpart]

--- mps_sas_lsi.c.orig	2017-06-01 19:39:48.535697000 +0200
+++ mps_sas_lsi.c	2017-06-02 18:10:15.659582000 +0200
@@ -1175,26 +1172,12 @@
 			    /*immediate*/FALSE,
 			    MPS_SENSE_LEN,
 			    /*timeout*/10000);
-			xpt_action(ccb);
-		}
-	}
-
-	/*
-	 * Wait until all of the SSU commands have completed or time has
-	 * expired (60 seconds). Pause for 100ms each time through. If any
-	 * command times out, the target will be reset in the SCSI command
-	 * timeout routine.
-	 */
-	getmicrotime(&start_time);
-	while (sc->SSU_refcount) {
-		pause("mpswait", hz/10);
-
-		getmicrotime(&cur_time);
-		if ((cur_time.tv_sec - start_time.tv_sec) > 60) {
-			mps_dprint(sc, MPS_FAULT, "Time has expired waiting "
-			    "for SSU commands to complete.\n");
-			break;
+			mtx_lock(xpt_path_mtx(ccb->ccb_h.path));
+			xpt_polled_action(ccb);
+			mtx_unlock(xpt_path_mtx(ccb->ccb_h.path));
 		}
+		xpt_free_path(ccb->ccb_h.path);
+		xpt_free_ccb(ccb);
 	}
 }