From: Harry Schmalzbauer
Organization: OmniLAN
Date: Fri, 02 Jun 2017 18:56:35 +0200
To: "Kenneth D. Merry"
CC: Stephen Mcconnell, freebsd-scsi@FreeBSD.ORG, Scott Long
Subject: Re: mps(4) blocks panic-reboot
Message-ID: <593198C3.2080902@omnilan.de>
In-Reply-To: <20170602153705.GA56018@mithlond.kdm.org>

Regarding Kenneth D. Merry's message of 02.06.2017 17:37 (localtime):
> On Fri, Jun 02, 2017 at 14:30:44 +0200, Harry Schmalzbauer wrote:
…
>> KDB: stack backtrace:
>> #0 0xffffffff805df4f7 at kdb_backtrace+0x67
>> #1 0xffffffff8059df96 at vpanic+0x186
>> #2 0xffffffff8059de03 at panic+0x43
>> #3 0xffffffff808a1892 at trap_fatal+0x322
>> #4 0xffffffff808a18e9 at trap_pfault+0x49
>> #5 0xffffffff808a1126 at trap+0x286
>> #6 0xffffffff80887401 at calltrap+0x8
>> #7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72
>> #8 0xffffffff8029a7dc at xpt_polled_action+0x31c
>> #9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b
>> #10 0xffffffff8059db9a at kern_reboot+0x49a
>> #11 0xffffffff8059d6f8 at sys_reboot+0x458
>> #12 0xffffffff808a23f4 at amd64_syscall+0x6c4
>> #13 0xffffffff808876eb at Xfast_syscall+0xfb
>>
>> (kgdb) list *0xffffffff805f43ec
>> 0xffffffff805f43ec is in turnstile_broadcast
>> (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837).
>> 832
>> 833 /*
>> 834 * Transfer the blocked list to the pending list.
>> 835 */
>> 836 mtx_lock_spin(&td_contested_lock);
>> 837 TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue],
>> td_lockq);
>> 838 mtx_unlock_spin(&td_contested_lock);
>> 839
>> 840 /*
>> 841 * Give a turnstile to each thread. The last thread gets
>>
>> I haven't looked at the code at all and only very briefly looked at the
>> diff, just out of curiosity, like pigs staring at clockworks ;-)
>>
>> But at least I hope this report does help.
>
> Thanks for testing it!
>
> My guess is that the problem is that xpt_polled_action()
> releases the device mutex, but mpssas_SSU_to_SATA_devices() isn't acquiring
> the mutex.
>
> You could try putting the following around the call to xpt_polled_action():
>
> mtx_lock(xpt_path_mtx(ccb->ccb_h.path));
> xpt_polled_action(ccb);
> mtx_unlock(xpt_path_mtx(ccb->ccb_h.path));
>
> See if that fixes things. One other thing to put in there -- after the
> if (target->stop_at_shutdown) { } statement, but still inside the for
> loop, add these two lines:
>
> xpt_free_path(ccb->ccb_h.path);
> xpt_free_ccb(ccb);

I hope I didn't mess up the text editing; please check whether the attached
hunk corresponds to the (additional) changes to Stephen's diff.

This leads to a series of panics?!?
(This happened very quickly after the dump of the first panic was written.)

ums1: detached
mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12
mps0: Completing stop unit for (xpt0:mps0:0:2:ffffffff):

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x478
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff80416cca
stack pointer = 0x28:0xfffffe03bc9c37f0
frame pointer = 0x28:0xfffffe03bc9c3880
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1 (init)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff805df5c7 at kdb_backtrace+0x67
#1 0xffffffff8059e066 at vpanic+0x186
#2 0xffffffff8059ded3 at panic+0x43
#3 0xffffffff808a1962 at trap_fatal+0x322
#4 0xffffffff808a19b9 at trap_pfault+0x49
#5 0xffffffff808a11f6 at trap+0x286
#6 0xffffffff808874d1 at calltrap+0x8
#7 0xffffffff8059dc6a at kern_reboot+0x49a
#8 0xffffffff8059d7c8 at sys_reboot+0x458
#9 0xffffffff808a24c4 at amd64_syscall+0x6c4
#10 0xffffffff808877bb at Xfast_syscall+0xfb
Uptime: 1m15s
(da0:mps0:0:2:0): Synchronize cache failed
Dumping 1277 out of 15734 …

#0 doadump (textdump=) at pcpu.h:222
222 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) list *0xffffffff80416cca
0xffffffff80416cca is in mpssas_ir_shutdown (atomic.h:188).
183 atomic.h: No such file or directory.
in atomic.h

Should I reduce compiler optimization?

Thanks,

-harry