Date:      Fri, 02 Jun 2017 18:57:59 +0200
From:      Harry Schmalzbauer <freebsd@omnilan.de>
To:        "Kenneth D. Merry" <ken@FreeBSD.ORG>
Cc:        Stephen Mcconnell <stephen.mcconnell@broadcom.com>, freebsd-scsi@FreeBSD.ORG, Scott Long <scottl@FreeBSD.ORG>
Subject:   Re: mps(4) blocks panic-reboot
Message-ID:  <59319917.1050301@omnilan.de>
In-Reply-To: <593198C3.2080902@omnilan.de>
References:  <592FDE8C.1090609@omnilan.de> <ff9342e2e1eb541f347d9f683cfc8214@mail.gmail.com> <59303484.1040609@omnilan.de> <e6fe7cc17fb1302caf2122eaa11d10ba@mail.gmail.com> <59306503.4010007@omnilan.de> <59315A74.9050506@omnilan.de> <20170602153705.GA56018@mithlond.kdm.org> <593198C3.2080902@omnilan.de>

This is a multi-part message in MIME format.
--------------080306030606030400020901
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit

 Regarding Harry Schmalzbauer's message of 02.06.2017 18:56 (localtime):
> Regarding Kenneth D. Merry's message of 02.06.2017 17:37 (localtime):
>> On Fri, Jun 02, 2017 at 14:30:44 +0200, Harry Schmalzbauer wrote:
> …
>>> KDB: stack backtrace:
>>> #0 0xffffffff805df4f7 at kdb_backtrace+0x67
>>> #1 0xffffffff8059df96 at vpanic+0x186
>>> #2 0xffffffff8059de03 at panic+0x43
>>> #3 0xffffffff808a1892 at trap_fatal+0x322
>>> #4 0xffffffff808a18e9 at trap_pfault+0x49
>>> #5 0xffffffff808a1126 at trap+0x286
>>> #6 0xffffffff80887401 at calltrap+0x8
>>> #7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72
>>> #8 0xffffffff8029a7dc at xpt_polled_action+0x31c
>>> #9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b
>>> #10 0xffffffff8059db9a at kern_reboot+0x49a
>>> #11 0xffffffff8059d6f8 at sys_reboot+0x458
>>> #12 0xffffffff808a23f4 at amd64_syscall+0x6c4
>>> #13 0xffffffff808876eb at Xfast_syscall+0xfb
>>>
>>> (kgdb) list *0xffffffff805f43ec
>>> 0xffffffff805f43ec is in turnstile_broadcast (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837).
>>> 832
>>> 833             /*
>>> 834              * Transfer the blocked list to the pending list.
>>> 835              */
>>> 836             mtx_lock_spin(&td_contested_lock);
>>> 837             TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq);
>>> 838             mtx_unlock_spin(&td_contested_lock);
>>> 839
>>> 840             /*
>>> 841              * Give a turnstile to each thread.  The last thread gets
>>>
>>> I haven't looked at the code at all and only very briefly looked at the
>>> diff, just out of curiosity, like pigs staring at clockworks ;-)
>>>
>>> But at least I hope this report does help.
>> Thanks for testing it!
>>
>> My guess is that the problem is that xpt_polled_action() releases the
>> device mutex, but mpssas_SSU_to_SATA_devices() isn't acquiring the mutex
>> before calling it.
>>
>> You could try putting the following around the call to xpt_polled_action():
>>
>> 	mtx_lock(xpt_path_mtx(ccb->ccb_h.path));
>> 	xpt_polled_action(ccb);
>> 	mtx_unlock(xpt_path_mtx(ccb->ccb_h.path));
>>
>> See if that fixes things.  One other thing to put in there -- after the
>> if (target->stop_at_shutdown) { } statement, but still inside the for
>> loop, add these two lines:
>>
>> 	xpt_free_path(ccb->ccb_h.path);
>> 	xpt_free_ccb(ccb);
>
> I hope I didn't mess up the text editing; please check whether the
> attached hunk corresponds to the (additional) changes to Stephen's diff.

Sorry, now really with attachment...
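
The locking rule discussed in the quoted text (a routine that drops and retakes a lock expects its caller to already hold it) can be sketched in userland C. This is only an analogy, not kernel code: struct fake_device, fake_device_init(), fake_polled_action() and issue_with_lock() are hypothetical names, and a POSIX error-checking mutex stands in for the CAM path mutex. The error-checking type makes the "unlock without owning" mistake visible as EPERM; the kernel mutex code makes no such promise.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a device protected by a per-path lock. */
struct fake_device {
	pthread_mutex_t lock;
	int commands_done;
};

/* Create the lock as error-checking so that misuse returns an error code. */
static int
fake_device_init(struct fake_device *dev)
{
	pthread_mutexattr_t attr;
	int error;

	dev->commands_done = 0;
	error = pthread_mutexattr_init(&attr);
	if (error != 0)
		return (error);
	error = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (error == 0)
		error = pthread_mutex_init(&dev->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return (error);
}

/*
 * Models a routine that assumes the caller holds dev->lock: it releases
 * the lock while "polling" and retakes it before returning.
 * Returns 0 on success or the errno value of the failing lock call.
 */
static int
fake_polled_action(struct fake_device *dev)
{
	int error;

	/* Fails with EPERM if the calling thread does not own the lock. */
	error = pthread_mutex_unlock(&dev->lock);
	if (error != 0)
		return (error);

	/* A real routine would poll the hardware here, with the lock dropped. */

	error = pthread_mutex_lock(&dev->lock);
	if (error != 0)
		return (error);
	dev->commands_done++;
	return (0);
}

/* The suggested calling pattern: take the lock, call, release the lock. */
static int
issue_with_lock(struct fake_device *dev)
{
	int error, unlock_error;

	error = pthread_mutex_lock(&dev->lock);
	if (error != 0)
		return (error);
	error = fake_polled_action(dev);
	unlock_error = pthread_mutex_unlock(&dev->lock);
	return (error != 0 ? error : unlock_error);
}
```

Calling fake_polled_action() directly on a freshly initialised device reports the error, while issue_with_lock() succeeds, which mirrors wrapping xpt_polled_action() in mtx_lock()/mtx_unlock() on the path mutex.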


--------------080306030606030400020901
Content-Type: text/plain;
 name="mps_sas_lsi.c.kdmdiffpart"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="mps_sas_lsi.c.kdmdiffpart"

--- mps_sas_lsi.c.orig	2017-06-01 19:39:48.535697000 +0200
+++ mps_sas_lsi.c	2017-06-02 18:10:15.659582000 +0200
@@ -1175,26 +1172,12 @@
 			    /*immediate*/FALSE,
 			    MPS_SENSE_LEN,
 			    /*timeout*/10000);
-			xpt_action(ccb);
-		}
-	}
-
-	/*
-	 * Wait until all of the SSU commands have completed or time has
-	 * expired (60 seconds).  Pause for 100ms each time through.  If any
-	 * command times out, the target will be reset in the SCSI command
-	 * timeout routine.
-	 */
-	getmicrotime(&start_time);
-	while (sc->SSU_refcount) {
-		pause("mpswait", hz/10);
-		
-		getmicrotime(&cur_time);
-		if ((cur_time.tv_sec - start_time.tv_sec) > 60) {
-			mps_dprint(sc, MPS_FAULT, "Time has expired waiting "
-			    "for SSU commands to complete.\n");
-			break;
+			mtx_lock(xpt_path_mtx(ccb->ccb_h.path));
+			xpt_polled_action(ccb);
+			mtx_unlock(xpt_path_mtx(ccb->ccb_h.path));
 		}
+		xpt_free_path(ccb->ccb_h.path);
+		xpt_free_ccb(ccb);
 	}
 }
 

--------------080306030606030400020901--


