From owner-freebsd-scsi@freebsd.org Tue May 30 17:06:14 2017
From: Dustin Wenz
Subject: Inferring SAS expander topology
Message-Id: <18E31C87-AB53-491F-9E40-F496AE31E305@ebureau.com>
Date: Tue, 30 May 2017 12:00:03 -0500
To: freebsd-scsi@freebsd.org
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: SCSI subsystem

I'm working on a server that has 6 LSI SAS expanders connected to it.
Two of them are throwing intermittent command errors and need some
attention. Because the host only has four external 8088 ports, I know
some of the expanders are chained through each other.

Before I can send parts for replacement, I need to determine whether
the two problematic expanders are daisy-chained or directly connected
to the host. On Linux, this is trivial; I would just browse the device
topology in /sys/. However, this machine is running FreeBSD 10.3, and I
am unable to find a way to do this. I have sg3_utils available, as well
as any built-in FreeBSD tools. Using this software, is it possible to
determine if a SAS expander is chained off of another?

   - .Dustin

From owner-freebsd-scsi@freebsd.org Tue May 30 17:31:35 2017
From: Alan Somers
Date: Tue, 30 May 2017 11:31:33 -0600
Subject: Re: Inferring SAS expander topology
To: Dustin Wenz
Cc: FreeBSD-scsi
In-Reply-To: <18E31C87-AB53-491F-9E40-F496AE31E305@ebureau.com>
References: <18E31C87-AB53-491F-9E40-F496AE31E305@ebureau.com>

On Tue, May 30, 2017 at 11:00 AM, Dustin Wenz wrote:
> I'm working on a server that has 6 LSI SAS expanders connected to it.
> Two of them are throwing intermittent command errors and need some
> attention. Because the host only has four external 8088 ports, I know
> some of the expanders are chained through each other.
>
> Before I can send parts for replacement, I need to determine whether
> the two problematic expanders are daisy-chained or directly connected
> to the host. On Linux, this is trivial; I would just browse the device
> topology in /sys/. However, this machine is running FreeBSD 10.3, and I
> am unable to find a way to do this. I have sg3_utils available, as well
> as any built-in FreeBSD tools. Using this software, is it possible to
> determine if a SAS expander is chained off of another?
>
>    - .Dustin

I don't think sg3_utils will help you. You want sysutils/smp_utils
instead. If you install that and then run "smp_discover /dev/ses0",
it will show you what each phy is connected to. Usually, there will
be four phys connected to the upstream port. You can tell by their
EUI64s whether they're connected directly to the HBA or to another
expander. And you can tell which expander by comparing the exact
EUI64 to each other expander's SEP phy. Note that with LSI expanders,
the SEP's address usually differs from the expander's address by a few
bits in the last byte. For example, if the expander's address is
0x50000000000000ff, then the SEP's address might be
0x50000000000000fd.
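The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of smp_utils: the regular expression is written against the brief smp_discover lines quoted in this thread (other versions may print something different), the names parse_discover, neighbor_kind, same_box and group_by_neighbor are made up for this example, and the two-bit mask is a guess based on the addresses in this thread, where a SEP differs from its expander only in bit 1 of the last byte.

```python
import re
from collections import defaultdict

# Written against the brief lines quoted in this thread, for example:
#   phy 4:U:attached:[500093d23000a17f:11 exp t(SMP)] 6 Gbps
PHY_LINE = re.compile(
    r"phy\s+(\d+):[A-Z]:attached:\[([0-9a-fA-F]{16}):(\d+)\s+([^\]]*)\]"
)

# Assumption: a SEP differs from its expander only in the low two bits.
SEP_MASK = 0x3


def parse_discover(text):
    """Return (phy, attached_address, description) for each attached phy."""
    found = []
    for line in text.splitlines():
        m = PHY_LINE.search(line)
        if m:
            found.append((int(m.group(1)), int(m.group(2), 16), m.group(4).strip()))
    return found


def neighbor_kind(desc):
    """Classify the attached device from the bracketed description."""
    words = desc.split()
    if "exp" in words:
        return "expander"
    if "V" in words:
        return "sep"      # virtual phy: the expander's own SEP
    if desc.startswith("i("):
        return "hba"      # initiator-only device
    return "target"       # e.g. t(SATA) disks


def same_box(addr_a, addr_b):
    """True if two addresses are equal once the low bits are masked off."""
    return ((addr_a ^ addr_b) & ~SEP_MASK) == 0


def group_by_neighbor(phys):
    """Map (kind, address) to the sorted list of local phy numbers."""
    groups = defaultdict(list)
    for phy, addr, desc in phys:
        groups[(neighbor_kind(desc), addr)].append(phy)
    return {key: sorted(nums) for key, nums in groups.items()}


# A few lines in the format shown in this thread.
SAMPLE = """\
phy 3:U:attached:[500093d23000a003:00 t(SATA)] 6 Gbps
phy 4:U:attached:[500093d23000a17f:11 exp t(SMP)] 6 Gbps
phy 5:U:attached:[500093d23000a17f:10 exp t(SMP)] 6 Gbps
phy 40:U:attached:[500605b008a93990:04 i(SSP+STP+SMP)] 6 Gbps
phy 48:D:attached:[500093d23000a03d:00 V i(SMP) t(SSP)] 12 Gbps
"""

if __name__ == "__main__":
    for (kind, addr), nums in group_by_neighbor(parse_discover(SAMPLE)).items():
        print(f"{kind:8s} {addr:016x} phys {nums}")
```

Run against the output for each expander's ses or pass device, an expander that groups some phys under an "hba" entry is attached directly to the host, while one that only lists "expander", "sep" and "target" entries is chained. same_box() can then match the expander address seen from one enclosure against the SEP address reported by each of the others to work out which box sits at the far end of the cable.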
Here's some example output from one of my systems:

# smp_discover /dev/ses3
  phy 0:U:attached:[500093d23000a000:00 t(SATA)] 6 Gbps
  phy 1:U:attached:[500093d23000a001:00 t(SATA)] 6 Gbps
  phy 2:U:attached:[500093d23000a002:00 t(SATA)] 6 Gbps
  phy 3:U:attached:[500093d23000a003:00 t(SATA)] 6 Gbps
  phy 4:U:attached:[500093d23000a17f:11 exp t(SMP)] 6 Gbps  <- Connected to the other expander
  phy 5:U:attached:[500093d23000a17f:10 exp t(SMP)] 6 Gbps  <- Connected to the other expander
  phy 6:U:attached:[500093d23000a17f:09 exp t(SMP)] 6 Gbps  <- Connected to the other expander
  phy 7:U:attached:[500093d23000a17f:08 exp t(SMP)] 6 Gbps  <- Connected to the other expander
  phy 8:U:attached:[500093d23000a1bf:11 exp t(SMP)] 6 Gbps  <- Connected to a third expander
  phy 9:U:attached:[500093d23000a1bf:10 exp t(SMP)] 6 Gbps  <- Connected to a third expander
  phy 10:U:attached:[500093d23000a1bf:09 exp t(SMP)] 6 Gbps  <- Connected to a third expander
  phy 11:U:attached:[500093d23000a1bf:08 exp t(SMP)] 6 Gbps  <- Connected to a third expander
  phy 16:U:attached:[500093d23000a010:00 t(SATA)] 6 Gbps
  phy 17:U:attached:[500093d23000a011:00 t(SATA)] 6 Gbps
  phy 18:U:attached:[500093d23000a012:00 t(SATA)] 6 Gbps
  phy 19:U:attached:[500093d23000a013:00 t(SATA)] 6 Gbps
  phy 20:U:attached:[500093d23000a014:00 t(SATA)] 6 Gbps
  phy 21:U:attached:[500093d23000a015:00 t(SATA)] 6 Gbps
  phy 22:U:attached:[500093d23000a016:00 t(SATA)] 6 Gbps
  phy 23:U:attached:[500093d23000a017:00 t(SATA)] 6 Gbps
  phy 25:U:attached:[500093d23000a019:00 t(SATA)] 6 Gbps
  phy 26:U:attached:[500093d23000a01a:00 t(SATA)] 6 Gbps
  phy 29:U:attached:[500093d23000a01d:00 t(SATA)] 6 Gbps
  phy 30:U:attached:[500093d23000a01e:00 t(SATA)] 6 Gbps
  phy 31:U:attached:[500093d23000a01f:00 t(SATA)] 6 Gbps
  phy 33:U:attached:[500093d23000a021:00 t(SATA)] 6 Gbps
  phy 34:U:attached:[500093d23000a022:00 t(SATA)] 6 Gbps
  phy 35:U:attached:[500093d23000a023:00 t(SATA)] 6 Gbps
  phy 36:U:attached:[500093d23000a024:00 t(SATA)] 6 Gbps
  phy 37:U:attached:[500093d23000a025:00 t(SATA)] 6 Gbps
  phy 38:U:attached:[500093d23000a026:00 t(SATA)] 6 Gbps
  phy 39:U:attached:[500093d23000a027:00 t(SATA)] 6 Gbps
  phy 40:U:attached:[500605b008a93990:04 i(SSP+STP+SMP)] 6 Gbps  <- Connected to an LSI HBA
  phy 41:U:attached:[500605b008a93990:07 i(SSP+STP+SMP)] 6 Gbps  <- Connected to an LSI HBA
  phy 42:U:attached:[500605b008a93990:05 i(SSP+STP+SMP)] 6 Gbps  <- Connected to an LSI HBA
  phy 43:U:attached:[500605b008a93990:06 i(SSP+STP+SMP)] 6 Gbps  <- Connected to an LSI HBA
  phy 48:D:attached:[500093d23000a03d:00 V i(SMP) t(SSP)] 12 Gbps  <- This expander's SEP

# smp_discover /dev/pass121
  phy 0:U:attached:[500093d23000a140:00 t(SATA)] 6 Gbps
  phy 1:U:attached:[500093d23000a141:00 t(SATA)] 6 Gbps
  phy 2:U:attached:[500093d23000a142:00 t(SATA)] 6 Gbps
  phy 3:U:attached:[500093d23000a143:00 t(SATA)] 6 Gbps
  phy 4:U:attached:[500093d23000a144:00 t(SATA)] 6 Gbps
  phy 5:U:attached:[500093d23000a145:00 t(SATA)] 6 Gbps
  phy 6:U:attached:[500093d23000a146:00 t(SATA)] 6 Gbps
  phy 7:U:attached:[500093d23000a147:00 t(SATA)] 6 Gbps
  phy 8:U:attached:[500093d23000a03f:07 exp t(SMP)] 6 Gbps  <- Connected to the other expander
  phy 9:U:attached:[500093d23000a03f:06 exp t(SMP)] 6 Gbps  <- Connected to the other expander
  phy 10:U:attached:[500093d23000a03f:05 exp t(SMP)] 6 Gbps  <- Connected to the other expander
  phy 11:U:attached:[500093d23000a03f:04 exp t(SMP)] 6 Gbps  <- Connected to the other expander
  phy 12:U:attached:[500093d23000a14c:00 t(SATA)] 6 Gbps
  phy 14:U:attached:[500093d23000a14e:00 t(SATA)] 6 Gbps
  phy 15:U:attached:[500093d23000a14f:00 t(SATA)] 6 Gbps
  phy 16:U:attached:[500093d23000a150:00 t(SATA)] 6 Gbps
  phy 17:U:attached:[500093d23000a151:00 t(SATA)] 6 Gbps
  phy 18:U:attached:[500093d23000a152:00 t(SATA)] 6 Gbps
  phy 19:U:attached:[500093d23000a153:00 t(SATA)] 6 Gbps
  phy 20:U:attached:[500093d23000a154:00 t(SATA)] 6 Gbps
  phy 21:U:attached:[500093d23000a155:00 t(SATA)] 6 Gbps
  phy 22:U:attached:[500093d23000a156:00 t(SATA)] 6 Gbps
  phy 23:U:attached:[500093d23000a157:00 t(SATA)] 6 Gbps
  phy 24:U:attached:[500093d23000a158:00 t(SATA)] 6 Gbps
  phy 25:U:attached:[500093d23000a159:00 t(SATA)] 6 Gbps
  phy 26:U:attached:[500093d23000a15a:00 t(SATA)] 6 Gbps
  phy 27:U:attached:[500093d23000a15b:00 t(SATA)] 6 Gbps
  phy 28:U:attached:[500093d23000a15c:00 t(SATA)] 6 Gbps
  phy 29:U:attached:[500093d23000a15d:00 t(SATA)] 6 Gbps
  phy 30:U:attached:[500093d23000a15e:00 t(SATA)] 6 Gbps
  phy 32:U:attached:[500093d23000a160:00 t(SATA)] 6 Gbps
  phy 33:U:attached:[500093d23000a161:00 t(SATA)] 6 Gbps
  phy 34:U:attached:[500093d23000a162:00 t(SATA)] 6 Gbps
  phy 35:U:attached:[500093d23000a163:00 t(SATA)] 6 Gbps
  phy 36:U:attached:[500093d23000a164:00 t(SATA)] 6 Gbps
  phy 37:U:attached:[500093d23000a165:00 t(SATA)] 6 Gbps
  phy 38:U:attached:[500093d23000a166:00 t(SATA)] 6 Gbps
  phy 39:U:attached:[500093d23000a167:00 t(SATA)] 6 Gbps
  phy 40:D:attached:[500093d23000a17d:00 V i(SMP) t(SSP)] 12 Gbps  <- This expander's SEP

-Alan

From owner-freebsd-scsi@freebsd.org Tue May 30 19:53:33 2017
From: Scott Long
Subject: Re: Inferring SAS expander topology
In-Reply-To: <18E31C87-AB53-491F-9E40-F496AE31E305@ebureau.com>
Date: Tue, 30 May 2017 13:53:29 -0600
Cc: freebsd-scsi@freebsd.org
Message-Id: <48B810AA-95E9-4ADA-8D18-849362C767AA@yahoo.com>
References: <18E31C87-AB53-491F-9E40-F496AE31E305@ebureau.com>
To: Dustin Wenz

Hi Dustin,

FreeBSD relies on the LSI firmware to manage topology, and has no
awareness of it on these controllers. You can send SMP commands
directly via the camcontrol utility and perform topology management and
discovery manually. I’m not sure if sg3_utils knows how to
communicate with this, though.

Scott

> On May 30, 2017, at 11:00 AM, Dustin Wenz wrote:
>
> I'm working on a server that has 6 LSI SAS expanders connected to it.
> Two of them are throwing intermittent command errors and need some
> attention. Because the host only has four external 8088 ports, I know
> some of the expanders are chained through each other.
>
> Before I can send parts for replacement, I need to determine whether
> the two problematic expanders are daisy-chained or directly connected
> to the host. On Linux, this is trivial; I would just browse the device
> topology in /sys/. However, this machine is running FreeBSD 10.3, and I
> am unable to find a way to do this. I have sg3_utils available, as well
> as any built-in FreeBSD tools. Using this software, is it possible to
> determine if a SAS expander is chained off of another?
>
>    - .Dustin
>
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

From owner-freebsd-scsi@freebsd.org Tue May 30 20:09:19 2017
Subject: Re: Inferring SAS expander topology
From: Dustin Wenz
Date: Tue, 30 May 2017 15:09:15 -0500
Cc: FreeBSD-scsi
Message-Id: <5124123C-B996-4DC2-A338-64ACC33A5C09@ebureau.com>
References: <18E31C87-AB53-491F-9E40-F496AE31E305@ebureau.com>
To: Alan Somers

Thanks! It looks like smp_discover will do what I need, though it is
not particularly user-friendly for this specific application.

I needed to use both "smp_discover sesX" and "smp_discover --phy=Y
--brief sesX" in order to correlate SAS interfaces with the addresses
of devices attached to them.

   - .Dustin

> On May 30, 2017, at 12:31 PM, Alan Somers wrote:
>
> On Tue, May 30, 2017 at 11:00 AM, Dustin Wenz wrote:
>> I'm working on a server that has 6 LSI SAS expanders connected to it.
>> Two of them are throwing intermittent command errors and need some
>> attention. Because the host only has four external 8088 ports, I know
>> some of the expanders are chained through each other.
>>
>> Before I can send parts for replacement, I need to determine whether
>> the two problematic expanders are daisy-chained or directly connected
>> to the host. On Linux, this is trivial; I would just browse the device
>> topology in /sys/. However, this machine is running FreeBSD 10.3, and I
>> am unable to find a way to do this. I have sg3_utils available, as well
>> as any built-in FreeBSD tools. Using this software, is it possible to
>> determine if a SAS expander is chained off of another?
>>
>>    - .Dustin
>
> I don't think sg3_utils will help you. You want sysutils/smp_utils
> instead. If you install that and then run "smp_discover /dev/ses0",
> it will show you what each phy is connected to. Usually, there will
> be four phys connected to the upstream port. You can tell by their
> EUI64s whether they're connected directly to the HBA or to another
> expander. And you can tell which expander by comparing the exact
> EUI64 to each other expander's SEP phy. Note that with LSI expanders,
> the SEP's address usually differs from the expander's address by a few
> bits in the last byte. For example, if the expander's address is
> 0x50000000000000ff, then the SEP's address might be
> 0x50000000000000fd.
>
> -Alan

From owner-freebsd-scsi@freebsd.org Thu Jun 1 09:49:35 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-scsi@FreeBSD.org
Subject: [Bug 219701] crash in camperiphfree()
Date: Thu, 01 Jun 2017 09:49:34 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: CURRENT
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: avg@FreeBSD.org
X-Bugzilla-Status: New
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-scsi@FreeBSD.org
X-Bugzilla-Changed-Fields: cc assigned_to
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219701

Andriy Gapon changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ken@FreeBSD.org,
                   |                            |scottl@FreeBSD.org
           Assignee|freebsd-bugs@FreeBSD.org    |freebsd-scsi@FreeBSD.org

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-scsi@freebsd.org Thu Jun 1 09:29:55 2017
Message-ID: <592FDE8C.1090609@omnilan.de>
Date: Thu, 01 Jun 2017 11:29:48 +0200
From: Harry Schmalzbauer
Organization: OmniLAN
To: freebsd-scsi@freebsd.org
Subject: mps(4) blocks panic-reboot

Hello,

I'm not sure if scsi@ is the correct list, but since my problem seems
to be mps(4) related, I thought asking here shouldn't be too wrong.

There's not much to add to the topic: if my stable/11 setup panics, the
system doesn't reboot (apart from the IPMI watchdog taking over this
task). The machine gets stuck after these last lines:

Dumping 1669 out of 15734 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12
mps0: Incrementing SSU count
mps0: Sending StopUnit: path (xpt0:mps0:0:3:ffffffff): handle 11
mps0: Incrementing SSU count
mps0: Sending StopUnit: path (xpt0:mps0:0:4:ffffffff): handle 10
mps0: Incrementing SSU count
mps0: Sending StopUnit: path (xpt0:mps0:0:5:ffffffff): handle 9
mps0: Incrementing SSU count
mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff): handle 13
mps0: Incrementing SSU count

Then, nothing happens. On a similar setup without mps(4), the machine
reboots after the panic.

Is this a known problem/feature?

Thanks,

-harry

From owner-freebsd-scsi@freebsd.org Thu Jun 1 14:50:39 2017
From: Stephen Mcconnell
References: <592FDE8C.1090609@omnilan.de>
In-Reply-To: <592FDE8C.1090609@omnilan.de>
Date: Thu, 1 Jun 2017 08:50:37 -0600
Message-ID: <12a36df9eff99c77ec621987efbe75fe@mail.gmail.com>
Subject: RE: mps(4) blocks panic-reboot
To: Harry Schmalzbauer, freebsd-scsi@freebsd.org, Scott Long

Scott, did you have a fix for this some time back?

Steve > -----Original Message----- > From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- > scsi@freebsd.org] On Behalf Of Harry Schmalzbauer > Sent: Thursday, June 01, 2017 3:30 AM > To: freebsd-scsi@freebsd.org > Subject: mps(4) blocks panic-reboot > > Hello, > > I'm not sure if scsi@ is the correct list, but since my problem seems to be mps(4) > related, I thought asking here shouldn't be too wrong. > > There's not much to add to the topic: If my stable/11 setup panics, the system > doesn't reboot (besides IPMI-watchdog takes over this task). > The machine stucks with these last lines: > Dumping 1669 out of 15734 > MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > Dump complete > mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12 > mps0: Incrementing SSU count > mps0: Sending StopUnit: path (xpt0:mps0:0:3:ffffffff): handle 11 > mps0: Incrementing SSU count > mps0: Sending StopUnit: path (xpt0:mps0:0:4:ffffffff): handle 10 > mps0: Incrementing SSU count > mps0: Sending StopUnit: path (xpt0:mps0:0:5:ffffffff): handle 9 > mps0: Incrementing SSU count > mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff): handle 13 > mps0: Incrementing SSU count > > Then, nothing happens. On a similar setup without mps(4), the machine reboots > after the panic. > > Is this a known problem/feature? 
> > Thanks, > > -harry > _______________________________________________ > freebsd-scsi@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Thu Jun 1 15:25:51 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 44173AFF588 for ; Thu, 1 Jun 2017 15:25:51 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x230.google.com (mail-it0-x230.google.com [IPv6:2607:f8b0:4001:c0b::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 110867E6F1 for ; Thu, 1 Jun 2017 15:25:51 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x230.google.com with SMTP id m47so22820695iti.0 for ; Thu, 01 Jun 2017 08:25:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=from:references:in-reply-to:mime-version:thread-index:date :message-id:subject:to; bh=A/Q7nYMxr6YlxFil28EDYf24t8Jde+aroPBkXK/mTLY=; b=IbBrSqNsZmAD/8jdAXpUx5KFPT3bntFwv7RTbLhVKbBIzp1lUP1KyzH2kFe7Do/cAx M1o+69E6Wx9DpOwcIDE6i+7NNcK6WTd3EOja6AVc4G51dTjFcBbvVXDS9wWiREHEGJE+ 71qXt/Knzufeh598QjihEAM9278MSSfEuFYtg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to; bh=A/Q7nYMxr6YlxFil28EDYf24t8Jde+aroPBkXK/mTLY=; b=UUApc5kpaX2cp1FwDv/ptG32gHF2ORBFi0OiE0ya45+zxGIFo7ShwKNuvDIRmMve1y KXY5L0E4kk2PR17fPdKIZ+cpX+nxz1yP2D7WT/CFYsdgZcWHzHBqtWgQ2fcfuGFeJO/k GAscZqgXKzHMOJdcCEa1Advd2d5iC0IJlJBq1YCFAPnt9KvsathFQS9D6qFLVXzM1vnT 
RhwXnZbCeKnpATKZceIhsGvBce49/B/6MNJwAeDA4p26Sx5a9+TxL6b11krmYNylZ8oi ve6ucpKTCrbWTQNv5OOu44Jaf6Fg9BbXkIY1PtZWxLW0U3WH1EtP6oEgL2jWZQGVYJnC q8Ww== X-Gm-Message-State: AODbwcAk0rsnj+4Pat8tUjim5/uKKSoCs3p5cX4NTo4gVRj3hWNJNF9D pLYBG2lyhZUnwrQyfAJQ/dQWd/QSGc0L X-Received: by 10.36.108.212 with SMTP id w203mr3769551itb.55.1496330750345; Thu, 01 Jun 2017 08:25:50 -0700 (PDT) From: Stephen Mcconnell References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com In-Reply-To: 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQK5uw9AxlTbZs3SRUL7gsvMDeNX4aBCVoJQgAAICqA= Date: Thu, 1 Jun 2017 09:25:49 -0600 Message-ID: Subject: RE: mps(4) blocks panic-reboot To: Harry Schmalzbauer , freebsd-scsi@freebsd.org, Scott Long Content-Type: text/plain; charset="UTF-8" X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 15:25:51 -0000 I found a couple of emails between me and Scott a while back and we talked about this. The problem is that the SSU handling relies on interrupts, but interrupts stop due to the panic, so it hangs. Scott came up with a way around it but we never decided on a final fix and then it was forgotten about. If you have a way to reproduce this, I can try to find a fix here. Or, I might be able to force the system to panic at the right time. Steve > -----Original Message----- > From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com] > Sent: Thursday, June 01, 2017 8:51 AM > To: 'Harry Schmalzbauer'; 'freebsd-scsi@freebsd.org'; 'Scott Long' > Subject: RE: mps(4) blocks panic-reboot > > Scott, did you have a fix for this some time back? 
> > Steve > > > -----Original Message----- > > From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd- > > scsi@freebsd.org] On Behalf Of Harry Schmalzbauer > > Sent: Thursday, June 01, 2017 3:30 AM > > To: freebsd-scsi@freebsd.org > > Subject: mps(4) blocks panic-reboot > > > > Hello, > > > > I'm not sure if scsi@ is the correct list, but since my problem seems > > to be mps(4) related, I thought asking here shouldn't be too wrong. > > > > There's not much to add to the topic: If my stable/11 setup panics, > > the system doesn't reboot (besides IPMI-watchdog takes over this task). > > The machine stucks with these last lines: > > Dumping 1669 out of 15734 > > MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% > > Dump complete > > mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12 > > mps0: Incrementing SSU count > > mps0: Sending StopUnit: path (xpt0:mps0:0:3:ffffffff): handle 11 > > mps0: Incrementing SSU count > > mps0: Sending StopUnit: path (xpt0:mps0:0:4:ffffffff): handle 10 > > mps0: Incrementing SSU count > > mps0: Sending StopUnit: path (xpt0:mps0:0:5:ffffffff): handle 9 > > mps0: Incrementing SSU count > > mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff): handle 13 > > mps0: Incrementing SSU count > > > > Then, nothing happens. On a similar setup without mps(4), the machine > > reboots after the panic. > > > > Is this a known problem/feature? 
> > > > Thanks, > > > > -harry > > _______________________________________________ > > freebsd-scsi@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-scsi > > To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Thu Jun 1 15:29:43 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6CA23AFF77C for ; Thu, 1 Jun 2017 15:29:43 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3D8517EB9F; Thu, 1 Jun 2017 15:29:42 +0000 (UTC) (envelope-from scottl@samsco.org) Received: from compute6.internal (compute6.nyi.internal [10.202.2.46]) by mailout.nyi.internal (Postfix) with ESMTP id 32C0820951; Thu, 1 Jun 2017 11:29:36 -0400 (EDT) Received: from frontend2 ([10.202.2.161]) by compute6.internal (MEProxy); Thu, 01 Jun 2017 11:29:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=samsco.org; h=cc :content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-sender :x-me-sender:x-sasl-enc:x-sasl-enc; s=fm1; bh=nAUAabkNLlWJDs/b8P 074zpUH5IOR6ZG7lySozCiOv4=; b=XgsFkafuvfNe2bidifKTbDcinE6QBLRHEa WMDMYGdcqUZsF2l3japnOAsl/O798sHghbwjniESk0E+Q2E4EDqOPdasTpixSgA2 ifylVparTUZrEoS7o0gUrUw1S/TAZkc6XR4cHt1J9hmNwC2a/uNucgkw42iT6xoO dlhcyqJdREHLJ1yiVbwwyoUkvFbzHKRPUxbVC9y5OKaI+5fdO7enK1vcIk0IfRUP R67JzWEL3Ad3FiLJZnGgO4gZbdK99oKHHytgBuex9vu+tQ7DEEqrwfAt+qesp7T5 9fRyLUQgvIU/aX8tf0r2Krc3cgMobgAlVxLP/FWvPCG7bEoiM7Sw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type 
:date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc:x-sasl-enc; s= fm1; bh=nAUAabkNLlWJDs/b8P074zpUH5IOR6ZG7lySozCiOv4=; b=evyBP3qk aQ6We+I/kcxGWtWd22uLKBYMlDfN3kiGShEb/LVmEzfQf5Iq1ikRkbeVGqtkht8n 30Kc/Ccz4rziQ85+X2IHn0beQDQ6TAJWTzhF6jb7Bsxxudy/lQeg4H6stSXSeEUH d99TybJnxVAE04iaAeMeNzUK2uIsynNgNoafcQjlfkOmKJnIA9MmPUxvwpt/O2UJ 2SGjYhRrFroZqJ83XYH5S/w9Ce3Lk64dGR2rHzNI+fuyUdLVdibfYKmSO4Znqo2x OFcug4VU5kVhHSSGIC2gdMT//DWql/45TVRWcB+6eZqxft2Srrx1fimSJLHAv5Bx nPsIQqKi53ppaw== X-ME-Sender: X-Sasl-enc: u9/zicowU3Mb4KhPrP7LPTu5WTGSPijGYYvC5GN44fef 1496330975 Received: from [100.107.187.240] (52.sub-70-196-83.myvzw.com [70.196.83.52]) by mail.messagingengine.com (Postfix) with ESMTPA id D61522486C; Thu, 1 Jun 2017 11:29:35 -0400 (EDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (1.0) Subject: Re: mps(4) blocks panic-reboot From: Scott Long X-Mailer: iPhone Mail (14F89) In-Reply-To: Date: Thu, 1 Jun 2017 10:29:34 -0500 Cc: Harry Schmalzbauer , freebsd-scsi@freebsd.org, Scott Long Content-Transfer-Encoding: quoted-printable Message-Id: References: <592FDE8C.1090609@omnilan.de> To: Stephen Mcconnell X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 15:29:43 -0000

Good summary Steve, I forgot about this. I'm traveling today, but if I have time tonight I'll see if I still have any notes or patches.

Scott

Sent from my iPhone

> On Jun 1, 2017, at 10:25 AM, Stephen Mcconnell wrote:
>
> I found a couple of emails between me and Scott a while back and we talked
> about this. The problem is that the SSU handling relies on interrupts, but
> interrupts stop due to the panic, so it hangs. Scott came up with a way
> around it but we never decided on a final fix and then it was forgotten
> about.
If you have a way to reproduce this, I can try to find a fix here.
> Or, I might be able to force the system to panic at the right time.
>
> Steve
>
>> -----Original Message-----
>> From: Stephen Mcconnell [mailto:stephen.mcconnell@broadcom.com]
>> Sent: Thursday, June 01, 2017 8:51 AM
>> To: 'Harry Schmalzbauer'; 'freebsd-scsi@freebsd.org'; 'Scott Long'
>> Subject: RE: mps(4) blocks panic-reboot
>>
>> Scott, did you have a fix for this some time back?
>>
>> Steve
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of Harry Schmalzbauer
>>> Sent: Thursday, June 01, 2017 3:30 AM
>>> To: freebsd-scsi@freebsd.org
>>> Subject: mps(4) blocks panic-reboot
>>>
>>> Hello,
>>>
>>> I'm not sure if scsi@ is the correct list, but since my problem seems
>>> to be mps(4) related, I thought asking here shouldn't be too wrong.
>>>
>>> There's not much to add to the topic: If my stable/11 setup panics,
>>> the system doesn't reboot (besides IPMI-watchdog takes over this task).
>>> The machine stucks with these last lines:
>>> Dumping 1669 out of 15734
>>> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>> Dump complete
>>> mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12
>>> mps0: Incrementing SSU count
>>> mps0: Sending StopUnit: path (xpt0:mps0:0:3:ffffffff): handle 11
>>> mps0: Incrementing SSU count
>>> mps0: Sending StopUnit: path (xpt0:mps0:0:4:ffffffff): handle 10
>>> mps0: Incrementing SSU count
>>> mps0: Sending StopUnit: path (xpt0:mps0:0:5:ffffffff): handle 9
>>> mps0: Incrementing SSU count
>>> mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff): handle 13
>>> mps0: Incrementing SSU count
>>>
>>> Then, nothing happens. On a similar setup without mps(4), the machine
>>> reboots after the panic.
>>>
>>> Is this a known problem/feature?
>>>=20 >>> Thanks, >>>=20 >>> -harry >>> _______________________________________________ >>> freebsd-scsi@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi >>> To unsubscribe, send any mail to > "freebsd-scsi-unsubscribe@freebsd.org" From owner-freebsd-scsi@freebsd.org Thu Jun 1 15:36:41 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 82991AFFAD8 for ; Thu, 1 Jun 2017 15:36:41 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1E5707F241; Thu, 1 Jun 2017 15:36:40 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (mh0.gentlemail.de [78.138.80.135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v51FabCJ081851; Thu, 1 Jun 2017 17:36:37 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (titan.inop.mo1.omnilan.net [IPv6:2001:a60:f0bb:1::3:1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id 776B9F32; Thu, 1 Jun 2017 17:36:37 +0200 (CEST) Message-ID: <59303484.1040609@omnilan.de> Date: Thu, 01 Jun 2017 17:36:36 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: Stephen Mcconnell CC: freebsd-scsi@freebsd.org, Scott Long Subject: Re: mps(4) blocks panic-reboot References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Greylist: ACL 129 matched, not 
delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [78.138.80.130]); Thu, 01 Jun 2017 17:36:37 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: 78.138.80.135; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 15:36:41 -0000

Regarding Stephen Mcconnell's message of 01.06.2017 17:25 (localtime):
> I found a couple of emails between me and Scott a while back and we talked
> about this. The problem is that the SSU handling relies on interrupts, but
> interrupts stop due to the panic, so it hangs. Scott came up with a way
> around it but we never decided on a final fix and then it was forgotten
> about. If you have a way to reproduce this, I can try to find a fix here.
> Or, I might be able to force the system to panic at the right time.

Thank you very much for your attention!

I remember having read some discussion about that topic, but I thought it was fixed and didn't search any further; thanks for doing that job :-)

I can reproduce it at any time and am willing to test anything (that I can get to compile on stable/11)!
This is a semi-production machine where I evaluate some netmap/bhyve options/strategies.
Thanks, -harry From owner-freebsd-scsi@freebsd.org Thu Jun 1 17:36:37 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CC359B7B131 for ; Thu, 1 Jun 2017 17:36:37 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x22e.google.com (mail-it0-x22e.google.com [IPv6:2607:f8b0:4001:c0b::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 92F9383CBB for ; Thu, 1 Jun 2017 17:36:37 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x22e.google.com with SMTP id r63so40438576itc.1 for ; Thu, 01 Jun 2017 10:36:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=from:references:in-reply-to:mime-version:thread-index:date :message-id:subject:to:cc; bh=NU/PqY5T2d+8IitmNk42UxrsBO3H45ay2dOuLXrorcs=; b=Hpb/HioTk2kQV8msbeDA/NIxopz05BKm22B8f1bRj6jj6KsyxwQiS8uWMQEZjmXnsx jGaDVXeF61w9Z+Hxrog6hwuw9nw7R8ImcP9EKw4yQVulCgXgzJomfu31tMGrxr8nVZX/ 6vNDPrN8eCcQba1B9nkAKjIVgNqidxCNP+p+M= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:cc; bh=NU/PqY5T2d+8IitmNk42UxrsBO3H45ay2dOuLXrorcs=; b=VyQzSaAj8Hmrnp7giUSXaaqhO7OYaOvKgKJ3yRibuTP1LLnm+tf2F9aFWTm2spwSpN wBE/hvtxeoL3SLACZn7QwRv4tAi13z474CCsiSsGNNbwKxq2rsUeJOiggCMT5rJmWj9l 9ZfyHDsB7fCz+L9fmtL2nildy8FccsG867hu00UJiBcPLyhubALF0sEtFOp9blhVf+Ht nmE7mh7sZRs8zTNpxZ/D3BXVYxXvDl5OxqjhavyLZj9l/DWdhSu+U7+1mjjuhkVHaObR aWOtVEEw7orAyXk4dZBKy3tzk4cTAV8olJ/L1PPqHpVRO/WWLxHUidxABn893l+PzK78 bGQA== X-Gm-Message-State: AODbwcD8APT23TmyPK7pj/nOpT4h3+MJevBX/NQysBGxpgnyG3hYdSu0 TgxBnYkaS8eiTg4o02WJta2edqm6UT7c X-Received: by 
10.36.87.84 with SMTP id u81mr395664ita.35.1496338596877; Thu, 01 Jun 2017 10:36:36 -0700 (PDT) From: Stephen Mcconnell References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> In-Reply-To: <59303484.1040609@omnilan.de> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQK5uw9AxlTbZs3SRUL7gsvMDeNX4QK127o/Aqh6HjigF5FLIA== Date: Thu, 1 Jun 2017 11:36:35 -0600 Message-ID: Subject: RE: mps(4) blocks panic-reboot To: Harry Schmalzbauer Cc: freebsd-scsi@freebsd.org, Scott Long Content-Type: multipart/mixed; boundary="001a1134f23a888bb40550e97a55" X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 17:36:37 -0000

--001a1134f23a888bb40550e97a55 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

Can you try the attached patch and let me know how it goes? I didn't test it, but since you know how to reproduce the hang, it might be easier this way. This was diff'd from the latest mps files in stable/11, which I recently updated (today).

Thanks,
Steve

> -----Original Message-----
> From: Harry Schmalzbauer [mailto:freebsd@omnilan.de]
> Sent: Thursday, June 01, 2017 9:37 AM
> To: Stephen Mcconnell
> Cc: freebsd-scsi@freebsd.org; Scott Long
> Subject: Re: mps(4) blocks panic-reboot
>
> Regarding Stephen Mcconnell's message of 01.06.2017 17:25 (localtime):
> > I found a couple of emails between me and Scott a while back and we
> > talked about this. The problem is that the SSU handling relies on
> > interrupts, but interrupts stop due to the panic, so it hangs. Scott
> > came up with a way around it but we never decided on a final fix and
> > then it was forgotten about. If you have a way to reproduce this, I can
> > try to find a fix here.
> > Or, I might be able to force the system to panic at the right time.
> > Thank you very much for your attention! > > I remember haveing read some discussion about that topic but thought it > was > fixed and haven't searched any further; thanks for doing that job :-) > > I can reproduce at any time, willing to test anything (which I can get to > compile > on stable/11)! > This is a semi-productive machine where I evaluate some netmap/bhyve > options/stragtegies. > > Thanks, > > -harry --001a1134f23a888bb40550e97a55 Content-Type: application/octet-stream; name="mps_ssu_polled.diff.tar" Content-Disposition: attachment; filename="mps_ssu_polled.diff.tar" Content-Transfer-Encoding: base64 X-Attachment-Id: 9c48e788b0c17cef_0.1 bXBzX3NzdV9wb2xsZWQuZGlmZgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAwMDY0NCAAMDAwMDAw IAAwMDAwMDAgADAwMDAwMDA2NjEwIDEzMTE0MDQ3NzY3IDAxNTA1NAAgMAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB1c3RhcgAwMHJvb3QAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAd2hlZWwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwMDAwMDAgADAwMDAw MCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABJ bmRleDogbXBzX3Nhcy5jCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIG1wc19zYXMuYwkocmV2aXNpb24gMzE5NDQ2 KQorKysgbXBzX3Nhcy5jCSh3b3JraW5nIGNvcHkpCkBAIC0yMjExLDE4ICsyMjExLDYgQEAKIAkJ fQogCX0KIAotCS8qCi0JICogSWYgdGhpcyBpcyBhIFN0YXJ0IFN0b3AgVW5pdCBjb21tYW5kIGFu ZCBpdCB3YXMgaXNzdWVkIGJ5IHRoZSBkcml2ZXIKLQkgKiBkdXJpbmcgc2h1dGRvd24sIGRlY3Jl bWVudCB0aGUgcmVmY291bnQgdG8gYWNjb3VudCBmb3IgYWxsIG9mIHRoZQotCSAqIGNvbW1hbmRz IHRoYXQgd2VyZSBzZW50LiAgQWxsIFNTVSBjb21tYW5kcyBzaG91bGQgYmUgY29tcGxldGVkIGJl Zm9yZQotCSAqIHNodXRkb3duIGNvbXBsZXRlcywgbWVhbmluZyBTU1VfcmVmY291bnQgd2lsbCBi 
ZSAwIGFmdGVyIFNTVV9zdGFydGVkCi0JICogaXMgVFJVRS4KLQkgKi8KLQlpZiAoc2MtPlNTVV9z dGFydGVkICYmIChjc2lvLT5jZGJfaW8uY2RiX2J5dGVzWzBdID09IFNUQVJUX1NUT1BfVU5JVCkp IHsKLQkJbXBzX2RwcmludChzYywgTVBTX0lORk8sICJEZWNyZW1lbnRpbmcgU1NVIGNvdW50Llxu Iik7Ci0JCXNjLT5TU1VfcmVmY291bnQtLTsKLQl9Ci0KIAkvKiBUYWtlIHRoZSBmYXN0IHBhdGgg dG8gY29tcGxldGlvbiAqLwogCWlmIChjbS0+Y21fcmVwbHkgPT0gTlVMTCkgewogCQlpZiAobXBz c2FzX2dldF9jY2JzdGF0dXMoY2NiKSA9PSBDQU1fUkVRX0lOUFJPRykgewpJbmRleDogbXBzX3Nh c19sc2kuYwo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09Ci0tLSBtcHNfc2FzX2xzaS5jCShyZXZpc2lvbiAzMTk0NDYpCisr KyBtcHNfc2FzX2xzaS5jCSh3b3JraW5nIGNvcHkpCkBAIC0xMTE3LDEzICsxMTE3LDEzIEBACiAJ dGFyZ2V0X2lkX3QgdGFyZ2V0aWQ7CiAJc3RydWN0IG1wc3Nhc190YXJnZXQgKnRhcmdldDsKIAlj aGFyIHBhdGhfc3RyWzY0XTsKLQlzdHJ1Y3QgdGltZXZhbCBjdXJfdGltZSwgc3RhcnRfdGltZTsK IAogCS8qCi0JICogRm9yIGVhY2ggdGFyZ2V0LCBpc3N1ZSBhIFN0YXJ0U3RvcFVuaXQgY29tbWFu ZCB0byBzdG9wIHRoZSBkZXZpY2UuCisJICogRGlzYWJsZSBpbnRlcnJ1cHRzIG5vdyBiZWNhdXNl IHNodXRkb3duIGlzIGluIHByb2dyZXNzIGFuZCBkb24ndCB3YW50CisJICogdG8gcmVseSBvbiBJ U1IgdG8gY29tcGxldGUgdGhlc2UuIFNvIHBvbGxpbmcgbXVzdCBiZSBkb25lIGhlcmUgYXMKKwkg KiB3ZWxsLgogCSAqLwotCXNjLT5TU1Vfc3RhcnRlZCA9IFRSVUU7Ci0Jc2MtPlNTVV9yZWZjb3Vu dCA9IDA7CisJbXBzX21hc2tfaW50cihzYyk7CiAJZm9yICh0YXJnZXRpZCA9IDA7IHRhcmdldGlk IDwgc2MtPm1heF9kZXZpY2VzOyB0YXJnZXRpZCsrKSB7CiAJCXRhcmdldCA9ICZzYXNzYy0+dGFy Z2V0c1t0YXJnZXRpZF07CiAJCWlmICh0YXJnZXQtPmhhbmRsZSA9PSAweDApIHsKQEAgLTExNTcs MTIgKzExNTcsOSBAQAogCQkJICAgICJoYW5kbGUgJWRcbiIsIHBhdGhfc3RyLCB0YXJnZXQtPmhh bmRsZSk7CiAJCQkKIAkJCS8qCi0JCQkgKiBJc3N1ZSBhIFNUQVJUIFNUT1AgVU5JVCBjb21tYW5k IGZvciB0aGUgdGFyZ2V0LgotCQkJICogSW5jcmVtZW50IHRoZSBTU1UgY291bnRlciB0byBiZSB1 c2VkIHRvIGNvdW50IHRoZQotCQkJICogbnVtYmVyIG9mIHJlcXVpcmVkIHJlcGxpZXMuCisJCQkg KiBJc3N1ZSBhIFNUQVJUIFNUT1AgVU5JVCBjb21tYW5kIGZvciB0aGUgdGFyZ2V0IGFuZAorCQkJ ICogcG9sbCBmb3IgY29tcGxldGlvbi4KIAkJCSAqLwotCQkJbXBzX2RwcmludChzYywgTVBTX0lO 
Rk8sICJJbmNyZW1lbnRpbmcgU1NVIGNvdW50XG4iKTsKLQkJCXNjLT5TU1VfcmVmY291bnQrKzsK IAkJCWNjYi0+Y2NiX2gudGFyZ2V0X2lkID0KIAkJCSAgICB4cHRfcGF0aF90YXJnZXRfaWQoY2Ni LT5jY2JfaC5wYXRoKTsKIAkJCWNjYi0+Y2NiX2gucHByaXZfcHRyMSA9IHNhc3NjOwpAQCAtMTE3 NSwyNyArMTE3Miw5IEBACiAJCQkgICAgLyppbW1lZGlhdGUqL0ZBTFNFLAogCQkJICAgIE1QU19T RU5TRV9MRU4sCiAJCQkgICAgLyp0aW1lb3V0Ki8xMDAwMCk7Ci0JCQl4cHRfYWN0aW9uKGNjYik7 CisJCQl4cHRfcG9sbGVkX2FjdGlvbihjY2IpOwogCQl9CiAJfQotCi0JLyoKLQkgKiBXYWl0IHVu dGlsIGFsbCBvZiB0aGUgU1NVIGNvbW1hbmRzIGhhdmUgY29tcGxldGVkIG9yIHRpbWUgaGFzCi0J ICogZXhwaXJlZCAoNjAgc2Vjb25kcykuICBQYXVzZSBmb3IgMTAwbXMgZWFjaCB0aW1lIHRocm91 Z2guICBJZiBhbnkKLQkgKiBjb21tYW5kIHRpbWVzIG91dCwgdGhlIHRhcmdldCB3aWxsIGJlIHJl c2V0IGluIHRoZSBTQ1NJIGNvbW1hbmQKLQkgKiB0aW1lb3V0IHJvdXRpbmUuCi0JICovCi0JZ2V0 bWljcm90aW1lKCZzdGFydF90aW1lKTsKLQl3aGlsZSAoc2MtPlNTVV9yZWZjb3VudCkgewotCQlw YXVzZSgibXBzd2FpdCIsIGh6LzEwKTsKLQkJCi0JCWdldG1pY3JvdGltZSgmY3VyX3RpbWUpOwot CQlpZiAoKGN1cl90aW1lLnR2X3NlYyAtIHN0YXJ0X3RpbWUudHZfc2VjKSA+IDYwKSB7Ci0JCQlt cHNfZHByaW50KHNjLCBNUFNfRkFVTFQsICJUaW1lIGhhcyBleHBpcmVkIHdhaXRpbmcgIgotCQkJ ICAgICJmb3IgU1NVIGNvbW1hbmRzIHRvIGNvbXBsZXRlLlxuIik7Ci0JCQlicmVhazsKLQkJfQot CX0KIH0KIAogc3RhdGljIHZvaWQKQEAgLTEyMTQsOSArMTE5Myw3IEBACiAJICAgIHBhdGhfc3Ry KTsKIAogCS8qCi0JICogTm90aGluZyBtb3JlIHRvIGRvIGV4Y2VwdCBmcmVlIHRoZSBDQ0IgYW5k IHBhdGguICBJZiB0aGUgY29tbWFuZAotCSAqIHRpbWVkIG91dCwgYW4gYWJvcnQgcmVzZXQsIHRo ZW4gdGFyZ2V0IHJlc2V0IHdpbGwgYmUgaXNzdWVkIGR1cmluZwotCSAqIHRoZSBTQ1NJIENvbW1h bmQgcHJvY2Vzcy4KKwkgKiBOb3RoaW5nIG1vcmUgdG8gZG8gZXhjZXB0IGZyZWUgdGhlIENDQiBh bmQgcGF0aC4KIAkgKi8KIAl4cHRfZnJlZV9wYXRoKGRvbmVfY2NiLT5jY2JfaC5wYXRoKTsKIAl4 cHRfZnJlZV9jY2IoZG9uZV9jY2IpOwpJbmRleDogbXBzdmFyLmgKPT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0gbXBz dmFyLmgJKHJldmlzaW9uIDMxOTQ0NikKKysrIG1wc3Zhci5oCSh3b3JraW5nIGNvcHkpCkBAIC00 MjEsMTAgKzQyMSw2IEBACiAKIAljaGFyCQkJCWV4Y2x1ZGVfaWRzWzgwXTsKIAlzdHJ1Y3QgdGlt 
ZXZhbAkJCWxhc3RmYWlsOwotCi0JLyogU3RhcnRTdG9wVW5pdCBjb21tYW5kIGhhbmRsaW5nIGF0 IHNodXRkb3duICovCi0JdWludDMyX3QJCQlTU1VfcmVmY291bnQ7Ci0JdWludDhfdAkJCQlTU1Vf c3RhcnRlZDsKIH07CiAKIHN0cnVjdCBtcHNfY29uZmlnX3BhcmFtcyB7Cga1134f23a888bb40550e97a55-- From owner-freebsd-scsi@freebsd.org Thu Jun 1 18:03:26 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DCB5CB7B6FD for ; Thu, 1 Jun 2017 18:03:26 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7B963847AF; Thu, 1 Jun 2017 18:03:26 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (mh0.gentlemail.de [78.138.80.135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v51I3Mok083113; Thu, 1 Jun 2017 20:03:22 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (titan.inop.mo1.omnilan.net [IPv6:2001:a60:f0bb:1::3:1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id 6C660F70; Thu, 1 Jun 2017 20:03:22 +0200 (CEST) Message-ID: <593056E9.6000807@omnilan.de> Date: Thu, 01 Jun 2017 20:03:21 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: Stephen Mcconnell CC: freebsd-scsi@freebsd.org, Scott Long Subject: Re: mps(4) blocks panic-reboot References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Greylist: ACL 129 matched, not 
delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [78.138.80.130]); Thu, 01 Jun 2017 20:03:22 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: 78.138.80.135; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 18:03:27 -0000

Regarding Stephen Mcconnell's message of 01.06.2017 19:36 (localtime):
> Can you try the attached patch and let me know how it goes? I didn't test
> it, but since you know how, it might be easier this way. This was diff'd
> from the latest mps files in stable/11, which I recently updated (today).

Thanks a lot, I noticed the highly appreciated MFC!
Things are cooking... There were sysdecode userland changes, so I also need to build world before my rollout system provides the update for this machine; it will be ready in an hour.

Since I have an expert's attention, I'd like to ask another mps(4)-related question:
I had unionfs deadlocks. (I'm aware of the broken status of unionfs, and since I'm not able to fix it myself at the moment, I have already replaced it with nullfs where possible, which is true for the following event.)
Since this machine has a memory disk as rootfs (and 5 SSDs via mps(4) for bootpool and a separate syspool, where e.g. /var lives), I guess the deadlock is responsible for the simultaneous disappearance of all mps(4)-attached drives. Is that plausible? (Meaning, does the mps(4) driver depend on the filesystem subsystem?)
Or do you have any idea what else could lead to all drives disappearing simultaneously? Other ATA drives, attached via on-board AHCI (C203), were not affected!
Unfortunately, I haven't been able to record any kernel messages when that happened (3 times as far as I remember; no occurrence yet since abandoning unionfs).

Thanks,

-harry

From owner-freebsd-scsi@freebsd.org Thu Jun 1 18:12:39 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D480AB7B9EA for ; Thu, 1 Jun 2017 18:12:39 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x234.google.com (mail-it0-x234.google.com [IPv6:2607:f8b0:4001:c0b::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9368584BD6 for ; Thu, 1 Jun 2017 18:12:39 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x234.google.com with SMTP id m62so975642itc.0 for ; Thu, 01 Jun 2017 11:12:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=from:references:in-reply-to:mime-version:thread-index:date :message-id:subject:to:cc:content-transfer-encoding; bh=1K4d03gRUSHJZ8Izjcsy+IdXncmSCpAUvNGyaZv/hbk=; b=etSepBVqvX1G0PcPXM7RuxNEC9+IjYeHvM5f+ARdOy7IHbkRyHjuiW2/fTWd7iMU2/ lU+6sKVMfH5gMMpcBkSZi4ZFkPBPZogRR7G01FMA3lB9ciOyBLHUUOAU97tXIUhwFIY6 jH5RdwBjHEDtSwINSDoc0ycV4fkmUVHeV+zPc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:cc :content-transfer-encoding; bh=1K4d03gRUSHJZ8Izjcsy+IdXncmSCpAUvNGyaZv/hbk=; b=Bu+RMfdlAzJteSGgwQRUJJprhO+6hQmpraa67NbPENL2wXhwYhAuDTb6FkoV2GNvgi fGqOAaPbxDZknvI12k68omTfnA02BkDhE2f4RhsHoxz0249iEeOB9ymqXRYEzlKnbyYf O0d+xpWHnOVKdcisO5Ica2KOKf8WJyXrjQEkxtWHwlhSzpmy69RVDPCSBe8GZTijRWtE
ngArumcY20H8rHDzD+hNoi3qMnhz2m78bym409Ms2qkFGMfjIAmKsHe4xNWcTfrsM3sF gCtCiHUDrdWn371EhlVFM+HpHkpPskuj+ffTF/ZxwOEOpecZHc6Q8lyl/Ovwxgl+NNsm a2SA== X-Gm-Message-State: AODbwcBOdKxnNk9I19iUjph5AynCDvmQ//Z4lzRADL91nzin4XHNpjj5 D6h0RDmQqn2wRSKcGbnF3nsy/lqpSaEQ X-Received: by 10.36.238.129 with SMTP id b123mr609014iti.10.1496340758836; Thu, 01 Jun 2017 11:12:38 -0700 (PDT) From: Stephen Mcconnell References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> <593056E9.6000807@omnilan.de> In-Reply-To: <593056E9.6000807@omnilan.de> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQK5uw9AxlTbZs3SRUL7gsvMDeNX4QK127o/Aqh6HjgB6++7AwGq54Urn/rkm7A= Date: Thu, 1 Jun 2017 12:12:37 -0600 Message-ID: Subject: RE: mps(4) blocks panic-reboot To: Harry Schmalzbauer Cc: freebsd-scsi@freebsd.org, Scott Long Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 18:12:39 -0000

> -----Original Message-----
> From: Harry Schmalzbauer [mailto:freebsd@omnilan.de]
> Sent: Thursday, June 01, 2017 12:03 PM
> To: Stephen Mcconnell
> Cc: freebsd-scsi@freebsd.org; Scott Long
> Subject: Re: mps(4) blocks panic-reboot
>
> Regarding Stephen Mcconnell's message of 01.06.2017 19:36 (localtime):
> > Can you try the attached patch and let me know how it goes? I didn't
> > test it, but since you know how, it might be easier this way. This was
> > diff'd from the latest mps files in stable/11, which I recently updated
> > (today).
>
> Thanks a lot, I noticed the highly appreciated MFC!
> Things are cooking... There were sysdecode userland changes, so I need to
> build world also, before my rollout system provides the update for this
> machine – will be ready in an hour.
>
> Since I have an expert's attention, I'd like to ask another mps(4) related
> question:
>
> I had unionfs deadlocks. (I'm aware of the broken status of unionfs, and
> since I'm not able to fix it myself at the moment, I already replaced it
> with nullfs where possible, true for the following event)
>
> Since this machine has a memory-disk as rootfs (and 5 SSDs via mps(4) for
> bootpool and a separate syspool, where /var e.g. lives), I guess the
> deadlock is responsible for simultaneous disappearance of all mps(4)
> attached drives.
>
> Is that plausible? (meaning, does the mps(4) driver depend on filesystem
> subsystem?)
>
> Or do you have any idea what else could lead to disappearance of all
> drives simultaneously? Other ata drives, via on-board ahci (C203) were not
> affected!
> Unfortunately, I haven't been able to record any kernel messages when that
> happened (3 times as far as I remember, no occurrence since abandoning
> unionfs yet)

This doesn't seem like an mps driver problem to me, but maybe someone else here can help more than I can. I can't think of anything that might be causing your drives to disappear. It would help if you could get some kernel logs when this happens.
> > Thanks, > > -harry From owner-freebsd-scsi@freebsd.org Thu Jun 1 18:30:43 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5CAC3B7BDE2 for ; Thu, 1 Jun 2017 18:30:43 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0DA906DA; Thu, 1 Jun 2017 18:30:42 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (ezra.dcm1.omnilan.net [78.138.80.135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v51IUeTo083315; Thu, 1 Jun 2017 20:30:40 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (titan.inop.mo1.omnilan.net [IPv6:2001:a60:f0bb:1::3:1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id 3E14CF78; Thu, 1 Jun 2017 20:30:40 +0200 (CEST) Message-ID: <59305D4F.40707@omnilan.de> Date: Thu, 01 Jun 2017 20:30:39 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: Stephen Mcconnell CC: freebsd-scsi@freebsd.org, Scott Long Subject: Re: sporadic CAM (all devices) outage on 11-stable, mps(4), ahci(4) and bhyve(8) involved. 
[Was: Re: mps(4) blocks panic-reboot] References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> <593056E9.6000807@omnilan.de> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Greylist: ACL 129 matched, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [78.138.80.130]); Thu, 01 Jun 2017 20:30:40 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: 78.138.80.135; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 18:30:43 -0000

Regarding Stephen Mcconnell's message of 01.06.2017 20:12 (localtime):
>> -----Original Message-----
>> From: Harry Schmalzbauer [mailto:freebsd@omnilan.de]
>> Sent: Thursday, June 01, 2017 12:03 PM
>> To: Stephen Mcconnell
>> Cc: freebsd-scsi@freebsd.org; Scott Long
>> Subject: Re: mps(4) blocks panic-reboot
>>
>> Regarding Stephen Mcconnell's message of 01.06.2017 19:36 (localtime):
>>> Can you try the attached patch and let me know how it goes? I didn't
>>> test it, but since you know how, it might be easier this way. This was
>>> diff'd from the latest mps files in stable/11, which I recently updated
>>> (today).
>>
>> Thanks a lot, I noticed the highly appreciated MFC!
>> Things are cooking... There were sysdecode userland changes, so I need to
>> build world also, before my rollout system provides the update for this
>> machine – will be ready in an hour.
>>
>> Since I have an expert's attention, I'd like to ask another mps(4) related
>> question:
>>
>> I had unionfs deadlocks. (I'm aware of the broken status of unionfs, and
>> since I'm not able to fix it myself at the moment, I already replaced it
>> with nullfs where possible, true for the following event)
>>
>> Since this machine has a memory-disk as rootfs (and 5 SSDs via mps(4) for
>> bootpool and a separate syspool, where /var e.g. lives), I guess the
>> deadlock is responsible for simultaneous disappearance of all mps(4)
>> attached drives.
>>
>> Is that plausible? (meaning, does the mps(4) driver depend on filesystem
>> subsystem?)
>>
>> Or do you have any idea what else could lead to disappearance of all
>> drives simultaneously? Other ata drives, via on-board ahci (C203) were not
>> affected!
>> Unfortunately, I haven't been able to record any kernel messages when that
>> happened (3 times as far as I remember, no occurrence since abandoning
>> unionfs yet)
>
> This doesn't seem like an mps driver problem to me, but maybe someone else
> here can help more than I can. I can't think of anything that might be
> causing your drives to disappear. It would help if you could get some kernel
> logs when this happens.

Thanks, I should have searched beforehand... Two lies: At least once there were also SATA drives via ahci(4) affected, and I noted some kernel messages.

Please see this post:
https://lists.freebsd.org/pipermail/freebsd-scsi/2016-December/007216.html

Sorry, thought it was longer ago and not discussed at scsi@ at all...

At that time, there was unionfs involved, which later led to complete deadlocks on different setups with completely different applications.
But I think that (deadlock) is one possible root of problems these setups had in common.

So if one expert can tell me – nope, disappearing drives can't be related to (union)fs deadlocks, or the opposite, I'd be deeply grateful.
-harry From owner-freebsd-scsi@freebsd.org Thu Jun 1 18:55:08 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 23889B7C66B for ; Thu, 1 Jun 2017 18:55:08 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x236.google.com (mail-it0-x236.google.com [IPv6:2607:f8b0:4001:c0b::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EF1241AD9 for ; Thu, 1 Jun 2017 18:55:07 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x236.google.com with SMTP id r63so41620031itc.1 for ; Thu, 01 Jun 2017 11:55:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=from:references:in-reply-to:mime-version:thread-index:date :message-id:subject:to:cc:content-transfer-encoding; bh=CJ7jjwohw3eS/ddexYkwX+7DsMd1Jk+FFXG+tR6lHeQ=; b=SmgcwWVbYxvcqMr66T6OXGaG/t1ad04AcMQJmQ++dld/585H48bVZ2ofg22fkDXLA+ kqnHdnP7Z6jxte7gDUsQkZDkZap1ta00K8NNrutuDbyK5f8dPaLJ4A/E5ROM4Cm2Zd4m yqiZTvw8fYc0+hU3nG4LqMGmcX2Wlk4ZFx1Zk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:cc :content-transfer-encoding; bh=CJ7jjwohw3eS/ddexYkwX+7DsMd1Jk+FFXG+tR6lHeQ=; b=kTPO9jyEGXcyXg8q+BY9bnHGEypMOALXCm7/rWLNWgI7AnlI9G4fo6QOzPcNj8PFv9 B+Olenk5vQPI5/0q30eI/lDHYOVTZkoWF9g4sjRltVfeJGovPGsV9RE5RQ7+A7KUR7+0 uQ24ez/uucbRsWIYfy15HgwOVfpzwacmnbpd+Tl8gYoxhdXnKsejUGbi1zzbz7CnDSL3 KMDplhQi/thKuEbKBQ4VEwDsla3SV7tQvW5gXB4OCoFlxtcApbmMyXtnt2gLLHNoz61B zsMQxh0tNS/OkCelCLbpmxVMJbibkRXoBh7ZAwC8anWp5QWRi/EY03akVrt2H0t81pyD dDYg== X-Gm-Message-State: AODbwcB0LcAMPcF88nCkZ+m557b7VR8BAg4GXh1V1UjCIfNFL8iNoDtC 
oJ+7L5frsEdILf8N7NRANcH/heUeUg7q X-Received: by 10.36.181.65 with SMTP id j1mr748191iti.55.1496343307272; Thu, 01 Jun 2017 11:55:07 -0700 (PDT) From: Stephen Mcconnell References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> <593056E9.6000807@omnilan.de> <59305D4F.40707@omnilan.de> In-Reply-To: <59305D4F.40707@omnilan.de> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQK5uw9AxlTbZs3SRUL7gsvMDeNX4QK127o/Aqh6HjgB6++7AwGq54UrAuIoCdQCI/KgVp/SwAdg Date: Thu, 1 Jun 2017 12:55:06 -0600 Message-ID: Subject: RE: sporadic CAM (all devices) outage on 11-stable, mps(4), ahci(4) and bhyve(8) involved. [Was: Re: mps(4) blocks panic-reboot] To: Harry Schmalzbauer Cc: freebsd-scsi@freebsd.org, Scott Long Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 18:55:08 -0000

Take a look at PR 212914. Could that be the issue? It was MFC'd to stable/11 with r309273 on Nov 28th, 2016.

Steve

> -----Original Message-----
> From: Harry Schmalzbauer [mailto:freebsd@omnilan.de]
> Sent: Thursday, June 01, 2017 12:31 PM
> To: Stephen Mcconnell
> Cc: freebsd-scsi@freebsd.org; Scott Long
> Subject: Re: sporadic CAM (all devices) outage on 11-stable, mps(4),
> ahci(4) and bhyve(8) involved. [Was: Re: mps(4) blocks panic-reboot]
>
> Regarding Stephen Mcconnell's message of 01.06.2017 20:12 (localtime):
> >> -----Original Message-----
> >> From: Harry Schmalzbauer [mailto:freebsd@omnilan.de]
> >> Sent: Thursday, June 01, 2017 12:03 PM
> >> To: Stephen Mcconnell
> >> Cc: freebsd-scsi@freebsd.org; Scott Long
> >> Subject: Re: mps(4) blocks panic-reboot
> >>
> >> Regarding Stephen Mcconnell's message of 01.06.2017 19:36
> >> (localtime):
> >>> Can you try the attached patch and let me know how it goes? I didn't
> >>> test it, but since you know how, it might be easier this way. This
> >>> was diff'd from the latest mps files in stable/11, which I recently
> >>> updated (today).
> >>
> >> Thanks a lot, I noticed the highly appreciated MFC!
> >> Things are cooking... There were sysdecode userland changes, so I
> >> need to build world also, before my rollout system provides the
> >> update for this machine – will be ready in an hour.
> >>
> >> Since I have an expert's attention, I'd like to ask another mps(4)
> >> related question:
> >>
> >> I had unionfs deadlocks. (I'm aware of the broken status of unionfs,
> >> and since I'm not able to fix it myself at the moment, I already
> >> replaced it with nullfs where possible, true for the following event)
> >>
> >> Since this machine has a memory-disk as rootfs (and 5 SSDs via mps(4)
> >> for bootpool and a separate syspool, where /var e.g. lives), I guess
> >> the deadlock is responsible for simultaneous disappearance of all
> >> mps(4) attached drives.
> >>
> >> Is that plausible? (meaning, does the mps(4) driver depend on
> >> filesystem subsystem?)
> >>
> >> Or do you have any idea what else could lead to disappearance of all
> >> drives simultaneously? Other ata drives, via on-board ahci (C203)
> >> were not affected!
> >> Unfortunately, I haven't been able to record any kernel messages when
> >> that happened (3 times as far as I remember, no occurrence since
> >> abandoning unionfs yet)
> >
> > This doesn't seem like an mps driver problem to me, but maybe someone
> > else here can help more than I can. I can't think of anything that
> > might be causing your drives to disappear. It would help if you could
> > get some kernel logs when this happens.
>
> Thanks, I should have searched beforehand... Two lies: At least once there
> were also SATA drives via ahci(4) affected, and I noted some kernel
> messages.
>
> Please see this post:
> https://lists.freebsd.org/pipermail/freebsd-scsi/2016-December/007216.html
>
> Sorry, thought it was longer ago and not discussed at scsi@ at all...
>
> At that time, there was unionfs involved, which later led to complete
> deadlocks on different setups with completely different applications.
> But I think that (deadlock) is one possible root of problems these setups
> had in common.
>
> So if one expert can tell me – nope, disappearing drives can't be related
> to (union)fs deadlocks, or the opposite, I'd be deeply grateful.
> > -harry From owner-freebsd-scsi@freebsd.org Thu Jun 1 19:03:35 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EEB6EB7C833 for ; Thu, 1 Jun 2017 19:03:35 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8DAF41E9D; Thu, 1 Jun 2017 19:03:34 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (mh0.gentlemail.de [IPv6:2a00:e10:2800::a135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v51J3WB2083607; Thu, 1 Jun 2017 21:03:32 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (titan.inop.mo1.omnilan.net [IPv6:2001:a60:f0bb:1::3:1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id 39A8DF92; Thu, 1 Jun 2017 21:03:32 +0200 (CEST) Message-ID: <59306503.4010007@omnilan.de> Date: Thu, 01 Jun 2017 21:03:31 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: Stephen Mcconnell CC: freebsd-scsi@freebsd.org, Scott Long Subject: Re: mps(4) blocks panic-reboot References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]); Thu, 01 Jun 2017 21:03:32 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: ; Sender-helo: mh0.gentlemail.de; ) 
X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 19:03:36 -0000

Regarding Stephen Mcconnell's message of 01.06.2017 19:36 (localtime):
> Can you try the attached patch and let me know how it goes? I didn't test
> it, but since you know how, it might be easier this way. This was diff'd
> from the latest mps files in stable/11, which I recently updated (today).

Your diff is doing very well on r319447:

Fatal trap 12: page fault while in kernel mode
…
Uptime: 1m26s
Dumping 1608 out of 15734 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12
mps0: Completing stop unit for (xpt0:mps0:0:2:ffffffff):
mps0: Sending StopUnit: path (xpt0:mps0:0:3:ffffffff): handle 11
mps0: Completing stop unit for (xpt0:mps0:0:3:ffffffff):
mps0: Sending StopUnit: path (xpt0:mps0:0:4:ffffffff): handle 10
mps0: Completing stop unit for (xpt0:mps0:0:4:ffffffff):
mps0: Sending StopUnit: path (xpt0:mps0:0:5:ffffffff): handle 9
mps0: Completing stop unit for (xpt0:mps0:0:5:ffffffff):
mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff): handle 13
mps0: Completing stop unit for (xpt0:mps0:0:6:ffffffff):

And an immediate reset followed :-)

Thank you very much! Fellows who have this great mp[sr] silicon in use but no ipmi-watchdog will get better sleep from now on ;-)

-harry

From owner-freebsd-scsi@freebsd.org Thu Jun 1 19:06:42 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 905BDB7C8E5 for ; Thu, 1 Jun 2017 19:06:42 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x22f.google.com (mail-it0-x22f.google.com [IPv6:2607:f8b0:4001:c0b::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 57DBF1FD0 for ; Thu, 1 Jun 2017 19:06:42 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x22f.google.com with SMTP id m47so115135iti.1 for ; Thu, 01 Jun 2017 12:06:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=from:references:in-reply-to:mime-version:thread-index:date :message-id:subject:to:cc:content-transfer-encoding; bh=mNvWNlYtAvlWAIozPfw2l81Lz4tom6YKUQKk4DHCmvg=; b=S5WB5Vc2LQJSjU2azHl6Ip+lMo5A3RGHjAviOIYZdNDM28IkPzEVG4kpulRkf6o4K9 JBp9CNpILY8JiVO5rxfdxcZ5opa8NehBPc9jBG3E1c85Uh3Zze+UG8J7A5pzsQVuxxyg cl8ZVrETJV4QJmlvjHW6gTxX5M1ZSea89clNQ= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:cc :content-transfer-encoding; bh=mNvWNlYtAvlWAIozPfw2l81Lz4tom6YKUQKk4DHCmvg=; b=M9vKNB4GJnXrUL0nqNkEnwWQeqAR4FfLM+6k/ofN/yWJVdLNw/efVbGzpzJYJZllO1 SHTpCzYYao5qVlmDPBWRUTSFIBFZQUmf/MUsbU9/VEpr1YGPVrBi7ND4mb55WmuMjJSx WPmdAbaCZ/yBVSGh+7VF33T+EyvwIiVkFpS7KBfK5UoVQMjrnYlU3vjsm6tdIR1fumvD au31X6hJ8H/vBw83gaIfyBQ9MZ2716LV+b0cAdiZOkRGCaLHLQHWUP3BBO0bB5FSEIlL
jIvfaUOg0041fdYAcLjE1XAFTBoSYmH0K4XyLs9g2T6alh94tXXBRjHqWH+iDJEnadRg tdkw== X-Gm-Message-State: AODbwcAFWZxGanHnvf5GGjOkwsTtqRVNLd+rrJi1QG08WByjBAFEy+lE 8kbSDITbpUZUWp9tn4+tzPbcQrF2gXSL X-Received: by 10.36.181.65 with SMTP id j1mr808104iti.55.1496344001501; Thu, 01 Jun 2017 12:06:41 -0700 (PDT) From: Stephen Mcconnell References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> <59306503.4010007@omnilan.de> In-Reply-To: <59306503.4010007@omnilan.de> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQK5uw9AxlTbZs3SRUL7gsvMDeNX4QK127o/Aqh6HjgB6++7AwFsXC9Mn/zo+NA= Date: Thu, 1 Jun 2017 13:06:40 -0600 Message-ID: <3240277b325d5197127c20ac795149b2@mail.gmail.com> Subject: RE: mps(4) blocks panic-reboot To: Harry Schmalzbauer Cc: freebsd-scsi@freebsd.org, Scott Long Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 19:06:42 -0000

> -----Original Message-----
> From: Harry Schmalzbauer [mailto:freebsd@omnilan.de]
> Sent: Thursday, June 01, 2017 1:04 PM
> To: Stephen Mcconnell
> Cc: freebsd-scsi@freebsd.org; Scott Long
> Subject: Re: mps(4) blocks panic-reboot
>
> Regarding Stephen Mcconnell's message of 01.06.2017 19:36 (localtime):
> > Can you try the attached patch and let me know how it goes? I didn't
> > test it, but since you know how, it might be easier this way. This was
> > diff'd from the latest mps files in stable/11, which I recently updated
> > (today).
>
> Your diff is doing very well on r319447:
>
> Fatal trap 12: page fault while in kernel mode
> …
> Uptime: 1m26s
> Dumping 1608 out of 15734 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> Dump complete
> mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12
> mps0: Completing stop unit for (xpt0:mps0:0:2:ffffffff):
> mps0: Sending StopUnit: path (xpt0:mps0:0:3:ffffffff): handle 11
> mps0: Completing stop unit for (xpt0:mps0:0:3:ffffffff):
> mps0: Sending StopUnit: path (xpt0:mps0:0:4:ffffffff): handle 10
> mps0: Completing stop unit for (xpt0:mps0:0:4:ffffffff):
> mps0: Sending StopUnit: path (xpt0:mps0:0:5:ffffffff): handle 9
> mps0: Completing stop unit for (xpt0:mps0:0:5:ffffffff):
> mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff): handle 13
> mps0: Completing stop unit for (xpt0:mps0:0:6:ffffffff):
>
> And an immediate reset followed :-)
>
> Thank you very much! Fellows who have this great mp[sr] silicon in use
> but no ipmi-watchdog will get better sleep from now on ;-)

Great! Thanks for the test.
> > -harry From owner-freebsd-scsi@freebsd.org Thu Jun 1 19:10:15 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 62F65B7C9A5 for ; Thu, 1 Jun 2017 19:10:15 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F3A072162; Thu, 1 Jun 2017 19:10:14 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (mh0.gentlemail.de [IPv6:2a00:e10:2800::a135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v51JACMU083661; Thu, 1 Jun 2017 21:10:12 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (titan.inop.mo1.omnilan.net [IPv6:2001:a60:f0bb:1::3:1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id 434FFF9A; Thu, 1 Jun 2017 21:10:12 +0200 (CEST) Message-ID: <59306693.6080304@omnilan.de> Date: Thu, 01 Jun 2017 21:10:11 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: Stephen Mcconnell CC: freebsd-scsi@freebsd.org, Scott Long Subject: Re: sporadic CAM (all devices) outage on 11-stable, mps(4), ahci(4) and bhyve(8) involved. 
[Was: Re: mps(4) blocks panic-reboot] References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> <593056E9.6000807@omnilan.de> <59305D4F.40707@omnilan.de> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]); Thu, 01 Jun 2017 21:10:12 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: ; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Jun 2017 19:10:15 -0000

Regarding Stephen Mcconnell's message of 01.06.2017 20:55 (localtime):
> Take a look at PR 212914. Could that be the issue? It was MFC'd to stable/11
> with r309273 on Nov 28th, 2016.

Thanks a lot, but that's unrelated. The "readded" phenomenon wasn't the case here, and the last occurrence of the simultaneous drive disappearance was 3 weeks ago (on r317xxx).
But it was before I completely got rid of unionfs usage.
Since I use(d) a panic-workaround patch for unionfs, these deadlocks could happen, and I hoped that could be to blame for the quoted symptom, for which I couldn't imagine any other cause yet.
Thanks, -harry From owner-freebsd-scsi@freebsd.org Fri Jun 2 12:30:48 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 87966BF7092 for ; Fri, 2 Jun 2017 12:30:48 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2D86C8F8; Fri, 2 Jun 2017 12:30:48 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (ezra.dcm1.omnilan.net [78.138.80.135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v52CUjhk096136; Fri, 2 Jun 2017 14:30:45 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (titan.inop.mo1.omnilan.net [IPv6:2001:a60:f0bb:1::3:1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id 12BF2203; Fri, 2 Jun 2017 14:30:45 +0200 (CEST) Message-ID: <59315A74.9050506@omnilan.de> Date: Fri, 02 Jun 2017 14:30:44 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: Stephen Mcconnell CC: freebsd-scsi@freebsd.org, Scott Long Subject: Re: mps(4) blocks panic-reboot References: <592FDE8C.1090609@omnilan.de> 12a36df9eff99c77ec621987efbe75fe@mail.gmail.com <59303484.1040609@omnilan.de> <59306503.4010007@omnilan.de> In-Reply-To: <59306503.4010007@omnilan.de> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Greylist: ACL 129 matched, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [78.138.80.130]); Fri, 02 Jun 2017 14:30:45 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; 
Sender-ip: 78.138.80.135; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jun 2017 12:30:48 -0000

Regarding Harry Schmalzbauer's message of 01.06.2017 21:03 (localtime):
> Regarding Stephen Mcconnell's message of 01.06.2017 19:36 (localtime):
>> Can you try the attached patch and let me know how it goes? I didn't test
>> it, but since you know how, it might be easier this way. This was diff'd
>> from the latest mps files in stable/11, which I recently updated (today).
> Your diff is doing very well on r319447:
>
> …
> mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff): handle 13
> mps0: Completing stop unit for (xpt0:mps0:0:6:ffffffff):
>
> And an immediate reset followed :-)

There's one new problem: Shutting down leads to the probably last panic possible:

kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff805f43ec
stack pointer = 0x28:0xfffffe03bc9c3730
frame pointer = 0x28:0xfffffe03bc9c3750
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 1 (init)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff805df4f7 at kdb_backtrace+0x67
#1 0xffffffff8059df96 at vpanic+0x186
#2 0xffffffff8059de03 at panic+0x43
#3 0xffffffff808a1892 at trap_fatal+0x322
#4 0xffffffff808a18e9 at trap_pfault+0x49
#5 0xffffffff808a1126 at trap+0x286
#6 0xffffffff80887401 at calltrap+0x8
#7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72
#8 0xffffffff8029a7dc at xpt_polled_action+0x31c
#9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b
#10 0xffffffff8059db9a at kern_reboot+0x49a
#11 0xffffffff8059d6f8 at sys_reboot+0x458
#12 0xffffffff808a23f4 at amd64_syscall+0x6c4
#13 0xffffffff808876eb at Xfast_syscall+0xfb

(kgdb) list *0xffffffff805f43ec
0xffffffff805f43ec is in turnstile_broadcast (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837).
832
833     /*
834      * Transfer the blocked list to the pending list.
835      */
836     mtx_lock_spin(&td_contested_lock);
837     TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], td_lockq);
838     mtx_unlock_spin(&td_contested_lock);
839
840     /*
841      * Give a turnstile to each thread. The last thread gets

I haven't looked at the code at all and only very briefly looked at the diff, just out of curiosity, like pigs staring at clockworks ;-)

But at least I hope this report does help.

Thanks,

-harry

From owner-freebsd-scsi@freebsd.org Fri Jun 2 15:37:15 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 20130BF9CED for ; Fri, 2 Jun 2017 15:37:15 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (mithlond.kdm.org [96.89.93.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "A1-33714", Issuer "A1-33714" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id E76626647C; Fri, 2 Jun 2017 15:37:14 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (localhost [127.0.0.1]) by mithlond.kdm.org (8.15.2/8.14.9) with ESMTPS id v52Fb52E056317 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 2 Jun 2017 11:37:05 -0400 (EDT) (envelope-from ken@mithlond.kdm.org) Received: (from ken@localhost) by mithlond.kdm.org (8.15.2/8.14.9/Submit) id v52Fb57Q056316; Fri, 2 Jun 2017 11:37:05 -0400 (EDT) (envelope-from ken) Date: Fri, 2 Jun 2017 11:37:05 -0400 From: "Kenneth D.
Merry" To: Harry Schmalzbauer Cc: Stephen Mcconnell , freebsd-scsi@freebsd.org, Scott Long Subject: Re: mps(4) blocks panic-reboot Message-ID: <20170602153705.GA56018@mithlond.kdm.org> References: <592FDE8C.1090609@omnilan.de> <59303484.1040609@omnilan.de> <59306503.4010007@omnilan.de> <59315A74.9050506@omnilan.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <59315A74.9050506@omnilan.de> User-Agent: Mutt/1.5.23 (2014-03-12) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (mithlond.kdm.org [127.0.0.1]); Fri, 02 Jun 2017 11:37:06 -0400 (EDT) X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on mithlond.kdm.org X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jun 2017 15:37:15 -0000 On Fri, Jun 02, 2017 at 14:30:44 +0200, Harry Schmalzbauer wrote: > Bez??glich Harry Schmalzbauer's Nachricht vom 01.06.2017 21:03 (localtime): > > Bez??glich Stephen Mcconnell's Nachricht vom 01.06.2017 19:36 (localtime): > >> Can you try the attached patch and let me know how it goes? I didn't test > >> it, but since you know how, it might be easier this way. This was diff'd > >> from the latest mps files in stable/11, which I recently updated (today). > > Your diff is doing very well on r319447: > > > > > ??? 
> > mps0: Sending StopUnit: path (xpt0:mps0:0:6:ffffffff): handle 13 > > mps0: Completing stop unit for (xpt0:mps0:0:6:ffffffff): > > > > And, there followed a immediate reset :-) > > There's one new problem: Shutting down leads to the probably last panic > possible: > > kernel trap 12 with interrupts disabled > > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0x20 > fault code = supervisor read data, page not present > instruction pointer = 0x20:0xffffffff805f43ec > stack pointer = 0x28:0xfffffe03bc9c3730 > frame pointer = 0x28:0xfffffe03bc9c3750 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = resume, IOPL = 0 > current process = 1 (init) > trap number = 12 > panic: page fault > cpuid = 0 > KDB: stack backtrace: > #0 0xffffffff805df4f7 at kdb_backtrace+0x67 > #1 0xffffffff8059df96 at vpanic+0x186 > #2 0xffffffff8059de03 at panic+0x43 > #3 0xffffffff808a1892 at trap_fatal+0x322 > #4 0xffffffff808a18e9 at trap_pfault+0x49 > #5 0xffffffff808a1126 at trap+0x286 > #6 0xffffffff80887401 at calltrap+0x8 > #7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72 > #8 0xffffffff8029a7dc at xpt_polled_action+0x31c > #9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b > #10 0xffffffff8059db9a at kern_reboot+0x49a > #11 0xffffffff8059d6f8 at sys_reboot+0x458 > #12 0xffffffff808a23f4 at amd64_syscall+0x6c4 > #13 0xffffffff808876eb at Xfast_syscall+0xfb > > (kgdb) list *0xffffffff805f43ec > 0xffffffff805f43ec is in turnstile_broadcast > (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837). > 832 > 833 /* > 834 * Transfer the blocked list to the pending list. > 835 */ > 836 mtx_lock_spin(&td_contested_lock); > 837 TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], > td_lockq); > 838 mtx_unlock_spin(&td_contested_lock); > 839 > 840 /* > 841 * Give a turnstile to each thread. 
The last thread gets > > I haven't looked at the code at all and only very briefly looked at the > diff, just out of curiosity, like pigs staring at clockworks ;-) > > But at least I hope this report does help. Thanks for testing it! My guess is that the problem is that xpt_polled_action() releases the device mutex, but mpssas_SSU_to_SATA_devices() isn't acquiring the mutex. You could try putting the following around the call to xpt_polled_action(): mtx_lock(xpt_path_mtx(ccb->ccb_h.path)); xpt_polled_action(ccb); mtx_unlock(xpt_path_mtx(ccb->ccb_h.path)); See if that fixes things. One other thing to put in there -- after the if (target->stop_at_shutdown) { } statement, but still inside the for loop, add these two lines: xpt_free_path(ccb->ccb_h.path); xpt_free_ccb(ccb); Ken -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-scsi@freebsd.org Fri Jun 2 16:56:45 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7FF21BFB4E9 for ; Fri, 2 Jun 2017 16:56:45 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1AD6E6AE12; Fri, 2 Jun 2017 16:56:38 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (ezra.dcm1.omnilan.net [78.138.80.135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v52GuapL098733; Fri, 2 Jun 2017 18:56:36 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (titan.inop.mo1.omnilan.net [IPv6:2001:a60:f0bb:1::3:1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id A608C25B; Fri, 2 Jun 2017 18:56:35 +0200 (CEST) 
Message-ID: <593198C3.2080902@omnilan.de> Date: Fri, 02 Jun 2017 18:56:35 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: "Kenneth D. Merry" CC: Stephen Mcconnell , freebsd-scsi@FreeBSD.ORG, Scott Long Subject: Re: mps(4) blocks panic-reboot References: <592FDE8C.1090609@omnilan.de> <59303484.1040609@omnilan.de> <59306503.4010007@omnilan.de> <59315A74.9050506@omnilan.de> <20170602153705.GA56018@mithlond.kdm.org> In-Reply-To: <20170602153705.GA56018@mithlond.kdm.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Greylist: ACL 129 matched, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [78.138.80.130]); Fri, 02 Jun 2017 18:56:36 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: 78.138.80.135; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jun 2017 16:56:45 -0000 Bezüglich Kenneth D. 
Merry's Nachricht vom 02.06.2017 17:37 (localtime): > On Fri, Jun 02, 2017 at 14:30:44 +0200, Harry Schmalzbauer wrote: … >> KDB: stack backtrace: >> #0 0xffffffff805df4f7 at kdb_backtrace+0x67 >> #1 0xffffffff8059df96 at vpanic+0x186 >> #2 0xffffffff8059de03 at panic+0x43 >> #3 0xffffffff808a1892 at trap_fatal+0x322 >> #4 0xffffffff808a18e9 at trap_pfault+0x49 >> #5 0xffffffff808a1126 at trap+0x286 >> #6 0xffffffff80887401 at calltrap+0x8 >> #7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72 >> #8 0xffffffff8029a7dc at xpt_polled_action+0x31c >> #9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b >> #10 0xffffffff8059db9a at kern_reboot+0x49a >> #11 0xffffffff8059d6f8 at sys_reboot+0x458 >> #12 0xffffffff808a23f4 at amd64_syscall+0x6c4 >> #13 0xffffffff808876eb at Xfast_syscall+0xfb >> >> (kgdb) list *0xffffffff805f43ec >> 0xffffffff805f43ec is in turnstile_broadcast >> (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837). >> 832 >> 833 /* >> 834 * Transfer the blocked list to the pending list. >> 835 */ >> 836 mtx_lock_spin(&td_contested_lock); >> 837 TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], >> td_lockq); >> 838 mtx_unlock_spin(&td_contested_lock); >> 839 >> 840 /* >> 841 * Give a turnstile to each thread. The last thread gets >> >> I haven't looked at the code at all and only very briefly looked at the >> diff, just out of curiosity, like pigs staring at clockworks ;-) >> >> But at least I hope this report does help. > > Thanks for testing it! > > My guess is that the problem is that xpt_polled_action() > releases the device mutex, but mpssas_SSU_to_SATA_devices() isn't acquiring > the mutex. > > You could try putting the following around the call to xpt_polled_action(): > > mtx_lock(xpt_path_mtx(ccb->ccb_h.path)); > xpt_polled_action(ccb); > mtx_unlock(xpt_path_mtx(ccb->ccb_h.path)); > > See if that fixes things. 
One other thing to put in there -- after the > if (target->stop_at_shutdown) { } statement, but still inside the for > loop, add these two lines: > > xpt_free_path(ccb->ccb_h.path); > xpt_free_ccb(ccb); Hope I didn't mess up the text editing; please see the attached hunk and check whether it corresponds to the (additional) changes to Stephen's diff. This leads to a series of panics?!? (was very quick after the dump of the first panic was written) ums1: detached mps0: Sending StopUnit: path (xpt0:mps0:0:2:ffffffff): handle 12 mps0: Completing stop unit for (xpt0:mps0:0:2:ffffffff): Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x478 fault code = supervisor write data, page not present instruction pointer = 0x20:0xffffffff80416cca stack pointer = 0x28:0xfffffe03bc9c37f0 frame pointer = 0x28:0xfffffe03bc9c3880 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 1 (init) trap number = 12 panic: page fault cpuid = 0 KDB: stack backtrace: #0 0xffffffff805df5c7 at kdb_backtrace+0x67 #1 0xffffffff8059e066 at vpanic+0x186 #2 0xffffffff8059ded3 at panic+0x43 #3 0xffffffff808a1962 at trap_fatal+0x322 #4 0xffffffff808a19b9 at trap_pfault+0x49 #5 0xffffffff808a11f6 at trap+0x286 #6 0xffffffff808874d1 at calltrap+0x8 #7 0xffffffff8059dc6a at kern_reboot+0x49a #8 0xffffffff8059d7c8 at sys_reboot+0x458 #9 0xffffffff808a24c4 at amd64_syscall+0x6c4 #10 0xffffffff808877bb at Xfast_syscall+0xfb Uptime: 1m15s (da0:mps0:0:2:0): Synchronize cache failed Dumping 1277 out of 15734 … #0 doadump (textdump=) at pcpu.h:222 222 pcpu.h: No such file or directory. in pcpu.h (kgdb) list *0xffffffff80416cca 0xffffffff80416cca is in mpssas_ir_shutdown (atomic.h:188). 183 atomic.h: No such file or directory. in atomic.h Should I reduce compiler optimization? 
Thanks, -harry From owner-freebsd-scsi@freebsd.org Fri Jun 2 16:58:02 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BD9F3BFB54D for ; Fri, 2 Jun 2017 16:58:02 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mx0.gentlemail.de (mx0.gentlemail.de [IPv6:2a00:e10:2800::a130]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5756A6AE77; Fri, 2 Jun 2017 16:58:02 +0000 (UTC) (envelope-from freebsd@omnilan.de) Received: from mh0.gentlemail.de (ezra.dcm1.omnilan.net [78.138.80.135]) by mx0.gentlemail.de (8.14.5/8.14.5) with ESMTP id v52GvxqR098743; Fri, 2 Jun 2017 18:57:59 +0200 (CEST) (envelope-from freebsd@omnilan.de) Received: from titan.inop.mo1.omnilan.net (titan.inop.mo1.omnilan.net [IPv6:2001:a60:f0bb:1::3:1]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mh0.gentlemail.de (Postfix) with ESMTPSA id 6EB8925E; Fri, 2 Jun 2017 18:57:59 +0200 (CEST) Message-ID: <59319917.1050301@omnilan.de> Date: Fri, 02 Jun 2017 18:57:59 +0200 From: Harry Schmalzbauer Organization: OmniLAN User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; de-DE; rv:1.9.2.8) Gecko/20100906 Lightning/1.0b2 Thunderbird/3.1.2 MIME-Version: 1.0 To: "Kenneth D. 
Merry" CC: Stephen Mcconnell , freebsd-scsi@FreeBSD.ORG, Scott Long Subject: Re: mps(4) blocks panic-reboot References: <592FDE8C.1090609@omnilan.de> <59303484.1040609@omnilan.de> <59306503.4010007@omnilan.de> <59315A74.9050506@omnilan.de> <20170602153705.GA56018@mithlond.kdm.org> <593198C3.2080902@omnilan.de> In-Reply-To: <593198C3.2080902@omnilan.de> Content-Type: multipart/mixed; boundary="------------080306030606030400020901" X-Greylist: ACL 129 matched, not delayed by milter-greylist-4.2.7 (mx0.gentlemail.de [78.138.80.130]); Fri, 02 Jun 2017 18:58:00 +0200 (CEST) X-Milter: Spamilter (Reciever: mx0.gentlemail.de; Sender-ip: 78.138.80.135; Sender-helo: mh0.gentlemail.de; ) X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jun 2017 16:58:02 -0000 This is a multi-part message in MIME format. --------------080306030606030400020901 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Bezüglich Harry Schmalzbauer's Nachricht vom 02.06.2017 18:56 (localtime): > Bezüglich Kenneth D. 
Merry's Nachricht vom 02.06.2017 17:37 (localtime): >> On Fri, Jun 02, 2017 at 14:30:44 +0200, Harry Schmalzbauer wrote: > … >>> KDB: stack backtrace: >>> #0 0xffffffff805df4f7 at kdb_backtrace+0x67 >>> #1 0xffffffff8059df96 at vpanic+0x186 >>> #2 0xffffffff8059de03 at panic+0x43 >>> #3 0xffffffff808a1892 at trap_fatal+0x322 >>> #4 0xffffffff808a18e9 at trap_pfault+0x49 >>> #5 0xffffffff808a1126 at trap+0x286 >>> #6 0xffffffff80887401 at calltrap+0x8 >>> #7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72 >>> #8 0xffffffff8029a7dc at xpt_polled_action+0x31c >>> #9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b >>> #10 0xffffffff8059db9a at kern_reboot+0x49a >>> #11 0xffffffff8059d6f8 at sys_reboot+0x458 >>> #12 0xffffffff808a23f4 at amd64_syscall+0x6c4 >>> #13 0xffffffff808876eb at Xfast_syscall+0xfb >>> >>> (kgdb) list *0xffffffff805f43ec >>> 0xffffffff805f43ec is in turnstile_broadcast >>> (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837). >>> 832 >>> 833 /* >>> 834 * Transfer the blocked list to the pending list. >>> 835 */ >>> 836 mtx_lock_spin(&td_contested_lock); >>> 837 TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], >>> td_lockq); >>> 838 mtx_unlock_spin(&td_contested_lock); >>> 839 >>> 840 /* >>> 841 * Give a turnstile to each thread. The last thread gets >>> >>> I haven't looked at the code at all and only very briefly looked at the >>> diff, just out of curiosity, like pigs staring at clockworks ;-) >>> >>> But at least I hope this report does help. >> Thanks for testing it! >> >> My guess is that the problem is that xpt_polled_action() >> releases the device mutex, but mpssas_SSU_to_SATA_devices() isn't acquiring >> the mutex. >> >> You could try putting the following around the call to xpt_polled_action(): >> >> mtx_lock(xpt_path_mtx(ccb->ccb_h.path)); >> xpt_polled_action(ccb); >> mtx_unlock(xpt_path_mtx(ccb->ccb_h.path)); >> >> See if that fixes things. 
One other thing to put in there -- after the >> if (target->stop_at_shutdown) { } statement, but still inside the for >> loop, add these two lines: >> >> xpt_free_path(ccb->ccb_h.path); >> xpt_free_ccb(ccb); > > Hope I didn't mess up the text editing; please see the attached hunk and > check whether it corresponds to the (additional) changes to Stephen's diff. Sorry, now really with attachment... --------------080306030606030400020901 Content-Type: text/plain; name="mps_sas_lsi.c.kdmdiffpart" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="mps_sas_lsi.c.kdmdiffpart" --- mps_sas_lsi.c.orig 2017-06-01 19:39:48.535697000 +0200 +++ mps_sas_lsi.c 2017-06-02 18:10:15.659582000 +0200 @@ -1175,26 +1172,12 @@ /*immediate*/FALSE, MPS_SENSE_LEN, /*timeout*/10000); - xpt_action(ccb); - } - } - - /* - * Wait until all of the SSU commands have completed or time has - * expired (60 seconds). Pause for 100ms each time through. If any - * command times out, the target will be reset in the SCSI command - * timeout routine. 
- */ - getmicrotime(&start_time); - while (sc->SSU_refcount) { - pause("mpswait", hz/10); - - getmicrotime(&cur_time); - if ((cur_time.tv_sec - start_time.tv_sec) > 60) { - mps_dprint(sc, MPS_FAULT, "Time has expired waiting " - "for SSU commands to complete.\n"); - break; + mtx_lock(xpt_path_mtx(ccb->ccb_h.path)); + xpt_polled_action(ccb); + mtx_unlock(xpt_path_mtx(ccb->ccb_h.path)); } + xpt_free_path(ccb->ccb_h.path); + xpt_free_ccb(ccb); } } --------------080306030606030400020901-- From owner-freebsd-scsi@freebsd.org Fri Jun 2 17:13:33 2017 Return-Path: Delivered-To: freebsd-scsi@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C10E5BFB9B4 for ; Fri, 2 Jun 2017 17:13:33 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: from mail-it0-x234.google.com (mail-it0-x234.google.com [IPv6:2607:f8b0:4001:c0b::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 480546E518 for ; Fri, 2 Jun 2017 17:13:33 +0000 (UTC) (envelope-from stephen.mcconnell@broadcom.com) Received: by mail-it0-x234.google.com with SMTP id m47so24457677iti.0 for ; Fri, 02 Jun 2017 10:13:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=from:references:in-reply-to:mime-version:thread-index:date :message-id:subject:to:cc:content-transfer-encoding; bh=vJS3QhRxfN4wT97HS+WcmKIRr693hp/xUijSidn6lOw=; b=bKZ6gGLLBCXOawDIvewP8NqNwzIPOME8UYW91OegBWNWOCpO8NKxtI4jBkxotfuLjf VgpTwHrVvMtn4Op4kRtQapqOsVjCetElJEiGoNjeWrAnHfcgBxM8RPLsl1bbarremLWe cPiEwG4dIK1hmZ4X2D3j4h1x5H5sSIYf5BMC8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:references:in-reply-to:mime-version :thread-index:date:message-id:subject:to:cc 
:content-transfer-encoding; bh=vJS3QhRxfN4wT97HS+WcmKIRr693hp/xUijSidn6lOw=; b=O+yqu59mL9SFvAfdgVGfJpBUcoNpFonGbtYm3/VA2B65TEGPQsXSEZ0kZUzy/mwGnG 59bavFxR3Hl4K4XjnYL0bb5GvNAYUg3QEkJflSAIrHXB6M7Mys2bXv3I7l4s5gcAZL2T p6VwMjh6q8CelViovnPtU3hud9m1zFh/jAsq7UQx+O4yKZgQXJ34dgvW05crps7JG1dB Nfe47fbtBZX8jjg+i+St3IU5oQHKOn5nG/+axEaJyN0DGYDrYPyufQc8arunusfg4ipI IVpFX5UtQGMNlLEy3yUoWR5AJDVvoTKRoFntJzNmD1QdiR7z+UblAXOVhFDnaX96/bl0 wlgw== X-Gm-Message-State: AODbwcBi5lZDvAes7UCUQEZZqrMLVbbgDhaEuL8g7Eep4eYChcIkYdjO +ln0XnD9yDa+BUYBxsrsBY6Yx4ZtzVU2 X-Received: by 10.107.181.68 with SMTP id e65mr10598044iof.156.1496423612543; Fri, 02 Jun 2017 10:13:32 -0700 (PDT) From: Stephen Mcconnell References: <592FDE8C.1090609@omnilan.de> <59303484.1040609@omnilan.de> <59306503.4010007@omnilan.de> <59315A74.9050506@omnilan.de> <20170602153705.GA56018@mithlond.kdm.org> <593198C3.2080902@omnilan.de> <59319917.1050301@omnilan.de> In-Reply-To: <59319917.1050301@omnilan.de> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQK5uw9AxlTbZs3SRUL7gsvMDeNX4QK127o/Aqh6HjgB6++7AwFsXC9MAdL6oGEBmtNDxAIdfU1yAZuZqeyfxSSoQA== Date: Fri, 2 Jun 2017 11:13:31 -0600 Message-ID: <86a38661813a20d3b349920c2de8962e@mail.gmail.com> Subject: RE: mps(4) blocks panic-reboot To: Harry Schmalzbauer , "Kenneth D. Merry" Cc: freebsd-scsi@freebsd.org, Scott Long Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jun 2017 17:13:33 -0000 Thanks Harry. I'll need to do some testing here to see if I can figure it out. Steve > -----Original Message----- > From: Harry Schmalzbauer [mailto:freebsd@omnilan.de] > Sent: Friday, June 02, 2017 10:58 AM > To: Kenneth D. 
Merry > Cc: Stephen Mcconnell; freebsd-scsi@FreeBSD.ORG; Scott Long > Subject: Re: mps(4) blocks panic-reboot > > Bezüglich Harry Schmalzbauer's Nachricht vom 02.06.2017 18:56 (localtime): > > Bezüglich Kenneth D. Merry's Nachricht vom 02.06.2017 17:37 (localtime): > >> On Fri, Jun 02, 2017 at 14:30:44 +0200, Harry Schmalzbauer wrote: > > … > >>> KDB: stack backtrace: > >>> #0 0xffffffff805df4f7 at kdb_backtrace+0x67 > >>> #1 0xffffffff8059df96 at vpanic+0x186 > >>> #2 0xffffffff8059de03 at panic+0x43 > >>> #3 0xffffffff808a1892 at trap_fatal+0x322 > >>> #4 0xffffffff808a18e9 at trap_pfault+0x49 > >>> #5 0xffffffff808a1126 at trap+0x286 > >>> #6 0xffffffff80887401 at calltrap+0x8 > >>> #7 0xffffffff805800f2 at __mtx_unlock_sleep+0x72 > >>> #8 0xffffffff8029a7dc at xpt_polled_action+0x31c > >>> #9 0xffffffff80416c2b at mpssas_ir_shutdown+0x51b > >>> #10 0xffffffff8059db9a at kern_reboot+0x49a > >>> #11 0xffffffff8059d6f8 at sys_reboot+0x458 > >>> #12 0xffffffff808a23f4 at amd64_syscall+0x6c4 > >>> #13 0xffffffff808876eb at Xfast_syscall+0xfb > >>> > >>> (kgdb) list *0xffffffff805f43ec > >>> 0xffffffff805f43ec is in turnstile_broadcast > >>> (/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/subr_turnstile.c:837). > >>> 832 > >>> 833 /* > >>> 834 * Transfer the blocked list to the pending list. > >>> 835 */ > >>> 836 mtx_lock_spin(&td_contested_lock); > >>> 837 TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue], > >>> td_lockq); > >>> 838 mtx_unlock_spin(&td_contested_lock); > >>> 839 > >>> 840 /* > >>> 841 * Give a turnstile to each thread. The last thread gets > >>> > >>> I haven't looked at the code at all and only very briefly looked at > >>> the diff, just out of curiosity, like pigs staring at clockworks ;-) > >>> > >>> But at least I hope this report does help. > >> Thanks for testing it! 
> >> > >> My guess is that the problem is that > >> xpt_polled_action() releases the device mutex, but > >> mpssas_SSU_to_SATA_devices() isn't acquiring the mutex. > >> > >> You could try putting the following around the call to xpt_polled_action(): > >> > >> mtx_lock(xpt_path_mtx(ccb->ccb_h.path)); > >> xpt_polled_action(ccb); > >> mtx_unlock(xpt_path_mtx(ccb->ccb_h.path)); > >> > >> See if that fixes things. One other thing to put in there -- after > >> the if (target->stop_at_shutdown) { } statement, but still inside the > >> for loop, add these two lines: > >> > >> xpt_free_path(ccb->ccb_h.path); > >> xpt_free_ccb(ccb); > > > > Hope I didn't mess up the text editing; please see the attached hunk > > and check whether it corresponds to the (additional) changes to Stephen's diff. > > Sorry, now really with attachment...
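The ordering Ken suggests in the quoted text (take the path mutex, issue the polled request, drop the mutex, then free the path and CCB on every pass through the loop) can be sketched in plain userspace C. This is only an illustration under stated assumptions: every mock_* name below is hypothetical and is not a CAM or mps(4) API, the flag-based "mutex" only records whether it is held, and allocating the request before the stop_at_shutdown check is an assumption about the loop's shape rather than something the thread shows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not CAM or mps(4) interfaces. */
struct mock_mtx    { bool held; };
struct mock_path   { struct mock_mtx mtx; };
struct mock_ccb    { struct mock_path *path; bool done; };
struct mock_target { bool present; bool stop_at_shutdown; };

static int live_allocs;   /* paths + ccbs currently allocated */
static int polled_calls;  /* completed polled requests */

static void mock_mtx_lock(struct mock_mtx *m)   { assert(!m->held); m->held = true; }
static void mock_mtx_unlock(struct mock_mtx *m) { assert(m->held);  m->held = false; }

static struct mock_ccb *mock_alloc_ccb(void)
{
    struct mock_ccb *ccb = calloc(1, sizeof(*ccb));
    if (ccb == NULL)
        return NULL;
    ccb->path = calloc(1, sizeof(*ccb->path));
    if (ccb->path == NULL) {
        free(ccb);
        return NULL;
    }
    live_allocs += 2;
    return ccb;
}

static void mock_free_ccb(struct mock_ccb *ccb)
{
    free(ccb->path);
    free(ccb);
    live_allocs -= 2;
}

/*
 * Models a polled call that expects the caller to hold the path mutex:
 * it drops the mutex while "polling" and retakes it before returning.
 */
static void mock_polled_action(struct mock_ccb *ccb)
{
    mock_mtx_unlock(&ccb->path->mtx);  /* would trip the assert if not held */
    ccb->done = true;
    mock_mtx_lock(&ccb->path->mtx);
    polled_calls++;
}

/* Returns the number of stop requests issued, or -1 on allocation failure. */
int mock_stop_units(const struct mock_target *targets, int n)
{
    int sent = 0;

    for (int i = 0; i < n; i++) {
        if (!targets[i].present)
            continue;

        struct mock_ccb *ccb = mock_alloc_ccb();
        if (ccb == NULL)
            return -1;

        if (targets[i].stop_at_shutdown) {
            mock_mtx_lock(&ccb->path->mtx);
            mock_polled_action(ccb);
            mock_mtx_unlock(&ccb->path->mtx);
            sent++;
        }

        /* Free on every iteration, whether or not a request was sent. */
        mock_free_ccb(ccb);
    }
    return sent;
}
```

Calling mock_stop_units() on a small array of targets and then checking that live_allocs is back to zero is a quick way to confirm that nothing is leaked per iteration in this model. It says nothing about the second panic reported above, which would need the real driver source to diagnose.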