From: Stephen Mcconnell
Date: Thu, 1 Jun 2017 12:55:06 -0600
Subject: RE: sporadic CAM (all devices) outage on 11-stable, mps(4), ahci(4) and bhyve(8) involved. [Was: Re: mps(4) blocks panic-reboot]
To: Harry Schmalzbauer
Cc: freebsd-scsi@freebsd.org, Scott Long

Take a look at PR 212914. Could that be the issue? It was MFC'd to stable/11
with r309273 on Nov 28th, 2016.

Steve

> -----Original Message-----
> From: Harry Schmalzbauer [mailto:freebsd@omnilan.de]
> Sent: Thursday, June 01, 2017 12:31 PM
> To: Stephen Mcconnell
> Cc: freebsd-scsi@freebsd.org; Scott Long
> Subject: Re: sporadic CAM (all devices) outage on 11-stable, mps(4),
>   ahci(4) and bhyve(8) involved.
>   [Was: Re: mps(4) blocks panic-reboot]
>
> Regarding Stephen Mcconnell's message of 01.06.2017 20:12 (localtime):
> >> -----Original Message-----
> >> From: Harry Schmalzbauer [mailto:freebsd@omnilan.de]
> >> Sent: Thursday, June 01, 2017 12:03 PM
> >> To: Stephen Mcconnell
> >> Cc: freebsd-scsi@freebsd.org; Scott Long
> >> Subject: Re: mps(4) blocks panic-reboot
> >>
> >> Regarding Stephen Mcconnell's message of 01.06.2017 19:36
> >> (localtime):
> >>> Can you try the attached patch and let me know how it goes? I didn't
> >>> test it, but since you know how, it might be easier this way. This
> >>> was diff'd from the latest mps files in stable/11, which I recently
> >>> updated (today).
> >>
> >> Thanks a lot, I noticed the highly appreciated MFC!
> >> Things are cooking... There were sysdecode userland changes, so I
> >> need to build world as well before my rollout system provides the
> >> update for this machine; it will be ready in an hour.
> >>
> >> Since I have an expert's attention, I'd like to ask another mps(4)
> >> related question:
> >>
> >> I had unionfs deadlocks. (I'm aware of the broken status of unionfs,
> >> and since I'm not able to fix it myself at the moment, I have already
> >> replaced it with nullfs where possible, which is true for the
> >> following event.)
> >>
> >> Since this machine has a memory disk as rootfs (and 5 SSDs via mps(4)
> >> for bootpool and a separate syspool, where e.g. /var lives), I guess
> >> the deadlock is responsible for the simultaneous disappearance of all
> >> mps(4)-attached drives.
> >>
> >> Is that plausible? (Meaning, does the mps(4) driver depend on the
> >> filesystem subsystem?)
> >>
> >> Or do you have any idea what else could lead to the disappearance of
> >> all drives simultaneously? Other ATA drives, via on-board ahci (C203),
> >> were not affected!
> >> Unfortunately, I haven't been able to record any kernel messages when
> >> that happened (3 times as far as I remember, no occurrence since
> >> abandoning unionfs yet).
> >
> > This doesn't seem like an mps driver problem to me, but maybe someone
> > else here can help more than I can. I can't think of anything that
> > might be causing your drives to disappear. It would help if you could
> > get some kernel logs when this happens.
>
> Thanks, I should have searched beforehand... Two lies: at least once,
> SATA drives via ahci(4) were also affected, and I did note some kernel
> messages.
>
> Please see this post:
> https://lists.freebsd.org/pipermail/freebsd-scsi/2016-December/007216.html
>
> Sorry, I thought it was longer ago and not discussed at scsi@ at all...
>
> At that time, unionfs was involved, which later led to complete
> deadlocks on different setups with completely different applications.
> But I think that (deadlock) is one possible root of the problems these
> setups had in common.
>
> So if an expert can tell me either "nope, disappearing drives can't be
> related to (union)fs deadlocks" or the opposite, I'd be deeply grateful.
>
> -harry