From nobody Mon Oct 11 04:48:17 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 7B79317E9293 for ; Mon, 11 Oct 2021 04:48:32 +0000 (UTC) (envelope-from timp87@gmail.com) Received: from mail-ed1-x532.google.com (mail-ed1-x532.google.com [IPv6:2a00:1450:4864:20::532]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HSRBl0t1rz4ZjR for ; Mon, 11 Oct 2021 04:48:31 +0000 (UTC) (envelope-from timp87@gmail.com) Received: by mail-ed1-x532.google.com with SMTP id w19so2715095edd.2 for ; Sun, 10 Oct 2021 21:48:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=oOfpZ+Cc8pB2ve7MSPD8xsKmWNzn+0I5V4/uF51cDQ4=; b=g6aRlPt0IuKOM388u8DlfiVprRzS+OmRdXuxSptbJl3BU3a44f3dK35letbiR2dakq XpRd5t0VouWe4GHPyyxVUjkh82E4OGywxux+oNemeKx6SNakC5onn4ySl8ZKxGZIUl3X UmD8MHaO8P1fWHQz3QTTaJ97mvLjLWOeh/OMf2cwilAH7qWmAms5BvZcq4K8o6xzbEnQ 7t6uZ42yJueCOi7l1r5D7RH5MT2RKMqostC7mhxk9gq0z1mqor1Q3O4RijcQ2uByYcLq udVNtVWFBgFvg7MErJI/qgBpQpGXIQVpQ4PhIyLbWU5zZLz5lsnlTfqOvTSYHHeaOXxO C6Dg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=oOfpZ+Cc8pB2ve7MSPD8xsKmWNzn+0I5V4/uF51cDQ4=; b=HItUU5TpPj0KNilFbEACuo0YnFc+5nPyeHYMsx1Nx3d40IgEDfR4MkVGtpxK0nDtjV AXGveXPm8xPCX8agoZxcCF78W6RISqW/rpirnKgeYBtAA7VBEVwskJ3Q60DSC+M9pLlW plEJKYIXUPtUR/oudhPeKvLxHFBEKGCpiH8hsau1UEqQqduJ5Xb5iHxx+r2EGqlH7bju pwV0/xJSinHaZFogrsazxUssiX9bHZBW4KG6p1yI4M+CtYOa3QBz3XVWZbg3ut+kOBpr v23PYUV6JpT5mJJZrvQSXDTKs4AkrPA0YsCaGmwx52Ou/pI8QMmWEP1WGWRiIf5qePJP RQ4w== X-Gm-Message-State: AOAM5304NoLWxlGi/fm2locoDH0IhMLlcCoCjliQeX39aXdM4QsX/UyB VMvLwqWSgw731i4BKbXCqnFOTytjYR3rx5IVzBgTnSxT5G4= X-Google-Smtp-Source: ABdhPJwYtOK0YJKmKiGiEOE9GphREeSX12b3ssgP5dNbNZVQtMGcrLD6P6Mc+jVdbaA2V3lJlbd0LEhs66a58NKt4b4= X-Received: by 2002:a50:9d83:: with SMTP id w3mr38546620ede.305.1633927708431; Sun, 10 Oct 2021 21:48:28 -0700 (PDT) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 References: In-Reply-To: From: Pavel Timofeev Date: Sun, 10 Oct 2021 22:48:17 -0600 Message-ID: Subject: Re: Dell Latitude 7400 - nvme0: Missing interrupt To: Warner Losh Cc: Chuck Tuffli , freebsd-current Content-Type: multipart/alternative; boundary="000000000000a65a9e05ce0c7004" X-Rspamd-Queue-Id: 4HSRBl0t1rz4ZjR X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20210112 header.b=g6aRlPt0; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of timp87@gmail.com designates 2a00:1450:4864:20::532 as permitted sender) smtp.mailfrom=timp87@gmail.com X-Spamd-Result: default: False [-2.74 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-0.998]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20210112]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; R_SPF_ALLOW(-0.20)[+ip6:2a00:1450:4000::/36:c]; FREEMAIL_FROM(0.00)[gmail.com]; MIME_GOOD(-0.10)[multipart/alternative]; PREVIOUSLY_DELIVERED(0.00)[freebsd-current@freebsd.org]; NEURAL_HAM_LONG(-0.99)[-0.994]; NEURAL_SPAM_SHORT(0.26)[0.256]; DWL_DNSWL_NONE(0.00)[gmail.com:dkim]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[gmail.com:+]; MID_RHS_MATCH_FROMTLD(0.00)[]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; RCVD_IN_DNSWL_NONE(0.00)[2a00:1450:4864:20::532:from]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:~,2:~]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; FREEMAIL_CC(0.00)[gmail.com,freebsd.org]; RCVD_TLS_ALL(0.00)[]; RCVD_COUNT_TWO(0.00)[2] X-ThisMailContainsUnwantedMimeParts: Y --000000000000a65a9e05ce0c7004 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable =D1=81=D0=B1, 9 =D0=BE=D0=BA=D1=82. 2021 =D0=B3. =D0=B2 14:59, Warner Losh = : > > > On Sat, Oct 9, 2021, 8:44 AM Pavel Timofeev wrote: > >> >> >> =D0=BF=D1=82, 8 =D0=BE=D0=BA=D1=82. 2021 =D0=B3. =D0=B2 14:49, Warner Lo= sh : >> >>> >>> >>> On Fri, Oct 8, 2021 at 2:42 PM Pavel Timofeev wrote: >>> >>>> >>>> >>>> =D1=81=D0=B1, 21 =D0=B0=D0=B2=D0=B3. 2021 =D0=B3. =D0=B2 15:22, Warner= Losh : >>>> >>>>> >>>>> >>>>> On Sat, Aug 21, 2021 at 3:06 PM Pavel Timofeev >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> Warner Losh : >>>>>> >>>>>>> >>>>>>> >>>>>>> On Fri, Aug 20, 2021 at 10:42 PM Pavel Timofeev >>>>>>> wrote: >>>>>>> >>>>>>>> Pavel Timofeev : >>>>>>>> >>>>>>>> > >>>>>>>> > Chuck Tuffli : >>>>>>>> > >>>>>>>> >> On Mon, Aug 16, 2021 at 7:43 PM Pavel Timofeev >>>>>>>> wrote: >>>>>>>> >> > >>>>>>>> >> > Hello >>>>>>>> >> > I've got a Dell Latitude 7400 and tried installing the latest >>>>>>>> >> 14.0-CURRENT >>>>>>>> >> > (main-n248636-d20e9e02db3) on it. >>>>>>>> >> > Despite other things the weird one which concerns me is >>>>>>>> >> > nvme0: Missing interrupt >>>>>>>> >> > message I get sometimes on the console. >>>>>>>> >> > It seems like I get it only after the reboot of the laptop, i= . >>>>>>>> e. not >>>>>>>> >> > getting that message if I power cycle the laptop, at least I >>>>>>>> haven't >>>>>>>> >> seen >>>>>>>> >> > them for now in such cases. >>>>>>>> >> > So when the laptop is rebooted I can't even take advantage of >>>>>>>> >> > nvmecontrol(8) quickly. >>>>>>>> >> > Well, it still works, but it takes tens of seconds to return >>>>>>>> the output. >>>>>>>> >> ... >>>>>>>> >> > dmesg when power cycled - >>>>>>>> >> > >>>>>>>> https://drive.google.com/file/d/1dB27oB1O2CcnZy6DvOOhmFO8SN8V8SwJ >>>>>>>> >> > dmesg when rebooted - >>>>>>>> >> > >>>>>>>> https://drive.google.com/file/d/1DsKTMkihp_OmUcirByLaVO4o2mU38Bxh >>>>>>>> >> >>>>>>>> >> I'm sort of curious about the time stamps for the log messages >>>>>>>> in the >>>>>>>> >> failing case. Something like: >>>>>>>> >> >>>>>>>> >> $ grep "nv\(me\|d\)" /var/log/messages >>>>>>>> >> >>>>>>>> >> --chuck >>>>>>>> >> >>>>>>>> > >>>>>>>> > Well, I can't see timestamps in the verbose boot log. Am I >>>>>>>> missing some >>>>>>>> > configuration for that? >>>>>>>> > >>>>>>>> > $ grep "nv\(me\|d\)" /var/log/messages >>>>>>>> > nvme0: mem >>>>>>>> > 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104ff= f >>>>>>>> at device >>>>>>>> > 0.0 on pci6 >>>>>>>> > nvme0: attempting to allocate 5 MSI-X vectors (17 supported) >>>>>>>> > nvme0: using IRQs 133-137 for MSI-X >>>>>>>> > nvme0: CapLo: 0x140103ff: MQES 1023, CQR, TO 20 >>>>>>>> > nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, MPSMIN 0, MPSMA= X >>>>>>>> 0 >>>>>>>> > nvme0: Version: 0x00010300: 1.3 >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvme0: Missing interrupt >>>>>>>> > nvd0: NVMe namespace >>>>>>>> > GEOM: new disk nvd0 >>>>>>>> > nvd0: 488386MB (1000215216 512 byte sectors) >>>>>>>> > >>>>>>>> >>>>>>>> >>>>>>>> Ah, sorry, provided wrong output. >>>>>>>> Here is what you requested: >>>>>>>> $ grep "nv\(me\|d\)" /var/log/messages >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: mem >>>>>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff >>>>>>>> at device >>>>>>>> 0.0 on pci6 >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: attempting to allocate 5 >>>>>>>> MSI-X >>>>>>>> vectors (17 supported) >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: using IRQs 133-137 for MSI= -X >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapLo: 0x140103ff: MQES >>>>>>>> 1023, CQR, >>>>>>>> TO 20 >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapHi: 0x00000030: DSTRD 0= , >>>>>>>> NSSRS, >>>>>>>> CSS 1, MPSMIN 0, MPSMAX 0 >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Version: 0x00010300: 1.3 >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvd0: >>>>>>>> NVMe >>>>>>>> namespace >>>>>>>> Aug 21 04:34:36 nostromo kernel: GEOM: new disk nvd0 >>>>>>>> Aug 21 04:34:36 nostromo kernel: nvd0: 488386MB (1000215216 512 by= te >>>>>>>> sectors) >>>>>>>> Aug 21 04:34:42 nostromo kernel: nvme0: Missing interrupt >>>>>>>> Aug 21 04:35:36 nostromo kernel: nvme0: Missing interrupt >>>>>>>> Aug 21 04:35:50 nostromo kernel: nvme0: Missing interrupt >>>>>>>> >>>>>>> >>>>>>> What happens if you set hw.nvme.use_nvd=3D0 and hw.cam.nda.nvd_comp= at=3D1 >>>>>>> in the boot loader and reboot? Same thing except nda where nvd was? >>>>>>> Or does >>>>>>> it work? >>>>>>> >>>>>>> Something weird is going on in the interrupt assignment, I think, >>>>>>> but I >>>>>>> wanted to get any nvd vs nda issues out of the way first. >>>>>>> >>>>>>> Warner >>>>>>> >>>>>> >>>>>> Do you mean kern.cam.nda.nvd_compat instead of hw.cam.nda.nvd_compat= ? >>>>>> kern.cam.nda.nvd_compat is 1 by default now. >>>>>> >>>>>> So I tried to set hw.nvme.use_nvd to 1 as suggested, but I still se= e >>>>>> nvme0: Missing interrupt >>>>>> and now also >>>>>> Root mount waiting for: CAM >>>>>> messages besides those >>>>>> >>>>> >>>>> OK. That all makes sense. I'd forgotten that nvd_compat=3D1 by defaul= t >>>>> these >>>>> days. >>>>> >>>>> I'll take a look on monday starting at the differences in interrupt >>>>> assignment that >>>>> are apparent when you cold boot vs reboot. >>>>> >>>>> Thanks for checking... I'd hoped this was a cheap fix, but also didn'= t >>>>> really >>>>> expect it to be. >>>>> >>>>> Warner >>>>> >>>>> >>>> I've recently upgraded to main-n249974-17f790f49f5 and it got even >>>> worse now. >>>> So clean poweron works as before. >>>> But if rebooted nvme drive refuses to work, while before the code >>>> upgrade it was just complaining about missing interrupts. >>>> >>>> currently dmesg show this: >>>> nvme0: mem >>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at d= evice >>>> 0.0 on pci6 >>>> nvd0: NVMe namespace >>>> nvd0: 488386MB (1000215216 512 byte sectors) >>>> nvme0: mem >>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at d= evice >>>> 0.0 on pci6 >>>> >>> >>> Why is this showing up twice? Or is everything above this line left ove= r >>> from the first, working boot? >>> >>> >>>> nvme0: RECOVERY_START 9585870784 vs 9367036288 >>>> nvme0: timeout with nothing complete, resetting >>>> nvme0: Resetting controller due to a timeout. >>>> nvme0: RECOVERY_WAITING >>>> nvme0: resetting controller >>>> nvme0: aborting outstanding admin command >>>> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:0000000= 0 >>>> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 >>>> nvme0: nvme_identify_controller failed! >>>> nvme0: waiting >>>> >>> >>> Clearly something bad is going on with the drive here... We looked into >>> the completion queues when we didn't get an interrupt and there was not= hing >>> complete there.... >>> >>> The only thing I can think of is that this means there's a phase error >>> between the drive and the system. I recently removed a second reset and >>> made it an option NVME_2X_RESET. Can you see if adding >>> 'options NVME_2X_RESET' to your kernel config fixes this? >>> >>> Warner >>> >>> >>>> nvme0: mem >>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at d= evice >>>> 0.0 on pci6 >>>> nvme0: RECOVERY_START 9362778467 vs 9361830561 >>>> nvme0: timeout with nothing complete, resetting >>>> nvme0: Resetting controller due to a timeout. >>>> nvme0: RECOVERY_WAITING >>>> nvme0: resetting controller >>>> nvme0: aborting outstanding admin command >>>> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:0000000= 0 >>>> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 >>>> nvme0: nvme_identify_controller failed! >>>> nvme0: waiting >>>> >>>> >> >> Sorry, it's showing twice due to multiple reboots. For one boot it's lik= e: >> nvme0: mem >> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at dev= ice >> 0.0 on pci6 >> nvme0: RECOVERY_START 9633303481 vs 9365971423 >> nvme0: timeout with nothing complete, resetting >> nvme0: Resetting controller due to a timeout. >> nvme0: RECOVERY_WAITING >> nvme0: resetting controller >> nvme0: aborting outstanding admin command >> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000 >> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 >> nvme0: nvme_identify_controller failed! >> nvme0: waiting >> >> Well, neither Windows not Linux have any problems with the device. I >> understand they may be hiding it or workaround somehow. >> > > Yea, I'm trying to figure out why your machine is different than mine, an= d > what Windows or Linux do that is different. It may be dodgy hardware, but > others have no trouble... > > I'll try setting NVME_2X_RESET in the kernel config and report back in a >> while. >> > > Thanks. If it helps, that tells me something. If it doesn't, that tells m= e > something else. > > I suspect that it is somewhere else in the system, tbh, but I need to fin= d > it systematically. > > Warner > Surprisingly, setting NVME_2X_RESET in the kernel config hasn't changed anything. I. e it didn't help. --000000000000a65a9e05ce0c7004--