From nobody Wed Oct 13 00:51:29 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 8C65717F69BE for ; Wed, 13 Oct 2021 00:51:42 +0000 (UTC) (envelope-from timp87@gmail.com) Received: from mail-ed1-x52d.google.com (mail-ed1-x52d.google.com [IPv6:2a00:1450:4864:20::52d]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HTYrZ2n7Fz3lb3 for ; Wed, 13 Oct 2021 00:51:42 +0000 (UTC) (envelope-from timp87@gmail.com) Received: by mail-ed1-x52d.google.com with SMTP id t16so2941991eds.9 for ; Tue, 12 Oct 2021 17:51:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Oj3Xpt1V0q83D+gfImnbEIYOHiwg66NlaNuh91QH6a0=; b=SN4AkpH7isv6zU578WhNXBnLac/S+vLQPpQjbeWhxuQs227IoczuB2LG8t375XvH4w 1IgR4MfCVFwPMKwCQpUnPOyYF3kaih1PLnb4ESIfAER0kB3nbdROjBQcmHG1979/vsaa YAGsqDePNv60ojqXhcFUAVSb1Dik3P0q253iGewpcapkCxYSAP7Xoflw4UC6E2O/dN9r XSd6n6ruXYPAU6xFTFXb8LwoMnSNrbVXnqMxjgDI7M/eR1aN/BC3SxN5gHWTR53lNG/+ UX6CV0MYGB7Sj2O711q8+HOZBikHO1+3IURTQ2MRDwGokwZxzgWJAPStmRpmgpABOikP crmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Oj3Xpt1V0q83D+gfImnbEIYOHiwg66NlaNuh91QH6a0=; b=F/MOSeXkdh9ptIlOvu77aYkmOPzFP9aLoLcPM9nFIVr252LFZHo/IvqwhEeixLE1Ao qJcvk/FqP2flvyk3cVDxRZIPXpWxuZUhBn7Ebf0tsSqXNYKFJvcFYQ6g4v1DJkCsiaMy Hf1UyLFhEgvY/CrQfkCfwxpeBGhQiStGGueSp50zbySyhpysP+59M2Suogp7nVQrPAYL Q7ti365LuY35n7G92Lw/1uhTLaD5eVWAo834wyQkv8qgb3AZlK5SOWtBeU7al+kZ3hzz CR+BLcrX8tSvYen7rjyv9fNHndOxwJXtcRT9pbdQiVWynC9FNHR5x4f8aigRin3Rn5Uy 9v4w== X-Gm-Message-State: AOAM530OCjjF6O2jsDTYdHUpH25tt9Or8ihjH8UKZDNl5lC5XESmPJ4g OBtOe4Fto8wR7nWSYlXlP7Qi94oJt1Hws3gB0Fk= X-Google-Smtp-Source: ABdhPJxlAnBPyLtWtumHkwjAM7ApNY/R04zRXfQrYJdJEZhwGz/tV7LC3xx+nAs2nz6ghmlMvy7OYQgyCBAj3p1LQyo= X-Received: by 2002:a17:907:1c0e:: with SMTP id nc14mr36864868ejc.103.1634086301078; Tue, 12 Oct 2021 17:51:41 -0700 (PDT) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 References: In-Reply-To: From: Pavel Timofeev Date: Tue, 12 Oct 2021 18:51:29 -0600 Message-ID: Subject: Re: Dell Latitude 7400 - nvme0: Missing interrupt To: Warner Losh Cc: Chuck Tuffli , freebsd-current Content-Type: multipart/alternative; boundary="00000000000082166e05ce315d6d" X-Rspamd-Queue-Id: 4HTYrZ2n7Fz3lb3 X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: Y --00000000000082166e05ce315d6d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable =D0=B2=D1=82, 12 =D0=BE=D0=BA=D1=82. 2021 =D0=B3. =D0=B2 13:56, Warner Losh= : > One piece of data that would be good to have: > > nvmecontrol identify nvme0 > > There's an optional feature that none of my drives have, but that the > Linux driver does oddly > weird things when enabled. The output of that command will help me > understand if that may > be in play. Maybe we need to do oddly weird things too :) > > Warner > > On Sun, Oct 10, 2021 at 11:00 PM Warner Losh wrote: > >> >> >> On Sun, Oct 10, 2021 at 10:48 PM Pavel Timofeev wrote= : >> >>> =D1=81=D0=B1, 9 =D0=BE=D0=BA=D1=82. 2021 =D0=B3. =D0=B2 14:59, Warner L= osh : >>> >>>> >>>> >>>> On Sat, Oct 9, 2021, 8:44 AM Pavel Timofeev wrote: >>>> >>>>> >>>>> >>>>> =D0=BF=D1=82, 8 =D0=BE=D0=BA=D1=82. 2021 =D0=B3. =D0=B2 14:49, Warner= Losh : >>>>> >>>>>> >>>>>> >>>>>> On Fri, Oct 8, 2021 at 2:42 PM Pavel Timofeev >>>>>> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> =D1=81=D0=B1, 21 =D0=B0=D0=B2=D0=B3. 2021 =D0=B3. =D0=B2 15:22, War= ner Losh : >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Aug 21, 2021 at 3:06 PM Pavel Timofeev >>>>>>>> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Warner Losh : >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Aug 20, 2021 at 10:42 PM Pavel Timofeev >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Pavel Timofeev : >>>>>>>>>>> >>>>>>>>>>> > >>>>>>>>>>> > Chuck Tuffli : >>>>>>>>>>> > >>>>>>>>>>> >> On Mon, Aug 16, 2021 at 7:43 PM Pavel Timofeev < >>>>>>>>>>> timp87@gmail.com> wrote: >>>>>>>>>>> >> > >>>>>>>>>>> >> > Hello >>>>>>>>>>> >> > I've got a Dell Latitude 7400 and tried installing the >>>>>>>>>>> latest >>>>>>>>>>> >> 14.0-CURRENT >>>>>>>>>>> >> > (main-n248636-d20e9e02db3) on it. >>>>>>>>>>> >> > Despite other things the weird one which concerns me is >>>>>>>>>>> >> > nvme0: Missing interrupt >>>>>>>>>>> >> > message I get sometimes on the console. >>>>>>>>>>> >> > It seems like I get it only after the reboot of the laptop= , >>>>>>>>>>> i. e. not >>>>>>>>>>> >> > getting that message if I power cycle the laptop, at least >>>>>>>>>>> I haven't >>>>>>>>>>> >> seen >>>>>>>>>>> >> > them for now in such cases. >>>>>>>>>>> >> > So when the laptop is rebooted I can't even take advantage >>>>>>>>>>> of >>>>>>>>>>> >> > nvmecontrol(8) quickly. >>>>>>>>>>> >> > Well, it still works, but it takes tens of seconds to >>>>>>>>>>> return the output. >>>>>>>>>>> >> ... >>>>>>>>>>> >> > dmesg when power cycled - >>>>>>>>>>> >> > >>>>>>>>>>> https://drive.google.com/file/d/1dB27oB1O2CcnZy6DvOOhmFO8SN8V8S= wJ >>>>>>>>>>> >> > dmesg when rebooted - >>>>>>>>>>> >> > >>>>>>>>>>> https://drive.google.com/file/d/1DsKTMkihp_OmUcirByLaVO4o2mU38B= xh >>>>>>>>>>> >> >>>>>>>>>>> >> I'm sort of curious about the time stamps for the log >>>>>>>>>>> messages in the >>>>>>>>>>> >> failing case. Something like: >>>>>>>>>>> >> >>>>>>>>>>> >> $ grep "nv\(me\|d\)" /var/log/messages >>>>>>>>>>> >> >>>>>>>>>>> >> --chuck >>>>>>>>>>> >> >>>>>>>>>>> > >>>>>>>>>>> > Well, I can't see timestamps in the verbose boot log. Am I >>>>>>>>>>> missing some >>>>>>>>>>> > configuration for that? >>>>>>>>>>> > >>>>>>>>>>> > $ grep "nv\(me\|d\)" /var/log/messages >>>>>>>>>>> > nvme0: mem >>>>>>>>>>> > >>>>>>>>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104f= ff at device >>>>>>>>>>> > 0.0 on pci6 >>>>>>>>>>> > nvme0: attempting to allocate 5 MSI-X vectors (17 supported) >>>>>>>>>>> > nvme0: using IRQs 133-137 for MSI-X >>>>>>>>>>> > nvme0: CapLo: 0x140103ff: MQES 1023, CQR, TO 20 >>>>>>>>>>> > nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, MPSMIN 0, >>>>>>>>>>> MPSMAX 0 >>>>>>>>>>> > nvme0: Version: 0x00010300: 1.3 >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvme0: Missing interrupt >>>>>>>>>>> > nvd0: NVMe namespace >>>>>>>>>>> > GEOM: new disk nvd0 >>>>>>>>>>> > nvd0: 488386MB (1000215216 512 byte sectors) >>>>>>>>>>> > >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Ah, sorry, provided wrong output. >>>>>>>>>>> Here is what you requested: >>>>>>>>>>> $ grep "nv\(me\|d\)" /var/log/messages >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: m= em >>>>>>>>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104f= ff >>>>>>>>>>> at device >>>>>>>>>>> 0.0 on pci6 >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: attempting to allocate = 5 >>>>>>>>>>> MSI-X >>>>>>>>>>> vectors (17 supported) >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: using IRQs 133-137 for >>>>>>>>>>> MSI-X >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapLo: 0x140103ff: MQES >>>>>>>>>>> 1023, CQR, >>>>>>>>>>> TO 20 >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapHi: 0x00000030: DSTR= D >>>>>>>>>>> 0, NSSRS, >>>>>>>>>>> CSS 1, MPSMIN 0, MPSMAX 0 >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Version: 0x00010300: 1.= 3 >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvd0: >>>>>>>>>> 512GB> NVMe >>>>>>>>>>> namespace >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: GEOM: new disk nvd0 >>>>>>>>>>> Aug 21 04:34:36 nostromo kernel: nvd0: 488386MB (1000215216 512 >>>>>>>>>>> byte >>>>>>>>>>> sectors) >>>>>>>>>>> Aug 21 04:34:42 nostromo kernel: nvme0: Missing interrupt >>>>>>>>>>> Aug 21 04:35:36 nostromo kernel: nvme0: Missing interrupt >>>>>>>>>>> Aug 21 04:35:50 nostromo kernel: nvme0: Missing interrupt >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> What happens if you set hw.nvme.use_nvd=3D0 and >>>>>>>>>> hw.cam.nda.nvd_compat=3D1 >>>>>>>>>> in the boot loader and reboot? Same thing except nda where nvd >>>>>>>>>> was? Or does >>>>>>>>>> it work? >>>>>>>>>> >>>>>>>>>> Something weird is going on in the interrupt assignment, I think= , >>>>>>>>>> but I >>>>>>>>>> wanted to get any nvd vs nda issues out of the way first. >>>>>>>>>> >>>>>>>>>> Warner >>>>>>>>>> >>>>>>>>> >>>>>>>>> Do you mean kern.cam.nda.nvd_compat instead >>>>>>>>> of hw.cam.nda.nvd_compat? >>>>>>>>> kern.cam.nda.nvd_compat is 1 by default now. >>>>>>>>> >>>>>>>>> So I tried to set hw.nvme.use_nvd to 1 as suggested, but I still >>>>>>>>> see >>>>>>>>> nvme0: Missing interrupt >>>>>>>>> and now also >>>>>>>>> Root mount waiting for: CAM >>>>>>>>> messages besides those >>>>>>>>> >>>>>>>> >>>>>>>> OK. That all makes sense. I'd forgotten that nvd_compat=3D1 by >>>>>>>> default these >>>>>>>> days. >>>>>>>> >>>>>>>> I'll take a look on monday starting at the differences in interrup= t >>>>>>>> assignment that >>>>>>>> are apparent when you cold boot vs reboot. >>>>>>>> >>>>>>>> Thanks for checking... I'd hoped this was a cheap fix, but also >>>>>>>> didn't really >>>>>>>> expect it to be. >>>>>>>> >>>>>>>> Warner >>>>>>>> >>>>>>>> >>>>>>> I've recently upgraded to main-n249974-17f790f49f5 and it got even >>>>>>> worse now. >>>>>>> So clean poweron works as before. >>>>>>> But if rebooted nvme drive refuses to work, while before the code >>>>>>> upgrade it was just complaining about missing interrupts. >>>>>>> >>>>>>> currently dmesg show this: >>>>>>> nvme0: mem >>>>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff a= t device >>>>>>> 0.0 on pci6 >>>>>>> nvd0: NVMe namespace >>>>>>> nvd0: 488386MB (1000215216 512 byte sectors) >>>>>>> nvme0: mem >>>>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff a= t device >>>>>>> 0.0 on pci6 >>>>>>> >>>>>> >>>>>> Why is this showing up twice? Or is everything above this line left >>>>>> over from the first, working boot? >>>>>> >>>>>> >>>>>>> nvme0: RECOVERY_START 9585870784 vs 9367036288 >>>>>>> nvme0: timeout with nothing complete, resetting >>>>>>> nvme0: Resetting controller due to a timeout. >>>>>>> nvme0: RECOVERY_WAITING >>>>>>> nvme0: resetting controller >>>>>>> nvme0: aborting outstanding admin command >>>>>>> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 >>>>>>> cdw11:00000000 >>>>>>> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 >>>>>>> nvme0: nvme_identify_controller failed! >>>>>>> nvme0: waiting >>>>>>> >>>>>> >>>>>> Clearly something bad is going on with the drive here... We looked >>>>>> into the completion queues when we didn't get an interrupt and there= was >>>>>> nothing complete there.... >>>>>> >>>>>> The only thing I can think of is that this means there's a phase >>>>>> error between the drive and the system. I recently removed a second = reset >>>>>> and made it an option NVME_2X_RESET. Can you see if adding >>>>>> 'options NVME_2X_RESET' to your kernel config fixes this? >>>>>> >>>>>> Warner >>>>>> >>>>>> >>>>>>> nvme0: mem >>>>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff a= t device >>>>>>> 0.0 on pci6 >>>>>>> nvme0: RECOVERY_START 9362778467 vs 9361830561 >>>>>>> nvme0: timeout with nothing complete, resetting >>>>>>> nvme0: Resetting controller due to a timeout. >>>>>>> nvme0: RECOVERY_WAITING >>>>>>> nvme0: resetting controller >>>>>>> nvme0: aborting outstanding admin command >>>>>>> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 >>>>>>> cdw11:00000000 >>>>>>> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 >>>>>>> nvme0: nvme_identify_controller failed! >>>>>>> nvme0: waiting >>>>>>> >>>>>>> >>>>> >>>>> Sorry, it's showing twice due to multiple reboots. For one boot it's >>>>> like: >>>>> nvme0: mem >>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at = device >>>>> 0.0 on pci6 >>>>> nvme0: RECOVERY_START 9633303481 vs 9365971423 >>>>> nvme0: timeout with nothing complete, resetting >>>>> nvme0: Resetting controller due to a timeout. >>>>> nvme0: RECOVERY_WAITING >>>>> nvme0: resetting controller >>>>> nvme0: aborting outstanding admin command >>>>> nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:000000= 00 >>>>> nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 >>>>> nvme0: nvme_identify_controller failed! >>>>> nvme0: waiting >>>>> >>>>> Well, neither Windows not Linux have any problems with the device. I >>>>> understand they may be hiding it or workaround somehow. >>>>> >>>> >>>> Yea, I'm trying to figure out why your machine is different than mine, >>>> and what Windows or Linux do that is different. It may be dodgy hardwa= re, >>>> but others have no trouble... >>>> >>>> I'll try setting NVME_2X_RESET in the kernel config and report back in >>>>> a while. >>>>> >>>> >>>> Thanks. If it helps, that tells me something. If it doesn't, that tell= s >>>> me something else. >>>> >>>> I suspect that it is somewhere else in the system, tbh, but I need to >>>> find it systematically. >>>> >>>> Warner >>>> >>> >>> Surprisingly, setting NVME_2X_RESET in the kernel config hasn't changed >>> anything. I. e it didn't help. >>> >> >> While it would have been nice to have this be the fix, I'm not that >> surprised either. >> It was the biggest change of late, apart from the big re-arrangement tha= t >> I'd done. >> >> So the other changes have moved from the occasional missing interrupt >> message >> (which the old code would get when a command wasn't completed in the >> timeout >> period, but that we found to be done when we scanned the completion >> queue. Now >> the device is detected fine (as before), but then doesn't do I/O at all >> (including not >> completing the identify command!) and is worse. This is unexpected and >> I'm trying >> understand what happens on reboot that 'changes'the working state and wh= y >> the >> new code behaves so differently. >> >> Warner >> > Sure, here it is: Controller Capabilities/Features =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D Vendor ID: 1c5c Subsystem Vendor ID: 1c5c Serial Number: ND03N46381010423H Model Number: PC611 NVMe SK hynix 512GB Firmware Version: 11000111 Recommended Arb Burst: 4 IEEE OUI Identifier: 2e e4 ac Multi-Path I/O Capabilities: Not Supported Max Data Transfer Size: 262144 bytes Sanitize Crypto Erase: Not Supported Sanitize Block Erase: Supported Sanitize Overwrite: Not Supported Sanitize NDI: Not Supported Sanitize NODMMAS: Undefined Controller ID: 0x0001 Version: 1.3.0 Admin Command Set Attributes =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D Security Send/Receive: Supported Format NVM: Supported Firmware Activate/Download: Supported Namespace Management: Not Supported Device Self-test: Supported Directives: Not Supported NVMe-MI Send/Receive: Not Supported Virtualization Management: Not Supported Doorbell Buffer Config: Not Supported Get LBA Status: Not Supported Sanitize: block, Abort Command Limit: 4 Async Event Request Limit: 8 Number of Firmware Slots: 3 Firmware Slot 1 Read-Only: No Per-Namespace SMART Log: No Error Log Page Entries: 256 Number of Power States: 5 Total NVM Capacity: 0 bytes Unallocated NVM Capacity: 0 bytes Firmware Update Granularity: 00 (Not Reported) Host Buffer Preferred Size: 0 bytes Host Buffer Minimum Size: 0 bytes NVM Command Set Attributes =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D Submission Queue Entry Size Max: 64 Min: 64 Completion Queue Entry Size Max: 16 Min: 16 Number of Namespaces: 1 Compare Command: Supported Write Uncorrectable Command: Supported Dataset Management Command: Supported Write Zeroes Command: Supported Save Features: Supported Reservations: Not Supported Timestamp feature: Supported Verify feature: Not Supported Fused Operation Support: Not Supported Format NVM Attributes: Per-NS Erase, Per-NS Format Volatile Write Cache: Present NVM Subsystem Name: --00000000000082166e05ce315d6d--