From nobody Fri Oct 8 20:49:02 2021 X-Original-To: freebsd-current@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 9D71317E7323 for ; Fri, 8 Oct 2021 20:49:14 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ua1-x92a.google.com (mail-ua1-x92a.google.com [IPv6:2607:f8b0:4864:20::92a]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4HR0ff3hSMz3sgx for ; Fri, 8 Oct 2021 20:49:14 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-ua1-x92a.google.com with SMTP id i8so7593551uae.7 for ; Fri, 08 Oct 2021 13:49:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=G2T6xixbR51gXles+5J8VZj8KNxOGhVd1Fzhsy9f65o=; b=LyPxlAL5+7th1ijv77nkYBMBZVjV8MlKWxasGhmGInfj6QCzPIhZK+Q/tJuofj3NQE QeqolXco5gBC3OBrBuiXWkFLX70oTDYcwHwr+5wkG1pO1xmk7KwmL8fbrQWhxAA59Pkp 3Q9mqWLt8WNmYhn15kQZcUQM75unpLWwcFT5Bu7TIci4VluiPjZuIqlR0nTdKgB7UwmB Y06xh+ii3Pkr2sp7/Q0A2m6rA4LMqAeTe/lwomS+liEQLAHNessn+DhS0b/XuUGHf8pW b6Gox8IMM8VekUNYHfEsZAMlp/x/LmHNx1VJggfbv1AM3Yz1GLRR536n/fty7/rAbmuY w9fw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=G2T6xixbR51gXles+5J8VZj8KNxOGhVd1Fzhsy9f65o=; b=7J7JDSegyzx5BPT0qW0GKFih/Xas2R/jDNnymU2FQAbNxVuA/K5yyIWnIc2Y4+DOe1 PQLR61EoLr9B3XWYmdIu03pVTlqWP3S9gwlNq+Ims8Bbyd6mmPt0p17VPtIJoO3pSMkM xthYqdLZVg7of1ADwdPJxPOmaHZ2IkZaaoNWKGgykTfZLGdaORiXvy5nXuYF4S3RoBrY ldM+RUYRb1wPfGUrMM9BdFji3HBvko7qsv9fMaRQhNIvmJQgsSDMUVqnbuD25ZwpdwUF F8OrbCoOhYxTvpfbWMFfArwOpEwHlhwK1+flGjwhQfqzE1pJl7HbPrP3+6vs5Ls7DcE9 vUbg== X-Gm-Message-State: AOAM5336IHNsVSVbqLH0jR7QKP9LTrge+++KVUVusPGnb3zVkEYuk8aI Lj4VP7HC3kN01C1KR8TWO3aMR0v4SnX/GslCyXroiA== X-Google-Smtp-Source: ABdhPJyN1CRH4zeQufyL2aiRVxjIOw7oQZlMDNI1y0PotNLGf10/vB/EOcpFnK+1Bhp6Jhw4s6sfPXvZMX1NoOQ6alI= X-Received: by 2002:a9f:23d0:: with SMTP id 74mr5762756uao.69.1633726153629; Fri, 08 Oct 2021 13:49:13 -0700 (PDT) List-Id: Discussions about the use of FreeBSD-current List-Archive: https://lists.freebsd.org/archives/freebsd-current List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-current@freebsd.org MIME-Version: 1.0 References: In-Reply-To: From: Warner Losh Date: Fri, 8 Oct 2021 14:49:02 -0600 Message-ID: Subject: Re: Dell Latitude 7400 - nvme0: Missing interrupt To: Pavel Timofeev Cc: Chuck Tuffli , freebsd-current Content-Type: multipart/alternative; boundary="0000000000000c35c905cddd837c" X-Rspamd-Queue-Id: 4HR0ff3hSMz3sgx X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] X-ThisMailContainsUnwantedMimeParts: Y --0000000000000c35c905cddd837c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Oct 8, 2021 at 2:42 PM Pavel Timofeev wrote: > > > =D1=81=D0=B1, 21 =D0=B0=D0=B2=D0=B3. 2021 =D0=B3. =D0=B2 15:22, Warner Lo= sh : > >> >> >> On Sat, Aug 21, 2021 at 3:06 PM Pavel Timofeev wrote: >> >>> >>> >>> Warner Losh : >>> >>>> >>>> >>>> On Fri, Aug 20, 2021 at 10:42 PM Pavel Timofeev >>>> wrote: >>>> >>>>> Pavel Timofeev : >>>>> >>>>> > >>>>> > Chuck Tuffli : >>>>> > >>>>> >> On Mon, Aug 16, 2021 at 7:43 PM Pavel Timofeev >>>>> wrote: >>>>> >> > >>>>> >> > Hello >>>>> >> > I've got a Dell Latitude 7400 and tried installing the latest >>>>> >> 14.0-CURRENT >>>>> >> > (main-n248636-d20e9e02db3) on it. >>>>> >> > Despite other things the weird one which concerns me is >>>>> >> > nvme0: Missing interrupt >>>>> >> > message I get sometimes on the console. >>>>> >> > It seems like I get it only after the reboot of the laptop, i. e= . >>>>> not >>>>> >> > getting that message if I power cycle the laptop, at least I >>>>> haven't >>>>> >> seen >>>>> >> > them for now in such cases. >>>>> >> > So when the laptop is rebooted I can't even take advantage of >>>>> >> > nvmecontrol(8) quickly. >>>>> >> > Well, it still works, but it takes tens of seconds to return the >>>>> output. >>>>> >> ... >>>>> >> > dmesg when power cycled - >>>>> >> > https://drive.google.com/file/d/1dB27oB1O2CcnZy6DvOOhmFO8SN8V8Sw= J >>>>> >> > dmesg when rebooted - >>>>> >> > https://drive.google.com/file/d/1DsKTMkihp_OmUcirByLaVO4o2mU38Bx= h >>>>> >> >>>>> >> I'm sort of curious about the time stamps for the log messages in >>>>> the >>>>> >> failing case. Something like: >>>>> >> >>>>> >> $ grep "nv\(me\|d\)" /var/log/messages >>>>> >> >>>>> >> --chuck >>>>> >> >>>>> > >>>>> > Well, I can't see timestamps in the verbose boot log. Am I missing >>>>> some >>>>> > configuration for that? >>>>> > >>>>> > $ grep "nv\(me\|d\)" /var/log/messages >>>>> > nvme0: mem >>>>> > 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff a= t >>>>> device >>>>> > 0.0 on pci6 >>>>> > nvme0: attempting to allocate 5 MSI-X vectors (17 supported) >>>>> > nvme0: using IRQs 133-137 for MSI-X >>>>> > nvme0: CapLo: 0x140103ff: MQES 1023, CQR, TO 20 >>>>> > nvme0: CapHi: 0x00000030: DSTRD 0, NSSRS, CSS 1, MPSMIN 0, MPSMAX 0 >>>>> > nvme0: Version: 0x00010300: 1.3 >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvme0: Missing interrupt >>>>> > nvd0: NVMe namespace >>>>> > GEOM: new disk nvd0 >>>>> > nvd0: 488386MB (1000215216 512 byte sectors) >>>>> > >>>>> >>>>> >>>>> Ah, sorry, provided wrong output. >>>>> Here is what you requested: >>>>> $ grep "nv\(me\|d\)" /var/log/messages >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: mem >>>>> 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at >>>>> device >>>>> 0.0 on pci6 >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: attempting to allocate 5 MSI-= X >>>>> vectors (17 supported) >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: using IRQs 133-137 for MSI-X >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapLo: 0x140103ff: MQES 1023, >>>>> CQR, >>>>> TO 20 >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: CapHi: 0x00000030: DSTRD 0, >>>>> NSSRS, >>>>> CSS 1, MPSMIN 0, MPSMAX 0 >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Version: 0x00010300: 1.3 >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>> Aug 21 04:34:36 nostromo kernel: nvme0: Missing interrupt >>>>> Aug 21 04:34:36 nostromo kernel: nvd0: NV= Me >>>>> namespace >>>>> Aug 21 04:34:36 nostromo kernel: GEOM: new disk nvd0 >>>>> Aug 21 04:34:36 nostromo kernel: nvd0: 488386MB (1000215216 512 byte >>>>> sectors) >>>>> Aug 21 04:34:42 nostromo kernel: nvme0: Missing interrupt >>>>> Aug 21 04:35:36 nostromo kernel: nvme0: Missing interrupt >>>>> Aug 21 04:35:50 nostromo kernel: nvme0: Missing interrupt >>>>> >>>> >>>> What happens if you set hw.nvme.use_nvd=3D0 and hw.cam.nda.nvd_compat= =3D1 >>>> in the boot loader and reboot? Same thing except nda where nvd was? Or >>>> does >>>> it work? >>>> >>>> Something weird is going on in the interrupt assignment, I think, but = I >>>> wanted to get any nvd vs nda issues out of the way first. >>>> >>>> Warner >>>> >>> >>> Do you mean kern.cam.nda.nvd_compat instead of hw.cam.nda.nvd_compat? >>> kern.cam.nda.nvd_compat is 1 by default now. >>> >>> So I tried to set hw.nvme.use_nvd to 1 as suggested, but I still see >>> nvme0: Missing interrupt >>> and now also >>> Root mount waiting for: CAM >>> messages besides those >>> >> >> OK. That all makes sense. I'd forgotten that nvd_compat=3D1 by default t= hese >> days. >> >> I'll take a look on monday starting at the differences in interrupt >> assignment that >> are apparent when you cold boot vs reboot. >> >> Thanks for checking... I'd hoped this was a cheap fix, but also didn't >> really >> expect it to be. >> >> Warner >> >> > I've recently upgraded to main-n249974-17f790f49f5 and it got even worse > now. > So clean poweron works as before. > But if rebooted nvme drive refuses to work, while before the code upgrade > it was just complaining about missing interrupts. > > currently dmesg show this: > nvme0: mem > 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at devi= ce > 0.0 on pci6 > nvd0: NVMe namespace > nvd0: 488386MB (1000215216 512 byte sectors) > nvme0: mem > 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at devi= ce > 0.0 on pci6 > Why is this showing up twice? Or is everything above this line left over from the first, working boot? > nvme0: RECOVERY_START 9585870784 vs 9367036288 > nvme0: timeout with nothing complete, resetting > nvme0: Resetting controller due to a timeout. > nvme0: RECOVERY_WAITING > nvme0: resetting controller > nvme0: aborting outstanding admin command > nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000 > nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 > nvme0: nvme_identify_controller failed! > nvme0: waiting > Clearly something bad is going on with the drive here... We looked into the completion queues when we didn't get an interrupt and there was nothing complete there.... The only thing I can think of is that this means there's a phase error between the drive and the system. I recently removed a second reset and made it an option NVME_2X_RESET. Can you see if adding 'options NVME_2X_RESET' to your kernel config fixes this? Warner > nvme0: mem > 0xcc100000-0xcc103fff,0xcc105000-0xcc105fff,0xcc104000-0xcc104fff at devi= ce > 0.0 on pci6 > nvme0: RECOVERY_START 9362778467 vs 9361830561 > nvme0: timeout with nothing complete, resetting > nvme0: Resetting controller due to a timeout. > nvme0: RECOVERY_WAITING > nvme0: resetting controller > nvme0: aborting outstanding admin command > nvme0: IDENTIFY (06) sqid:0 cid:15 nsid:0 cdw10:00000001 cdw11:00000000 > nvme0: ABORTED - BY REQUEST (00/07) sqid:0 cid:15 cdw0:0 > nvme0: nvme_identify_controller failed! > nvme0: waiting > > --0000000000000c35c905cddd837c--