Date:      Fri, 8 Dec 2023 08:09:29 +0900
From:      Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
To:        freebsd-current@freebsd.org
Subject:   Re: nvme timeout issues with hardware and bhyve vm's
Message-ID:  <20231208080929.cfd9fca421fea81d89d2380b@dec.sakura.ne.jp>
In-Reply-To: <ec08484d-b49f-4aa3-adf4-b96570083b9c@nomadlogic.org>
References:  <90d3e532-8ea7-4eea-8e31-8c363285a156@nomadlogic.org> <CANCZdfrQTd3F-j81HsamUCJG4DyUk_-yPOtbZY4Q926_ihatsQ@mail.gmail.com> <0ad493d5-1c1e-4370-977a-118f46ebd677@nomadlogic.org> <CANCZdfrwzmZ=iHj_vm2nsi72ceRQ81KY5DjiuML3udEaWTBanA@mail.gmail.com> <0c4f8149-89dd-4635-a5ed-4766fffd2553@nomadlogic.org> <CANCZdfpgw_sm4couYx9%2Bcgp-q_2jmPC2Q7TSeD9Yb3VYoiDQhQ@mail.gmail.com> <ec08484d-b49f-4aa3-adf4-b96570083b9c@nomadlogic.org>

On Thu, 7 Dec 2023 14:38:37 -0800
Pete Wright <pete@nomadlogic.org> wrote:

> 
> 
> On 10/13/23 7:34 PM, Warner Losh wrote:
> > 
> 
> > 
> >     the messages i posted in the start of the thread are from the VM itself
> >     (13.2-RELEASE).  The zpool on the hypervisor (13.2-RELEASE) showed no
> >     such issues.
> > 
> >     Based on your comment about the improvements in 14 I'll focus my
> >     efforts on my workstation, it seemed to happen regularly so
> >     hopefully i can find a repro case.
> > 
> > 
> > Let me know if you see similar messages in stable/14. I think I've fixed
> > all the issues with timeouts, though you shouldn't ever see them in a vm
> > setup unless something else weird is going on.
> > 
> 
> 
> Hi Warner, just resurfacing this thread because I've had a few lockups 
> on my workstation running 14.0-STABLE.  I was able to capture a photo of 
> the hang and this seems to be the most important line:
> 
> nvme0: Resetting controller due to a timeout and possible hot unplug.
> 
> When I scan the device after reboot I don't see any errors, but if there
> is a particular thing I should check via nvmecontrol please let me know.
> Also, since it mentions possible hot unplug I wonder if this is
> hardware/firmware related to my system?
> 
> Anyway, haven't found a repro case yet but it has locked up a few times 
> the past two weeks.
> 
> -pete
> 
> 
> -- 
> Pete Wright
> pete@nomadlogic.org

If I ran into this kind of problem ON BARE METAL HARDWARE, the first
things I would suspect are

 * a hang of the NVMe controller or the PCI bridge on the SSD caused by
   overheating, or

 * an unstable physical connection (bad contact).
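
To check the overheating suspicion, the temperature the drive reports
can be read from the SMART/health log page (log page 2) with
"nvmecontrol logpage -p 2 nvme0". Below is a minimal Python sketch that
pulls the temperature out of that output. It assumes the output contains
a line such as "Temperature: 310 K, 36.85 C, 98.33 F"; that format and
the helper names are assumptions for illustration, so adjust the pattern
if your version prints something different.

```python
import re
import subprocess

# ASSUMPTION: the health log output has a line starting in column 0 like
#   "Temperature:                    310 K, 36.85 C, 98.33 F"
# Requiring the trailing "K" skips any indented "Temperature:" flag line.
TEMP_RE = re.compile(r"^Temperature:\s+(\d+)\s*K\b", re.MULTILINE)


def parse_temperature_kelvin(text):
    """Return the reported temperature in kelvin, or None if not found."""
    match = TEMP_RE.search(text)
    return int(match.group(1)) if match else None


def read_health_log(dev="nvme0"):
    """Run nvmecontrol for log page 2 and return its text (usually needs root)."""
    result = subprocess.run(
        ["nvmecontrol", "logpage", "-p", "2", dev],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def report(dev="nvme0"):
    """Print the temperature of `dev` in degrees Celsius (hypothetical helper)."""
    kelvin = parse_temperature_kelvin(read_health_log(dev))
    if kelvin is None:
        print(f"{dev}: no temperature line found; output format may differ")
    else:
        print(f"{dev}: {kelvin - 273.15:.1f} C")
```

Calling report("nvme0") from a loop or cron job while the machine is
under load would show whether the temperature climbs before a hang. It
cannot tell you anything about a bad contact; for that, reseating the
SSD is still the simplest test.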


-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>


