Date:      Thu, 7 Dec 2023 15:44:07 -0800
From:      Pete Wright <pete@nomadlogic.org>
To:        Warner Losh <imp@bsdimp.com>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: nvme timeout issues with hardware and bhyve vm's
Message-ID:  <a3742599-7e42-498f-861f-7023e68a3877@nomadlogic.org>
In-Reply-To: <CANCZdfoWXNtrSCJytGmg6WoNAMjJSVpd%2BGD6gtLJAP=m-ROWgg@mail.gmail.com>
References:  <90d3e532-8ea7-4eea-8e31-8c363285a156@nomadlogic.org> <CANCZdfrQTd3F-j81HsamUCJG4DyUk_-yPOtbZY4Q926_ihatsQ@mail.gmail.com> <0ad493d5-1c1e-4370-977a-118f46ebd677@nomadlogic.org> <CANCZdfrwzmZ=iHj_vm2nsi72ceRQ81KY5DjiuML3udEaWTBanA@mail.gmail.com> <0c4f8149-89dd-4635-a5ed-4766fffd2553@nomadlogic.org> <CANCZdfpgw_sm4couYx9%2Bcgp-q_2jmPC2Q7TSeD9Yb3VYoiDQhQ@mail.gmail.com> <ec08484d-b49f-4aa3-adf4-b96570083b9c@nomadlogic.org> <CANCZdfoWXNtrSCJytGmg6WoNAMjJSVpd%2BGD6gtLJAP=m-ROWgg@mail.gmail.com>



On 12/7/23 2:49 PM, Warner Losh wrote:
> 
> 
> On Thu, Dec 7, 2023 at 3:38 PM Pete Wright <pete@nomadlogic.org> wrote:
> 
> 
> 
>     On 10/13/23 7:34 PM, Warner Losh wrote:
>      >
> 
>      >
>      >     The messages I posted at the start of the thread are from the
>      >     VM itself (13.2-RELEASE).  The zpool on the hypervisor
>      >     (13.2-RELEASE) showed no such issues.
>      >
>      >     Based on your comment about the improvements in 14, I'll focus
>      >     my efforts on my workstation. It seemed to happen regularly, so
>      >     hopefully I can find a repro case.
>      >
>      >
>      > Let me know if you see similar messages in stable/14. I think I've
>      > fixed all the issues with timeouts, though you shouldn't ever see
>      > them in a VM setup unless something else weird is going on.
>      >
> 
> 
>     Hi Warner, just resurfacing this thread because I've had a few lockups
>     on my workstation running 14.0-STABLE.  I was able to capture a photo
>     of the hang, and this seems to be the most important line:
> 
>     nvme0: Resetting controller due to a timeout and possible hot unplug.
> 
>     When I scan the device after reboot I don't see any errors, but if
>     there is a particular thing I should check via nvmecontrol, please let
>     me know.  Also, since it mentions a possible hot unplug, I wonder if
>     this is hardware/firmware related to my system?
> 
>     Anyway, I haven't found a repro case yet, but it has locked up a few
>     times in the past two weeks.
> 
> 
> What the message means is that (a) we stopped getting interrupts from 
> the device and (b) when we went to check on the status of the device it 
> read back like missing hardware.
> 
> So is this from inside the VM running under bhyve, or in the host that's 
> hosting the VM? We have different next steps depending on where it is.
> 

OK, awesome, thanks for that context. This is on a bare-metal workstation.
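
For reference, the "read back like missing hardware" part of the explanation
above can be sketched roughly as follows. This is only an illustration and not
the actual FreeBSD nvme(4) driver code: the names classify_csts, csts_state and
CSTS_ALL_ONES are hypothetical. It assumes the usual PCIe behaviour that a
register read from a device that has dropped off the bus returns all ones, and
uses the RDY and CFS bit positions of the NVMe CSTS (Controller Status)
register.

```c
#include <stdint.h>

/* NVMe CSTS (Controller Status) bits. */
#define CSTS_RDY      (1u << 0)   /* controller ready */
#define CSTS_CFS      (1u << 1)   /* controller fatal status */

/* Assumption: a read from a device that is no longer on the bus
 * comes back as all ones. Hypothetical name, not from the driver. */
#define CSTS_ALL_ONES 0xFFFFFFFFu

/* Hypothetical classification of a CSTS readback after a timeout. */
enum csts_state {
    CSTS_DEVICE_GONE,   /* reads like missing hardware (possible hot unplug) */
    CSTS_FATAL,         /* device present but reports a fatal error */
    CSTS_READY,         /* device present and ready */
    CSTS_NOT_READY      /* device present but not ready */
};

static enum csts_state
classify_csts(uint32_t csts)
{
    /* Check all-ones first: that value also has RDY and CFS set,
     * so it would otherwise be mistaken for a fatal status. */
    if (csts == CSTS_ALL_ONES)
        return CSTS_DEVICE_GONE;
    if (csts & CSTS_CFS)
        return CSTS_FATAL;
    if (csts & CSTS_RDY)
        return CSTS_READY;
    return CSTS_NOT_READY;
}
```

On a bare-metal machine a CSTS_DEVICE_GONE-style result would be consistent
with the drive briefly disappearing from the bus, which is why the message
mentions a possible hot unplug.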

-pete


-- 
Pete Wright
pete@nomadlogic.org


