Date:      Tue, 05 Jul 2022 23:04:50 +0000
From:      bugzilla-noreply@freebsd.org
To:        fs@FreeBSD.org
Subject:   [Bug 264141] nvme(4): Heavy load to SSD wedges 13.1 system: Controller in fatal status, resetting ... Resetting controller due to a timeout and possible hot unplug.
Message-ID:  <bug-264141-3630-q7k7ZkIw2L@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-264141-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-264141-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264141

--- Comment #23 from Warner Losh <imp@FreeBSD.org> ---
(In reply to dgilbert from comment #22)
> theory: FreeBSD is stomping on the host DRAM reserved for the NVME

There's no host RAM reserved for nvme, per se. The driver will optionally
allocate memory for the drive to use, however. Do you have "nvmeX: Allocated
%lluMB host memory buffer" in your dmesg? Without it, you're not using host
memory for the drive. If you think that this is the cause of the problem, you
can also set the tunable hw.nvme.hmb_max=0 to disable using host memory for
the DRAM-less cards, at the cost of some additional latency. This would rule
it out as a problem. There may be some cards that lose their minds when this
is enabled as well, though I've not seen reports of that in the Linux world (I
could easily have missed them). Ruling this in/out would be useful...
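
To make the dmesg check above concrete, here is a minimal sketch that scans
dmesg text for the message quoted above. The function name and the sample
lines are made up for illustration; only the "Allocated ...MB host memory
buffer" format comes from the driver message.

```python
import re

# Pattern for the driver message quoted above:
#   "nvmeX: Allocated %lluMB host memory buffer"
HMB_LINE = re.compile(r"\b(nvme\d+): Allocated (\d+)MB host memory buffer")


def find_hmb_allocations(dmesg_text):
    """Return {controller: size_in_MB} for each HMB allocation message found.

    An empty dict means no controller reported a host memory buffer.
    """
    return {m.group(1): int(m.group(2)) for m in HMB_LINE.finditer(dmesg_text)}


# Hypothetical dmesg excerpt, for illustration only (not from the bug report).
SAMPLE = """\
nvme0: <Generic NVMe Device> mem 0xfe400000-0xfe403fff at device 0.0 on pci1
nvme0: Allocated 64MB host memory buffer
nvme1: <Generic NVMe Device> mem 0xfe300000-0xfe303fff at device 0.0 on pci2
"""

allocations = find_hmb_allocations(SAMPLE)
for ctrlr, size_mb in sorted(allocations.items()):
    print(f"{ctrlr}: host memory buffer in use ({size_mb}MB)")
```

Feed it the real output of dmesg instead of SAMPLE. If the result is empty,
the host memory buffer is not in play. If it is not empty, setting
hw.nvme.hmb_max=0 (presumably in /boot/loader.conf, since it is a tunable)
and rebooting should make the message disappear.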

But corrupting host memory seems unlikely to be a cause, given that the card
drops off the bus and has its memory BARs reset so it isn't decoding anything
(which is what's indicated by the "possible hot unplug" messages). In the
cases that I've diagnosed, this indicated some kind of power or connection
issue to the card, a faulty power controller on the card, or wonky firmware.
There might be an additional cause that's still unknown, but absent better
evidence I'm at a loss for where to look.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.