Date:      Sun, 22 May 2022 05:50:25 +0000
From:      bugzilla-noreply@freebsd.org
To:        fs@FreeBSD.org
Subject:   [Bug 264141] nvme(4): Heavy load to SSD wedges 13.1 system: Controller in fatal status, resetting ... Resetting controller due to a timeout and possible hot unplug.
Message-ID:  <bug-264141-3630-SHxtV26ZXV@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-264141-3630@https.bugs.freebsd.org/bugzilla/>
References:  <bug-264141-3630@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264141

--- Comment #8 from crb <crb@ChrisBowman.com> ---
Replacing nvme with nda results in similar-looking messages from both nvme0 and
nda0 (these didn't show up in a remote ssh session, so I couldn't cut and paste
them).

I don't think the cards get too hot.  The machine has 3 fans that spin up with
CPU temperature and, as I mentioned earlier, the card has a heat sink.  When I
link while building world with 32 jobs I do hear the fans ramp up ever so
slightly, but mostly they're quiet.

I doubt it's cabling, as these SSDs were inserted directly into an M.2 slot and
I seated the last one securely a few days ago.

It could be power; this is a bit of a hacked system (I gutted a Sun Ultra 40
and replaced the contents with this, reusing the power supply), but I don't have
a way to eliminate power as a possibility right now.  Theoretically this system
should be able to deliver 1000W and I only have the motherboard, processor, 64G
of memory, the SSD, 2 Ethernet cards (one a Mellanox CX3 using fiber) and 6
spinning drives which are basically quiet.  Power seems unlikely, as the system
seems otherwise rock solid under load except when hitting the SSD hard.

This (unfortunately) seems to be completely repeatable now, simply by copying a
couple of repos over 10G Ethernet from a remote NFS machine to the local SSD
while the machine is otherwise completely idle.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.