Date: Sun, 22 May 2022 05:27:29 +0000
From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 264141] nvme(4): Heavy load to SSD wedges 13.1 system: Controller in fatal status, resetting ... Resetting controller due to a timeout and possible hot unplug.
Message-ID: <bug-264141-3630-bcBeelI2go@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-264141-3630@https.bugs.freebsd.org/bugzilla/>
References: <bug-264141-3630@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264141

--- Comment #7 from Warner Losh <imp@FreeBSD.org> ---
nda is an alternative to nvd that uses CAM. Unless you need really high IOPS, nda is generally better than nvd. In loader.conf, add 'hw.nvme.use_nvd=0' and reboot. We provide a compatible /dev/nvd* that points to /dev/nda*, so almost all uses of /dev/nvd* should work. But with zfs, chances are you won't notice.

I wrote this code, but had trouble driving the nvme drives I have access to off the cliff to test all pathological behaviors. This is one I tested in simulation.

However, looking at the code, I fear that this workaround likely won't help you. The message happens when we fail the controller, and that seems to be happening when reset fails (which we should report directly, but apparently don't).

Do you have issues with the machines being too hot or having poor airflow over the nvme cards so they get too hot? In general, FreeBSD (or any OS) shouldn't be able to schedule so much I/O that the card's SoC controller fails... At least not in a repeatable way across multiple drive types. The 'possible hotplug' means we read all 'f's before trying to do a reset. If the card isn't there at all, we'll time out and fail the controller (which may be what's really going on). That suggests power and/or cabling issues if it isn't thermal somehow. It would be good to eliminate these possibilities if at all possible.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
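For readers unfamiliar with the "all 'f's" remark in the comment above: a read from a PCIe device that is absent or has dropped off the bus typically completes with every bit set. Below is a minimal sketch of that check in C. The names REG_ALL_ONES and looks_unplugged are hypothetical, chosen for illustration; this is not the nvme(4) driver's code and does not reproduce its reset or failure handling.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical constant: the value a 32-bit register read returns when
 * no device answers on the bus (all bits set, i.e. "all f's").
 */
#define REG_ALL_ONES 0xffffffffu

/*
 * Hypothetical helper: given the value just read from a 32-bit
 * controller status register, report whether it looks like the device
 * is no longer present. A real driver would read the register through
 * its own MMIO accessor; here the value is passed in so the logic can
 * be exercised without hardware.
 */
static bool
looks_unplugged(uint32_t status_reg)
{
	return (status_reg == REG_ALL_ONES);
}
```

For example, looks_unplugged(0xffffffffu) is true, while a status value such as 0x00000001 is treated as coming from a device that is still present. An all-ones read only says the device did not answer; it does not say whether the cause is a hot unplug, a power or cabling fault, or a thermal shutdown.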