Date: Sat, 17 Jul 2021 14:36:22 +0100
From: doa379 <doa379@gmail.com>
To: Graham Perrin <grahamperrin@gmail.com>
Cc: Current FreeBSD <freebsd-current@freebsd.org>
Subject: Re: nvme(4) losing control, and subsequent use of fsck_ffs(8) with UFS
Message-ID: <YPLc1l8tq15cFcBq@void>
In-Reply-To: <994d22b5-c8b7-1183-8198-47b8251e896e@gmail.com>
References: <994d22b5-c8b7-1183-8198-47b8251e896e@gmail.com>

> When the file system is stress-tested, it seems that the device (an internal
> drive) is lost.
>
> A recent photograph:
>
> <https://photos.app.goo.gl/wB7gZKLF5PQzusrz7>
>
> Transcribed manually:
>
> nvme0: Resetting controller due to a timeout.
> nvme0: resetting controller
> nvme0: controller ready did not become 0 within 5500 ms
> nvme0: failing outstanding i/o
> nvme0: WRITE sqid:2 cid:115 nsid:1 lba:296178856 len:64
> nvme0: ABORTED - BY REQUEST (00/07) sqid:2 cid:115 cdw0:0
> g_vfs_done():nvd0p2[WRITE(offset=151370924032, length=32768)]error = 6
> UFS: forcibly unmounting /dev/nvd0p2 from /
> nvme0: failing outstanding i/o
>
> … et cetera.
>
> Is this a sure sign of a hardware problem? Or must I do something special to
> gain reliability under stress?
>
> I don't know how to interpret parts of the manual page for nvme(4). There's
> direction to include this line in loader.conf(5):
>
> nvme_load="YES"
>
> – however when I used kldload(8), it seemed that the module was already
> loaded, or in kernel.
>
> Using StressDisk:
>
> <https://github.com/ncw/stressdisk>
>
> – failures typically occur after around six minutes of testing.
>
> The drive is very new, less than 2 TB written:
>
> <https://bsd-hardware.info/?probe=7138e2a9e7&log=smartctl>
>
> I do suspect a hardware problem, because two prior installations of Windows
> 10 became non-bootable.
>
> Also: I find peculiarities with use of fsck_ffs(8), which I can describe
> later. Maybe to be expected, if there's a problem with the drive.

I have a similar issue with a system that runs off a USB drive. The file
system is UFS. The system does minimal disk I/O, but it fails without
warning at repeated intervals: the disk controller gets disconnected,
taking the whole system offline. I'm sure the drive itself is not perfect,
but I'd have expected the file system to account for that.
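
The numbers in the quoted transcript can be cross-checked with a little arithmetic. The sketch below is only an illustration and rests on assumptions that the thread does not state: a 512-byte logical sector size, that `len` in the nvme0 line counts logical blocks, and that the nvme0 WRITE line and the g_vfs_done() line describe the same request. The helper names are made up for this example. Under those assumptions it checks that the two lengths agree and works out where nvd0p2 would have to start on the device for the two offsets to match. It also looks up the symbolic name of error 6.

```python
import errno
import re

# Assumption: 512-byte logical sectors (not stated anywhere in the thread).
SECTOR_SIZE = 512

# Two lines copied from the manually transcribed console output.
NVME_LINE = "nvme0: WRITE sqid:2 cid:115 nsid:1 lba:296178856 len:64"
GEOM_LINE = "g_vfs_done():nvd0p2[WRITE(offset=151370924032, length=32768)]error = 6"


def parse_nvme_write(line):
    """Return (lba, block_count) from an nvme WRITE line (hypothetical helper)."""
    match = re.search(r"lba:(\d+) len:(\d+)", line)
    if match is None:
        raise ValueError("not an nvme command line: " + line)
    return int(match.group(1)), int(match.group(2))


def parse_g_vfs_done(line):
    """Return (offset, length, error) from a g_vfs_done line (hypothetical helper)."""
    match = re.search(r"offset=(\d+), length=(\d+)\)\]error = (\d+)", line)
    if match is None:
        raise ValueError("not a g_vfs_done line: " + line)
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


lba, block_count = parse_nvme_write(NVME_LINE)
offset, length, error = parse_g_vfs_done(GEOM_LINE)

# Do the device-level and partition-level lengths describe the same amount of data?
lengths_agree = block_count * SECTOR_SIZE == length

# If both lines are the same request, the difference between the device byte
# address and the partition-relative offset is where the partition begins.
implied_start_bytes = lba * SECTOR_SIZE - offset
implied_start_sector = implied_start_bytes // SECTOR_SIZE

# Symbolic name for the error number reported by g_vfs_done().
error_name = errno.errorcode.get(error, "unknown")

print("lengths agree:", lengths_agree)
print("implied start of nvd0p2 (sector):", implied_start_sector)
print("error", error, "is", error_name)
```

If the lengths agree and the implied start is a whole number of sectors, the two messages are at least consistent with a single 32 KiB write being aborted at the controller and then reported up to the file system. That says nothing about whether the root cause is the drive, the controller, or something else.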
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YPLc1l8tq15cFcBq>