Date: Sun, 09 Sep 2007 18:42:07 +0200
From: Johannes Totz <jo_t@gmx.net>
To: freebsd-fs@freebsd.org
Subject: UFS not handling errors correctly
Message-ID: <46E4225F.1020806@gmx.net>

Hi!

Seems like UFS does not handle disk/write errors properly. This causes silent corruption, which in turn causes a panic later during snapshot creation.

> #uname -a
> FreeBSD alfred 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Jul 12 20:40:55 CEST 2007 root@alfred:/usr/obj/usr/src/sys/ALFRED i386

One day one of my disks reported an error:

> Aug 22 05:24:39 alfred kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=469004995
> Aug 22 05:24:40 alfred kernel: ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=469004995
> Aug 22 05:24:40 alfred kernel: g_vfs_done():ufs/home[READ(offset=240130525184, length=2048)]error = 5
> Aug 22 05:25:08 alfred kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=490974155
> Aug 22 05:25:08 alfred kernel: ad0: FAILURE - READ_DMA48 status=51<READY,DSC,ERROR> error=10<NID_NOT_FOUND> LBA=490974155
> Aug 22 05:25:08 alfred kernel: g_vfs_done():ufs/home[READ(offset=251378735104, length=2048)]error = 5

This had never happened before and has not happened again (yet). A long test with smartctl reports "all fine". So let's attribute that to a cosmic ray (or neutrino, pick your favorite) hitting the controller.

The system continued to run fine afterwards. But the next morning, during automatic snapshot creation, it panicked with:

> Aug 23 06:38:14 alfred kernel: ffs_snapshot_mount: old format snapshot inode 8
> Aug 23 06:38:14 alfred savecore: reboot after panic: snapacct_ufs2: bad block

So of course it restarted, tried to do a background fsck, and failed again... and again... and again...

> Aug 23 07:08:15 alfred kernel: ffs_snapshot_mount: old format snapshot inode 4
> Aug 23 07:08:15 alfred savecore: reboot after panic: snapacct_ufs2: bad block

The reported inode varies, but "bad block" is always the same. This went on about 10 times until I had a chance to interrupt it (i.e. woke from slumber) and boot into single-user mode. Multiple runs of fsck fixed the problem.
I deleted all old snapshot files and the system is fine now. No further problems. Maybe some files got lost; I can't tell, as there are a few million on it.

Also see: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/114676

Unfortunately I don't have time to dig into this, but I wanted to report it. Maybe someone has already fixed it...

Cheers,
Johannes