From: Johannes Totz
Date: Sun, 09 Sep 2007 18:42:07 +0200
To: freebsd-fs@freebsd.org
Subject: UFS not handling errors correctly
Message-ID: <46E4225F.1020806@gmx.net>

Hi!

It seems that UFS does not handle disk/write errors properly: an error causes silent corruption, which in turn causes a panic later during snapshot creation.

> # uname -a
> FreeBSD alfred 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Jul 12 20:40:55 CEST 2007 root@alfred:/usr/obj/usr/src/sys/ALFRED i386

One day one of my disks reported an I/O error (the kernel log shows failed reads):

> Aug 22 05:24:39 alfred kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=469004995
> Aug 22 05:24:40 alfred kernel: ad0: FAILURE - READ_DMA48 status=51 error=10 LBA=469004995
> Aug 22 05:24:40 alfred kernel: g_vfs_done():ufs/home[READ(offset=240130525184, length=2048)]error = 5
> Aug 22 05:25:08 alfred kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=490974155
> Aug 22 05:25:08 alfred kernel: ad0: FAILURE - READ_DMA48 status=51 error=10 LBA=490974155
> Aug 22 05:25:08 alfred kernel: g_vfs_done():ufs/home[READ(offset=251378735104, length=2048)]error = 5

This has never happened before and has not happened again (yet). A long test with smartctl reports "all fine". So let's attribute that to a cosmic ray (or neutrino, pick your favorite) hitting the controller.

The system continued to run fine afterwards. But the next morning, during automatic snapshot creation, it panicked with:

> Aug 23 06:38:14 alfred kernel: ffs_snapshot_mount: old format snapshot inode 8
> Aug 23 06:38:14 alfred savecore: reboot after panic: snapacct_ufs2: bad block

So of course it restarted and tried to do a background fsck. And failed again... and again... and again...

> Aug 23 07:08:15 alfred kernel: ffs_snapshot_mount: old format snapshot inode 4
> Aug 23 07:08:15 alfred savecore: reboot after panic: snapacct_ufs2: bad block

The reported inode varies, but "bad block" is always the same. This went on about 10 times until I had a chance to interrupt it (i.e. woke from slumber) and boot into single-user mode.

Multiple runs of fsck fixed the problem. I deleted all old snapshot files and the system is fine now, with no further problems. Maybe some files got lost; I can't tell, there are a few million on it.
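
For reference, the ad0 LBAs and the g_vfs_done() byte offsets in the log above can be cross-checked against each other. The sketch below assumes 512-byte sectors and that each g_vfs_done() offset is relative to the start of the ufs/home provider; neither is stated in the report, and the helper name implied_start_sector is made up for illustration. Under those assumptions, both failures imply the same provider start sector on the disk, which suggests the two log sources describe the same failed reads.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors, not stated in the report

# (disk LBA from the ad0 line, byte offset from the matching g_vfs_done() line)
FAILED_READS = [
    (469004995, 240130525184),
    (490974155, 251378735104),
]


def implied_start_sector(lba, offset, sector_size=SECTOR_SIZE):
    """Return the disk sector at which the provider would have to start
    for the given provider-relative byte offset to land on the given LBA."""
    if offset % sector_size != 0:
        raise ValueError("offset is not sector-aligned")
    return lba - offset // sector_size


# One implied start sector per failed read; identical values mean the
# LBA and offset pairs are consistent with one fixed provider start.
starts = [implied_start_sector(lba, off) for lba, off in FAILED_READS]
print(starts)
```

If the two values differed, the assumed sector size or the assumed meaning of the offsets would be wrong, and the comparison would say nothing about the disk layout.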

Also see: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/114676

Unfortunately I don't have time to dig into this, but I wanted to report it. Maybe someone has already fixed it...

Cheers,
Johannes