Message-ID: <46E42D14.5060605@FreeBSD.org>
Date: Sun, 09 Sep 2007 19:27:48 +0200
From: Kris Kennaway
To: Johannes Totz
References: <46E4225F.1020806@gmx.net>
In-Reply-To: <46E4225F.1020806@gmx.net>
Cc: freebsd-fs@freebsd.org
Subject: Re: UFS not handling errors correctly

Johannes Totz wrote:
> Hi!
>
> Seems like UFS does not handle disk/write errors properly, causes silent
> corruptions and which causes a panic later during snapshot creation.
>
>> #uname -a
>> FreeBSD alfred 6.2-STABLE FreeBSD 6.2-STABLE #0: Thu Jul 12 20:40:55 CEST 2007 root@alfred:/usr/obj/usr/src/sys/ALFRED i386
>
> One day a write error on one of my disks happened:
>
>> Aug 22 05:24:39 alfred kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=469004995
>> Aug 22 05:24:40 alfred kernel: ad0: FAILURE - READ_DMA48 status=51 error=10 LBA=469004995
>> Aug 22 05:24:40 alfred kernel: g_vfs_done():ufs/home[READ(offset=240130525184, length=2048)]error = 5
>> Aug 22 05:25:08 alfred kernel: ad0: TIMEOUT - READ_DMA48 retrying (1 retry left) LBA=490974155
>> Aug 22 05:25:08 alfred kernel: ad0: FAILURE - READ_DMA48 status=51 error=10 LBA=490974155
>> Aug 22 05:25:08 alfred kernel: g_vfs_done():ufs/home[READ(offset=251378735104, length=2048)]error = 5
>
> This has never happened before and did not happen again (yet). A long
> test with smartctl reports "all fine". So lets attribute that to a
> cosmic ray (or neutrino, pick your favorite) hitting the controller.
>
> The system continued to run fine afterwards.
> But: next morning during automatic snapshot creation it panic'ed with:
>
>> Aug 23 06:38:14 alfred kernel: ffs_snapshot_mount: old format snapshot inode 8
>> Aug 23 06:38:14 alfred savecore: reboot after panic: snapacct_ufs2: bad block
>
> So of course it restarted. And tried to do a background fsck. And failed
> again... and again... and again...
>
>> Aug 23 07:08:15 alfred kernel: ffs_snapshot_mount: old format snapshot inode 4
>> Aug 23 07:08:15 alfred savecore: reboot after panic: snapacct_ufs2: bad block
>
> The report inode varies but "bad block" is always the same.
> So this went on for about 10x until I had a chance to interrupt it (i.e.
> woke from slumber) and boot into single user mode.
> Multiple runs of fsck fixed the problem. Deleted all old snapshot files
> and system is fine. No further problems. Maybe some files got lost;
> can't tell, there are a few million on it.
>
> Also see:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/114676
>
> Unfortunately I don't have time to dig into this. But I wanted to report
> it. Maybe someone already fixed it...

bg fsck cannot fix arbitrary filesystem corruption. Nor is it intended to.

Kris