From owner-freebsd-fs@FreeBSD.ORG Mon Nov 13 15:35:18 2006
Return-Path:
X-Original-To: freebsd-fs@freebsd.org
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 0F27816A500
    for ; Mon, 13 Nov 2006 15:35:18 +0000 (UTC)
    (envelope-from anderson@centtech.com)
Received: from mh1.centtech.com (moat3.centtech.com [64.129.166.50])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 90BED43D70
    for ; Mon, 13 Nov 2006 15:35:13 +0000 (GMT)
    (envelope-from anderson@centtech.com)
Received: from [10.177.171.220] (neutrino.centtech.com [10.177.171.220])
    by mh1.centtech.com (8.13.8/8.13.8) with ESMTP id kADFZ6d9030096;
    Mon, 13 Nov 2006 09:35:07 -0600 (CST)
    (envelope-from anderson@centtech.com)
Message-ID: <455890AE.9050807@centtech.com>
Date: Mon, 13 Nov 2006 09:35:10 -0600
From: Eric Anderson
User-Agent: Thunderbird 1.5.0.7 (X11/20061015)
MIME-Version: 1.0
To: Lapo Luchini
References: <854C78DB-2099-4DA5-9E3B-F30D6947C532@jlauser.net>
    <4512F957.2090205@centtech.com>
    <20060922041535.GF4842@deviant.kiev.zoral.com.ua>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV 0.88.4/2190/Mon Nov 13 03:31:57 2006 on mh1.centtech.com
X-Virus-Status: Clean
X-Spam-Status: No, score=-2.5 required=8.0 tests=AWL,BAYES_00 autolearn=ham
    version=3.1.6
X-Spam-Checker-Version: SpamAssassin 3.1.6 (2006-10-03) on mh1.centtech.com
Cc: freebsd-fs@freebsd.org
Subject: Re: Snapshot corruption on 6.1/amd64
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 13 Nov 2006 15:35:18 -0000

On 11/13/06 08:55, Lapo Luchini wrote:
> Kostik Belousov gmail.com> writes:
>
>>>> After some searching, I've found a bug report filed last year that
>>>> describes this problem exactly, though the log of that report does
>>>> not suggest that anything has been done with it. That report is at
>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/90512
>> James, look at the PR/100365. Supposed fix is MFCed. Original reporter
>> said that this changed nothing for him. I have not much time lately to
>> look at this problem, but would like to get additional data points.
>>
>> BTW, use of snapshots with stock 6.1 is not very attractive idea, better
>> to update to the 6-STABLE (many important fixes in that area were made).
>
> I had a problem with snapshots too, and I also use amd64.
> The description of neither PR seem to match my case: I compiled 6.1-STABLE at
> the beginning of September and activated snapshots on the whole /usr FS and had
> no problems until the beginning of October, when:
>
> Oct 22 04:00:33 motoko root: snapshot: daily.0 snapshot on filesystem / made
> (duration: 0 min)
> Oct 22 04:03:29 motoko root: snapshot: daily.0 snapshot on filesystem /usr made
> (duration: 2 min)
> Oct 22 04:03:47 motoko root: snapshot: daily.0 snapshot on filesystem /var made
> (duration: 0 min)
> [machine manually reset]
> Oct 23 11:09:21 motoko syslogd: kernel boot file is /boot/kernel/kernel
> Oct 23 11:09:21 motoko kernel: Copyright (c) 1992-2006 The FreeBSD Project.
> [...]
> Oct 23 11:11:02 motoko fsck: /dev/ad0s1e: 4449 files, 118197 used, 135618 free
> (8882 frags, 15842 blocks, 3.5% fragmentation)
> Oct 23 11:11:18 motoko fsck: /dev/ad0s1d: UNREF FILE I=23564 OWNER=operator
> MODE=100400
> [...many more...]
> Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: UNREF FILE I=212299 OWNER=www
> MODE=100600
> Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: SIZE=2048 MTIME=Oct 1 15:57 2006
> (CLEARED)
> Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: Reclaimed: 0 directories, 1991 files,
> 1832 fragments
> Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: 18768 files, 83120 used, 915663 free
> ( 6839 frags, 113603 blocks, 0.7% fragmentation)
> Oct 23 11:13:49 motoko ntpd[670]: kernel time sync disabled 2041
> Oct 23 11:21:10 motoko syslogd: kernel boot file is /boot/kernel/kernel
> Oct 23 11:21:10 motoko kernel: panic: snapblkfree: inconsistent block type
> Oct 23 11:21:10 motoko kernel: Uptime: 20m38s
> Oct 23 11:21:10 motoko kernel: Cannot dump. No dump device defined.
> Oct 23 11:21:10 motoko kernel: Automatic reboot in 15 seconds - press a key on
> the console to abort
> Oct 23 11:21:10 motoko kernel: Copyright (c) 1992-2006 The FreeBSD Project.
>
> And after this the box kinda looped 27 times { fsck; panic; reset; } until it
> finally crashed for good.
>
> I then decided to stop taking new snapshots and activate a dump device, but
> after a few days a new problem was there:
>
> Dump header from device /dev/ad0s1b
> Architecture: amd64
> Architecture Version: 2
> Dump Length: 1056505856B (1007 MB)
> Blocksize: 512
> Dumptime: Fri Nov 3 04:25:36 2006
> Hostname: motoko.lapo.it
> Magic: FreeBSD Kernel Dump
> Version String: FreeBSD 6.1-STABLE #4: Fri Sep 1 17:02:50 CEST 2006
> root@motoko.lapo.it:/usr/obj/usr/src/sys/MOTOKO
> Panic String: snapacct_ufs2: bad block
> Dump Parity: 2648692799
> Bounds: 1
> Dump Status: good
>
> I solved this removing any existing snapshot, but at this time I had accumulated
> enough downtime and frustration (and angry users) not to want to try snapshots
> anymore unless I had some strong impression the problem could really have been
> solved, which kinda explains why I noticed this thread... the obvious question
> is: may this problem be resolved by PR/100365 (seems quite different to me, but
> I don't know the internals...) or is it a new thing?
>
> I have the dump file, for the latest problem.

Maybe you have a bad disk? You might try swapping the drive out. (just a
wild guess here)

Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
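
For anyone gathering the additional data points asked for earlier in the thread, here is a minimal sketch (not part of any FreeBSD tool) that tallies kernel panic strings from syslog text, so a fsck/panic/reset loop like the one quoted can be counted instead of estimated. It assumes panics are logged in the "kernel: panic: ..." form shown in the quoted log. The name count_panics and the sample text are illustrative only.

```python
import re
from collections import Counter

# Sample lines copied from the log quoted in this thread (illustrative input).
SAMPLE_LOG = """\
Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: 18768 files, 83120 used, 915663 free
Oct 23 11:21:10 motoko kernel: panic: snapblkfree: inconsistent block type
Oct 23 11:21:10 motoko kernel: Uptime: 20m38s
"""

# Assumption: panic lines look like "<timestamp> <host> kernel: panic: <message>".
PANIC_RE = re.compile(r"kernel: panic: (?P<message>.+)$")


def count_panics(lines):
    """Return a Counter mapping each panic message to how often it appears."""
    counts = Counter()
    for line in lines:
        match = PANIC_RE.search(line.rstrip("\n"))
        if match:
            counts[match.group("message")] += 1
    return counts


if __name__ == "__main__":
    # Print one line per distinct panic string, most frequent first.
    for message, n in count_panics(SAMPLE_LOG.splitlines()).most_common():
        print(f"{n:4d}  {message}")
```

To run it against a real log, pass an open file object to count_panics in place of the sample lines. The log is often /var/log/messages, but that depends on the local syslog configuration.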