Date:      Mon, 13 Nov 2006 09:35:10 -0600
From:      Eric Anderson <anderson@centtech.com>
To:        Lapo Luchini <lapo@lapo.it>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Snapshot corruption on 6.1/amd64
Message-ID:  <455890AE.9050807@centtech.com>
In-Reply-To: <loom.20061113T154045-932@post.gmane.org>
References:  <854C78DB-2099-4DA5-9E3B-F30D6947C532@jlauser.net>	<4512F957.2090205@centtech.com>	<20060922041535.GF4842@deviant.kiev.zoral.com.ua> <loom.20061113T154045-932@post.gmane.org>

On 11/13/06 08:55, Lapo Luchini wrote:
> Kostik Belousov <kostikbel <at> gmail.com> writes:
> 
>>>> After some searching, I've found a bug report filed last year that
>>>> describes this problem exactly, though the log of that report does
>>>> not suggest that anything has been done with it.  That report is at
>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/90512
>> James, look at PR/100365. The supposed fix has been MFCed. The original reporter
>> said that it changed nothing for him. I have not had much time lately to
>> look at this problem, but would like to get additional data points.
>>
>> BTW, using snapshots with stock 6.1 is not a very attractive idea; it is better
>> to update to 6-STABLE (many important fixes in that area were made).
> 
> I had a problem with snapshots too, and I also use amd64.
> The description in neither PR seems to match my case: I compiled 6.1-STABLE at
> the beginning of September and activated snapshots on the whole /usr FS and had
> no problems until the beginning of October, when:
> 
> Oct 22 04:00:33 motoko root: snapshot: daily.0 snapshot on filesystem / made (duration: 0 min)
> Oct 22 04:03:29 motoko root: snapshot: daily.0 snapshot on filesystem /usr made (duration: 2 min)
> Oct 22 04:03:47 motoko root: snapshot: daily.0 snapshot on filesystem /var made (duration: 0 min)
> [machine manually reset]
> Oct 23 11:09:21 motoko syslogd: kernel boot file is /boot/kernel/kernel
> Oct 23 11:09:21 motoko kernel: Copyright (c) 1992-2006 The FreeBSD Project.
> [...]
> Oct 23 11:11:02 motoko fsck: /dev/ad0s1e: 4449 files, 118197 used, 135618 free (8882 frags, 15842 blocks, 3.5% fragmentation)
> Oct 23 11:11:18 motoko fsck: /dev/ad0s1d: UNREF FILE I=23564  OWNER=operator MODE=100400
> [...many more...]
> Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: UNREF FILE I=212299  OWNER=www MODE=100600
> Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: SIZE=2048 MTIME=Oct  1 15:57 2006 (CLEARED)
> Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: Reclaimed: 0 directories, 1991 files, 1832 fragments
> Oct 23 11:11:19 motoko fsck: /dev/ad0s1d: 18768 files, 83120 used, 915663 free ( 6839 frags, 113603 blocks, 0.7% fragmentation)
> Oct 23 11:13:49 motoko ntpd[670]: kernel time sync disabled 2041
> Oct 23 11:21:10 motoko syslogd: kernel boot file is /boot/kernel/kernel
> Oct 23 11:21:10 motoko kernel: panic: snapblkfree: inconsistent block type
> Oct 23 11:21:10 motoko kernel: Uptime: 20m38s
> Oct 23 11:21:10 motoko kernel: Cannot dump. No dump device defined.
> Oct 23 11:21:10 motoko kernel: Automatic reboot in 15 seconds - press a key on the console to abort
> Oct 23 11:21:10 motoko kernel: Copyright (c) 1992-2006 The FreeBSD Project.
> 
> And after this the box kinda looped 27 times { fsck; panic; reset; } until it
> finally crashed for good.
> 
> I then decided to stop taking new snapshots and activate a dump device, but
> after a few days a new problem was there:
> 
> Dump header from device /dev/ad0s1b
>   Architecture: amd64
>   Architecture Version: 2
>   Dump Length: 1056505856B (1007 MB)
>   Blocksize: 512
>   Dumptime: Fri Nov  3 04:25:36 2006
>   Hostname: motoko.lapo.it
>   Magic: FreeBSD Kernel Dump
>   Version String: FreeBSD 6.1-STABLE #4: Fri Sep  1 17:02:50 CEST 2006
>     root@motoko.lapo.it:/usr/obj/usr/src/sys/MOTOKO
>   Panic String: snapacct_ufs2: bad block
>   Dump Parity: 2648692799
>   Bounds: 1
>   Dump Status: good
> 
> I solved this by removing all existing snapshots, but by then I had accumulated
> enough downtime and frustration (and angry users) that I didn't want to try
> snapshots again unless I had a strong indication that the problem had really
> been solved, which kinda explains why I noticed this thread... The obvious
> question is: could this problem be resolved by the fix for PR/100365 (it seems
> quite different to me, but I don't know the internals...) or is it a new thing?
> 
> I have the dump file for the latest problem.


Maybe you have a bad disk?  You might try swapping the drive out (just
a wild guess here).


Eric



-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------


