Date:      Tue, 25 Aug 2009 19:59:34 -0400
From:      Boris Kochergin <spawk@acm.poly.edu>
To:        freebsd-fs@freebsd.org
Subject:   Re: geom_mirror/UFS weirdness with 7.2-STABLE
Message-ID:  <4A947AE6.7070401@acm.poly.edu>
In-Reply-To: <h42nos$buu$1@ger.gmane.org>
References:  <4A646DA8.2050201@acm.poly.edu> <h42nos$buu$1@ger.gmane.org>

Ivan Voras wrote:
> Boris Kochergin wrote:
>   
>> Ahoy. I noticed some very odd things in my file server's kernel buffer
>> this morning (there were actually a ton of these--this is a snippet):
>>
>> Jul 20 05:54:10 exodus smartd[763]: Device: /dev/ad1, FAILED SMART
>> self-check. BACK UP DATA NOW!
>> Jul 20 05:57:57 exodus kernel:
>> g_vfs_done():mirror/boots1[READ(offset=-4569735194538825728,
>> length=16384)]error = 5
>> Jul 20 05:57:57 exodus kernel: bad block 8806809555123731765, ino 4430620
>> Jul 20 05:57:57 exodus kernel: pid 35 (softdepflush), uid 0 inumber
>> 4430620 on /: bad block
>>     
>
>   
>> # df /
>> Filesystem         1K-blocks                 Used               Avail
>> Capacity  Mounted on
>> /dev/mirror/boots1  37846636 -4058799239201906816 4058799239236725722
>> -11656883301279%    /
>>
>> The system is a:
>>
>> # uname -a
>> FreeBSD exodus.poly.edu 7.2-STABLE FreeBSD 7.2-STABLE #3: Sat Jul 11
>> 16:22:02 EDT 2009     root@exodus.poly.edu:/usr/obj/usr/src/sys/EXODUS 
>> amd64
>>
>> Regarding smartd yelling at me about /dev/ad1, it's been doing that for
>> a long while before this. There is one sector on the drive that cannot be
>> read, but the disk has otherwise been fine for months. My experience
>> with geom_mirror has been that it disconnects members from an array if
>> they experience I/O errors, so this seems to be something different. Any
>> clues?
>>     
>
> It looks like the drive returned corrupted data without returning an
> error - which is strange, but not impossible. You are probably seeing
> numbers like -4058799239201906816 because some metadata is corrupted. If
> so, you should immediately disconnect the problematic drive so that the
> erroneous data isn't picked up and written to the good drive.
>
>
>   
In retrospect, it appears to have been bad RAM. The symptoms were just 
subtler back then.
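
For anyone who wants to look at values like the ones in the quoted logs, a
minimal sketch follows. It reinterprets the signed 64-bit numbers as raw bit
patterns, since a hex view makes it easier to judge whether a value looks like
a small number with a few flipped bits or like wholesale garbage. The helper
names (as_u64_hex, bits_set) are hypothetical and not part of any FreeBSD
tool, and the output alone cannot tell you whether RAM or a disk was at fault.

```python
# Reinterpret signed 64-bit values from the logs as raw bit patterns.
# Helper names are hypothetical; this is an inspection aid, not a diagnosis.
MASK64 = (1 << 64) - 1


def as_u64_hex(value: int) -> str:
    """Return the 64-bit two's-complement pattern of value as 16 hex digits."""
    return format(value & MASK64, "016x")


def bits_set(value: int) -> int:
    """Count the 1 bits in the 64-bit two's-complement pattern of value."""
    return bin(value & MASK64).count("1")


# Values copied from the quoted kernel messages and df output.
samples = {
    "g_vfs_done offset": -4569735194538825728,
    "bad block": 8806809555123731765,
    "df Used": -4058799239201906816,
}

for label, value in samples.items():
    print(f"{label:18} 0x{as_u64_hex(value)}  ({bits_set(value)} bits set)")
```

A sane value for a file system of this size would have its upper hex digits
all zero, so anything set up there stands out immediately.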

-Boris


