Date: Tue, 31 Oct 2006 16:03:21 +0200
From: "Vlad Galu" <dudu@dudu.ro>
To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: Frequent VFS crashes with RELENG_6
Message-ID: <ad79ad6b0610310603t7f00cd0ejdb3e4082466cd8a3@mail.gmail.com>
In-Reply-To: <200610010015.k910F6Ba001594@cwsys.cwsent.com>
References: <dudu@dudu.ro> <ad79ad6b0609300901q4215c809ye28fd861007494da@mail.gmail.com> <200610010015.k910F6Ba001594@cwsys.cwsent.com>
On 10/1/06, Cy Schubert <Cy.Schubert@spqr.komquats.com> wrote:
> In message <ad79ad6b0609300901q4215c809ye28fd861007494da@mail.gmail.com>,
> "Vlad GALU" writes:
> > On 9/30/06, Martin Blapp <mb@imp.ch> wrote:
> > >
> > > Hi,
> > >
> > > 1.) Bad RAM? Have you run a memory tester?
> >
> >    Yes, memtest86 didn't show anything weird.
> >
> > > 2.) Do you have background fsck running on this disk? If
> > > so, try booting into single-user mode and running a full fsck on
> > > this disk.
> > >
> >
> >    I have background_fsck="NO" in rc.conf and I checked the whole disk
> > several times.
> >    Something I forgot to mention earlier: the crash is easier to
> > reproduce when running rtorrent. The machine did crash without running
> > it as well, but far less often.
>
> I've been experiencing the same problem. I discovered that the disk
> holding the filesystem had some bad sectors, which caused dump -0Lauf
> to fail while taking a snapshot, and that in turn caused the system to
> panic. Running smartctl on the device indicated that there were bad
> sectors 40% into the surface scan being performed by SMART. The drive,
> an 80 GB Maxtor, was replaced with a 250 GB Western Digital (for a very
> good price, so good that I purchased two of them). The Maxtor was 906
> days old and had been powered off maybe a dozen times over the last
> three years.
     During the last 2 weeks I ran the same system with WITNESS turned
on. Since this machine's workload is not I/O dependent, I was able to
run bonnie++ and iozone every second day for a full 24 hours. At the
same time I ran several instances of rtorrent. This morning I rebooted
into a non-WITNESS kernel (built from the same sources as 2 weeks ago)
and the exact same crash occurred within a few hours of bootup. In all
this time, smartd didn't report anything suspicious. WITNESS only
reported a LOR related to kqueue that is already known.
     Any ideas for further stress testing would be welcome. I am
familiar with a few parts of the kernel, but VFS is a total stranger
to me.
-- 
If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.
