From: Eric Anderson
Date: Tue, 31 Oct 2006 08:30:02 -0600
To: Vlad Galu
Cc: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Frequent VFS crashes with RELENG_6
Message-ID: <45475DEA.2030506@centtech.com>

On 10/31/06 08:03, Vlad Galu wrote:
> On 10/1/06, Cy Schubert wrote:
>> In message , "Vlad GALU" writes:
>>> On 9/30/06, Martin Blapp wrote:
>>>> Hi,
>>>>
>>>> 1.) Bad RAM? Have you run a memory tester?
>>> Yes, memtest86 didn't show anything weird.
>>>
>>>> 2.) Do you have background fsck running on this disk? If
>>>> so, try booting into single-user mode and running a full fsck
>>>> on this disk.
>>>>
>>> I have background_fsck="NO" in rc.conf, and I have checked the
>>> whole disk several times.
>>> Something I forgot to mention earlier: the crash is easier to
>>> reproduce when running rtorrent. The machine did crash without it
>>> as well, but far less often.
>> I've been experiencing the same problem. I discovered that the disk
>> holding the filesystem had some bad sectors, which caused
>> dump -0Lauf to fail while taking a snapshot and, in turn, the system
>> to panic. Running smartctl on the device showed bad sectors 40% of
>> the way into the SMART surface scan. The drive, an 80 GB Maxtor, was
>> replaced with a 250 GB Western Digital (at a very good price, so
>> good that I bought two of them). It was 906 days old and had been
>> powered off only about a dozen times over the last three years.
>
> During the last 2 weeks I ran the same system with WITNESS turned
> on. Because this machine's workload is not I/O-dependent, I was able
> to run bonnie++ and iozone for a full 24 hours every other day. At
> the same time I ran several instances of rtorrent. This morning I
> rebooted into a non-WITNESS kernel (built from the same sources as
> 2 weeks ago), and the exact same crash occurred within a few hours
> of bootup. In all this time, smartd didn't report anything
> suspicious. WITNESS only reported an already-known LOR related to
> kqueue. Any ideas for further stress testing would be welcome. I am
> familiar with a few parts of the kernel, but VFS is a complete
> stranger to me.
>
>

Did you get a crash dump? If not, you might want to start by adding
all the debugging options to the kernel.

Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------