Date: Sun, 29 Jan 2012 22:03:16 -0500
From: Gary Palmer
To: Peter Maloney
Cc: freebsd-stable@freebsd.org
Subject: Re: Panic on 7.4-RELEASE-p5
Message-ID: <20120130030316.GB60637@in-addr.com>
In-Reply-To: <20120130014138.GA60637@in-addr.com>
References: <20120127024815.GD17973@in-addr.com>
 <20120127030906.GA67449@icarus.home.lan>
 <20120127031351.GA67596@icarus.home.lan>
 <20120127034352.GG17973@in-addr.com>
 <4F2298A3.4030204@brockmann-consult.de>
 <20120130014138.GA60637@in-addr.com>

On Sun, Jan 29, 2012 at 08:41:38PM -0500, Gary Palmer wrote:
> On Fri, Jan 27, 2012 at 01:29:23PM +0100, Peter Maloney wrote:
> > On 01/27/2012 04:43 AM, Gary Palmer wrote:
> > >
> > > After scanning selected spans, do NOT read-scan remainder of disk.
> > > If Selective self-test is pending on power-up, resume after 0 minute delay.
> > >
> > > I noticed a while ago that there were some "bad" sectors on the disk, and
> > > at the time they were under the swap partition if my math was correct,
> > > and the box never swaps so it wasn't a problem. I don't know if
> > > the errors above are the same ones I saw earlier or not.
> > >
> > > There were no read or write errors on the console prior to the panic
> > > earlier today. In fact the previous output on the console relates to
> > > the last reboot for a software upgrade (fixing some packages) 11
> > > days prior. The only thing in logs going back to November relating
> > > to ad1 are boot messages.
> > >
> > > Thanks,
> > >
> > > Gary
> > >
> > >
> > Unmount your swap, and then write zeros to it to relocate the bad sectors.
> >
> > in one shell:
> > gstat -I 100ms -f da#p#
> >
> > in another:
> > swapoff /dev/da#p#
> > sysctl kern.geom.debugflags=0x10
> > dd if=/dev/zero of=/dev/da#p# bs=1M
> > (eventually it stops saying end of device or no space left; at this
> > point I am not sure if you should then continue writing where it stopped
> > in 512 byte blocks, or if it wrote a partial 1M in the last 1M)
> >
> > Watch first shell. If the speed goes up, settles at a certain number,
> > then wildly goes down low and back up to that number, it is possibly
> > working.
> >
> > Then repeat. If the same wild fluctuations happen, then the drive didn't
> > relocate enough, because it is trying to keep some semi-bad ones, or
> > they are only bad when reading. If it is just settling at a speed and
> > staying there, then it is probably successful. I don't know how reliable
> > it is. I have found it to be 100% reliable in my testing though. But
> > some/most disks lie to you on the "relocated sector count".
> >
> > And then remount the swap and change that kernel parameter back.
> > sysctl kern.geom.debugflags=0
> > swapon /dev/da#p#
> >
> >
> > Your relocated sector count:
> >
> >   5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
> >
> >
> >
> > However, this does not fix your disk. eg. If you have heads grinding the
> > platter, you have dust flying around, and your disk will get worse.
> >
> > Be VERY careful using dd to write directly to disks. If you use the
> > wrong slice, or you use the main device without slices and miscalculate,
> > bad things happen. This is why that kernel parameter was set to stop you.
>
> Hi Peter,
>
> I did things a little differently. When I checked swapinfo, apparently I
> set the swap partition up just purely to act as a dump device - it wasn't
> used as swap. So I tested it:
>
> # recoverdisk /dev/ad1s1b /dev/ad1s1b
>         start    size block-len state          done     remaining    % done
>     628097024 1040384   1040384     0     629137408             0 100.00000
> Completed
>
> smartctl still reports:
>
>   5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
>
> I then did a read test across the whole disk with no errors
>
> # recoverdisk /dev/ad1 /dev/null
>         start    size block-len state          done     remaining    % done
>  120033640448  483328    483328     0  120034123776             0 100.00000
> Completed
>
> Reallocated_Sector_Ct is still the same
>
> I dunno where the problems are/were, but apparently I cannot hit them now
> through just reading the disk or writing to swap.

FYI I just ran both smartctl -t short /dev/ad1 and smartctl -t long /dev/ad1
and neither found any problems.

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     33819         -
# 2  Short offline       Completed without error       00%     33818         -

Thanks,

Gary
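
On the open question in the quoted dd step (whether the final partial 1M
block gets written): one way to sidestep it is to work out the tail
explicitly and write it in 512-byte sectors. The sketch below only does the
arithmetic and prints the dd commands; it does not run them. The helper names
dd_plan and dd_commands are made up for illustration, the 512-byte sector
size is an assumption, and the size is the "done" figure from the recoverdisk
run on /dev/ad1s1b above. It also assumes the device size is a whole number
of sectors.

```sh
#!/bin/sh
# Sketch: split a device of SIZE bytes into whole 1 MiB blocks plus a tail
# counted in 512-byte sectors. dd_plan and dd_commands are hypothetical
# helpers written for this example, not FreeBSD tools.
dd_plan() {
    size=$1
    bs=1048576                  # 1 MiB, the block size used with dd in the thread
    sector=512                  # assumed sector size
    full=$((size / bs))         # number of whole 1 MiB blocks
    tail=$((size % bs))         # bytes left over after the last whole block
    echo "$full $tail $((tail / sector))"
}

# Print (do not execute) two dd invocations that together cover the device:
# first the whole blocks, then the tail starting at the matching sector offset.
dd_commands() {
    dev=$1
    set -- $(dd_plan "$2")      # now $1=full blocks, $2=tail bytes, $3=tail sectors
    echo "dd if=/dev/zero of=$dev bs=1048576 count=$1"
    echo "dd if=/dev/zero of=$dev bs=512 seek=$(( $1 * 2048 )) count=$3"
}

# Size taken from the recoverdisk output for /dev/ad1s1b quoted above.
dd_commands /dev/ad1s1b 629137408
```

For that partition the split is 599 whole blocks plus a 1040384-byte tail,
which matches the last block recoverdisk reported (start 628097024, size
1040384). The same caution from the thread applies: double-check the device
name before running any dd that the sketch prints.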