From owner-freebsd-stable@FreeBSD.ORG  Wed Nov  7 23:23:50 2007
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D9EED16A468
	for <freebsd-stable@freebsd.org>; Wed,  7 Nov 2007 23:23:50 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: from mx01.sc1.parodius.com (mx01.sc1.parodius.com [72.20.106.3])
	by mx1.freebsd.org (Postfix) with ESMTP id C359113C49D
	for <freebsd-stable@freebsd.org>; Wed,  7 Nov 2007 23:23:50 +0000 (UTC)
	(envelope-from jdc@parodius.com)
Received: by mx01.sc1.parodius.com (Postfix, from userid 1000)
	id 5F4741CC079; Wed,  7 Nov 2007 15:23:28 -0800 (PST)
Date: Wed, 7 Nov 2007 15:23:28 -0800
From: Jeremy Chadwick <koitsu@FreeBSD.org>
To: freebsd-stable@freebsd.org
Message-ID: <20071107232328.GA1678@eos.sc1.parodius.com>
References: <20071107191611.GA1400@eos.sc1.parodius.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20071107191611.GA1400@eos.sc1.parodius.com>
User-Agent: Mutt/1.5.16 (2007-06-09)
Subject: Re: RELENG_6 kernel panic + savecore(8) problem
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Nov 2007 23:23:50 -0000

On Wed, Nov 07, 2007 at 11:16:11AM -0800, Jeremy Chadwick wrote:
> Tracing pid 3 tid 100001 td 0xc7c6ad80
> kdb_enter(3228441820,3228796672,3228487817,3867634632,256,...) at kdb_enter+48
> panic(3228487817,3426817152,256,3228643296,0,...) at panic+206
> handle_written_inodeblock(3459887104,3688934424,3226775710,3228787204,3228175693,...) at handle_written_inodeblock+1503
> softdep_disk_write_complete(3688934424,3227842097,3356275348,3867634836,3226342800,...) at softdep_disk_write_complete+241
> bufdone(3688934424,0,3867634856,3226352850,3356275348,...) at bufdone+126
> g_vfs_done(3356275348,0,0,3352445440,3355957180) at g_vfs_done+198
> biodone(3356275348,3228786984,588,3228423470,100,...) at biodone+178
> g_io_schedule_up(3351686528,76,3351679512,3226344072,3867634980,...) at g_io_schedule_up+137
> g_up_procbody(0,3867635000,0,0,0,...) at g_up_procbody+122
> fork_exit(3226344072,0,3867635000) at fork_exit+122
> fork_trampoline() at fork_trampoline+8

A follow-up to this:

It appears that, somehow, several of the filesystems on the disk (it's a
single-disk system) had suffered some bizarre form of soft update
corruption.

I csup'd, then rebuilt and reinstalled the kernel and world on the box.
Upon reboot, I saw that a few of the filesystems were reporting errors
on mount and unmount:

/var: mount pending error: blocks 16 files 2
/home: mount pending error: blocks 3904 files 6
/home: unmount pending error: blocks 848 files 0
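If you want to watch for these messages in console logs after future
reboots, a small parser helps.  The following is a minimal sketch that
assumes the messages always have exactly the shape shown above
("<fs>: mount|unmount pending error: blocks N files M"); the names
PENDING_RE, PendingError and parse_pending_error are hypothetical and
not part of any FreeBSD tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed message shape, taken from the console output quoted above:
#   "/home: mount pending error: blocks 3904 files 6"
PENDING_RE = re.compile(
    r"^(?P<fs>\S+): (?P<op>mount|unmount) pending error: "
    r"blocks (?P<blocks>\d+) files (?P<files>\d+)$"
)


class PendingError(NamedTuple):
    filesystem: str  # mount point printed before the colon
    operation: str   # "mount" or "unmount"
    blocks: int      # pending block count reported
    files: int       # pending file (inode) count reported


def parse_pending_error(line: str) -> Optional[PendingError]:
    """Return a PendingError for a matching line, or None otherwise."""
    m = PENDING_RE.match(line.strip())
    if m is None:
        return None
    return PendingError(m["fs"], m["op"], int(m["blocks"]), int(m["files"]))


if __name__ == "__main__":
    log = (
        "/var: mount pending error: blocks 16 files 2\n"
        "/home: mount pending error: blocks 3904 files 6\n"
        "/home: unmount pending error: blocks 848 files 0\n"
    )
    for entry in log.splitlines():
        print(parse_pending_error(entry))
```

Lines that don't match return None, so the function can be run over an
entire log without pre-filtering.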

I dropped back into single-user mode and ran manual fscks on all the
filesystems.  /tmp (somehow) and /var were still marked dirty, but had
no other problems.  /home did have problems.

There were numerous reference count problems, as well as some
unreferenced (UNREF) entries, which required dumping some partial data
into lost+found.  There was also a single instance of an "unexpected
soft update inconsistency", although that may have been induced by the
panic.  Thankfully we do backups, so the user won't lose anything.  The
physical disk itself appears OK: the SMART data looks fine, and a dd of
the full disk had no I/O errors during reading.
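A full-disk read test like the dd above can also be sketched in a few
lines, with the advantage of recording where any failures occur.  This is
a minimal illustration, not the exact procedure used here: it assumes the
device (e.g. something like /dev/ad0, a hypothetical name) can be opened
read-only by the caller, that reads at 1 MiB-aligned offsets are
acceptable to the device, and that an empty read marks the end of the
media.  scan_for_read_errors and its parameters are hypothetical names.

```python
import os
from typing import List


def scan_for_read_errors(path: str, block_size: int = 1024 * 1024,
                         max_errors: int = 1000) -> List[int]:
    """Read `path` sequentially; return byte offsets of blocks that failed.

    Failed blocks are skipped rather than aborting the scan, similar in
    spirit to dd's conv=noerror.  `max_errors` bounds the scan so a dying
    device cannot keep it looping forever.
    """
    bad_offsets: List[int] = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while len(bad_offsets) < max_errors:
            try:
                chunk = os.pread(fd, block_size, offset)
            except OSError:
                # Record the failing block and move past it.
                bad_offsets.append(offset)
                offset += block_size
                continue
            if not chunk:
                break  # empty read: assumed end of device/file
            offset += len(chunk)
    finally:
        os.close(fd)
    return bad_offsets
```

An empty list means every block was read without an I/O error, which
only shows the media is readable; it does not verify the filesystem
metadata, which is what fsck is for.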

I don't think any of this explains the savecore(8) issue, since
savecore claimed there was no core to save.  But I did want to follow
up on this so that the mailing list thread wasn't left hanging.  :-)

If the issue crops up again, I'll likely be replacing the disk (as a
precaution) and rebuilding all the filesystems from scratch.

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |