Date: Wed, 7 Nov 2007 15:23:28 -0800
From: Jeremy Chadwick
To: freebsd-stable@freebsd.org
Subject: Re: RELENG_6 kernel panic + savecore(8) problem
Message-ID: <20071107232328.GA1678@eos.sc1.parodius.com>
In-Reply-To: <20071107191611.GA1400@eos.sc1.parodius.com>

On Wed, Nov 07, 2007 at 11:16:11AM -0800, Jeremy Chadwick wrote:
> Tracing pid 3 tid 100001 td 0xc7c6ad80
> kdb_enter(3228441820,3228796672,3228487817,3867634632,256,...) at kdb_enter+48
> panic(3228487817,3426817152,256,3228643296,0,...) at panic+206
> handle_written_inodeblock(3459887104,3688934424,3226775710,3228787204,3228175693,...) at handle_written_inodeblock+1503
> softdep_disk_write_complete(3688934424,3227842097,3356275348,3867634836,3226342800,...) at softdep_disk_write_complete+241
> bufdone(3688934424,0,3867634856,3226352850,3356275348,...) at bufdone+126
> g_vfs_done(3356275348,0,0,3352445440,3355957180) at g_vfs_done+198
> biodone(3356275348,3228786984,588,3228423470,100,...) at biodone+178
> g_io_schedule_up(3351686528,76,3351679512,3226344072,3867634980,...) at g_io_schedule_up+137
> g_up_procbody(0,3867635000,0,0,0,...) at g_up_procbody+122
> fork_exit(3226344072,0,3867635000) at fork_exit+122
> fork_trampoline() at fork_trampoline+8

A follow-up to this:

It appears that a few of the filesystems on the disk (it's a
single-disk system) were suffering from some bizarre form of soft
update corruption.

I csup'd + rebuilt/reinstalled kernel + world on the box. Upon reboot,
I saw that a few of the filesystems were reporting errors on mount and
unmount:

  /var: mount pending error: blocks 16 files 2
  /home: mount pending error: blocks 3904 files 6
  /home: unmount pending error: blocks 848 files 0

I dropped back into single user and did manual fsck's of all the
filesystems. /tmp (somehow) and /var were still marked dirty, but had
no other problems.

/home did have problems: numerous reference count problems, plus some
unrefs which required dumping some partial data into lost+found. There
was also a single instance of an "unexpected soft update
inconsistency", although that may have been induced by the panic.

Thankfully we do backups, so the user won't lose anything. The
physical disk itself appears OK (looking at SMART data, and a dd of the
full disk had no I/O errors during reading).

I don't think any of this could explain the savecore(8) issue, since
savecore claimed there was no core to save. But I did want to
follow up on this so that it wasn't a mailing list thread left
hanging. :-)

If the issue crops up again, I'll likely be replacing the disk (as a
precaution) and rebuilding all the filesystems from scratch.
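For anyone who wants to pull the "pending error" messages out of a saved console log and see which filesystems are affected, here is a minimal Python sketch. It assumes the messages appear in exactly the form quoted above; the function names (parse_pending_errors, summarize) are hypothetical and not part of any FreeBSD tool.

```python
import re

# Matches the messages quoted in this post, e.g.
# "/home: mount pending error: blocks 3904 files 6".
# Assumption: the log lines have exactly this shape.
PENDING_RE = re.compile(
    r"^(?P<fs>/\S*): (?P<phase>mount|unmount) pending error: "
    r"blocks (?P<blocks>\d+) files (?P<files>\d+)$"
)


def parse_pending_errors(lines):
    """Return one dict per matching line; non-matching lines are skipped."""
    results = []
    for line in lines:
        m = PENDING_RE.match(line.strip())
        if m:
            results.append({
                "fs": m.group("fs"),
                "phase": m.group("phase"),
                "blocks": int(m.group("blocks")),
                "files": int(m.group("files")),
            })
    return results


def summarize(entries):
    """Total the pending blocks and files reported per filesystem."""
    totals = {}
    for e in entries:
        blocks, files = totals.get(e["fs"], (0, 0))
        totals[e["fs"]] = (blocks + e["blocks"], files + e["files"])
    return totals


# The three messages seen on this box after the reboot.
sample = [
    "/var: mount pending error: blocks 16 files 2",
    "/home: mount pending error: blocks 3904 files 6",
    "/home: unmount pending error: blocks 848 files 0",
]

for fs, (blocks, files) in sorted(summarize(parse_pending_errors(sample)).items()):
    print(f"{fs}: {blocks} blocks, {files} files pending")
```

Summing the mount and unmount counts per filesystem is only a rough way to rank which ones to fsck first; the numbers are not meant to describe the actual amount of damage.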
-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |