From: John Baldwin
To: rank1seeker@gmail.com
Cc: hackers@freebsd.org
Subject: Re: UFS related panic (daily <-> find)
Date: Fri, 26 Jul 2013 16:08:11 -0400
Message-Id: <201307261608.11556.jhb@freebsd.org>
In-Reply-To: <20130726.190033.811.1@DOMY-PC>
References: <20130719.174511.786.3@DOMY-PC> <201307261116.58638.jhb@freebsd.org> <20130726.190033.811.1@DOMY-PC>

On Friday, July 26, 2013 3:00:33 pm rank1seeker@gmail.com wrote:
> > > > > I had 2 panics: (Both occurred at 3 AM, so had to be daily task)
> > > > >
> > > > > First (Jul 2 03:06:50 2013):
> > > > > --
> > > > > Fatal trap 12: page fault while in kernel mode
> > > > > fault virtual address = 0x19
> > > > > fault code            = supervisor read, page not present
> > > > > instruction pointer   = 0x20:0xc06caf34
> > > > > stack pointer         = 0x28:0xe76248fc
> > > > > frame pointer         = 0x28:0xe7624930
> > > > > code segment          = base 0x0, limit 0xfffff, type 0x1b
> > > > >                       = DPL 0, pres 1, def32 1, gran 1
> > > > > processor eflags      = interrupt enabled, resume, IOPL = 0
> > > > > current process       = 76562 (find)
> > > > > trap number           = 12
> > > > > panic: page fault
> > > > > Uptime: 23h0m41s
> > > > > Physical memory: 1014 MB
> > > > > Dumping 186 MB: 171 155 139 123 107 91 75 59 43 27 11
> > > > >
> > > > > #7 0xc06caf34 in cache_lookup_times (dvp=0xc784a990, vpp=0xe7624ae8, cnp=0xe7624afc, tsp=0x0, ticksp=0x0) at /usr/src/sys/kern/vfs_cache.c:547
> > > >
> > > > Can you go up to this frame and do 'l'?
> > > >
> > > > --
> > > > John Baldwin
> > >
> > > Sure,
> > >
> > > ---------
> > > (kgdb) up 7
> > > #7 0xc06caf34 in cache_lookup_times (dvp=0xc784a990, vpp=0xe7624ae8, cnp=0xe7624afc, tsp=0x0, ticksp=0x0) at /usr/src/sys/kern/vfs_cache.c:547
> > > 547                     numchecks++;
> > > ---------
> > > (kgdb) l
> > > 542             }
> > > 543
> > > 544             hash = fnv_32_buf(cnp->cn_nameptr, cnp->cn_namelen, FNV1_32_INIT);
> > > 545             hash = fnv_32_buf(&dvp, sizeof(dvp), hash);
> > > 546             LIST_FOREACH(ncp, (NCHHASH(hash)), nc_hash) {
> > > 547                     numchecks++;
> > > 548                     if (ncp->nc_dvp == dvp && ncp->nc_nlen == cnp->cn_namelen &&
> > > 549                         !bcmp(nc_get_name(ncp), cnp->cn_nameptr, ncp->nc_nlen))
> > > 550                             break;
> > > 551             }
> > > ---------
> >
> > Hmm, 'p ncp' and 'p *ncp' at that frame perhaps?
>
> (kgdb) p ncp
> $1 = (struct namecache *) 0x1
> (kgdb) p *ncp
> Cannot access memory at address 0x1

Interesting. Maybe look at NCHHASH(hash) (you'll have to expand the macro manually) and see if the head node is corrupted or walk the list to find the corrupted node.
Given that it is a single-bit error, there is a chance this is a RAM problem. If it is in the hash table head entry, then that would always be at the same physical address for the same kernel, I think.

-- 
John Baldwin
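
The suggestion above (check the head entry, then walk the chain until a link looks wrong) can be sketched in user space. This is only an illustration under stated assumptions, not kernel code: struct node, find_bad_link(), one_bit_from_null() and CHAIN_DEMO_MAIN are hypothetical names invented here, and the chain is a plain singly linked list standing in for the nc_hash links. As far as I recall, NCHHASH(hash) in the vfs_cache.c of that era expands to (&nchashtbl[(hash) & nchash]); treat that as an assumption and check it against your own source tree. The "single bit" reading comes from ncp being 0x1 where a clean end of list would be NULL (0x0).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct namecache: just the chain link. */
struct node {
	struct node *next;	/* plays the role of nc_hash.le_next */
	int id;
};

/* True if p differs from NULL (0) in exactly one bit, e.g. 0x1. */
static int
one_bit_from_null(uintptr_t p)
{
	return (p != 0 && (p & (p - 1)) == 0);
}

/*
 * Walk the chain without dereferencing a link that looks bad.
 * Returns -1 if the chain ends in a clean NULL; otherwise returns the
 * position of the first misaligned link (0 is the head pointer itself)
 * and stores its raw value in *bad.
 */
static int
find_bad_link(struct node *head, uintptr_t *bad)
{
	struct node *p = head;
	int pos;

	assert(bad != NULL);
	for (pos = 0; ; pos++) {
		uintptr_t v = (uintptr_t)p;

		if (v == 0)
			return (-1);
		if (v % _Alignof(struct node) != 0) {
			*bad = v;
			return (pos);
		}
		p = p->next;
	}
}

#ifdef CHAIN_DEMO_MAIN	/* hypothetical flag: build with -DCHAIN_DEMO_MAIN */
int
main(void)
{
	struct node c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
	uintptr_t bad = 0;
	int pos;

	/* Simulate the corruption: the terminating NULL becomes 0x1. */
	c.next = (struct node *)(uintptr_t)0x1;
	pos = find_bad_link(&a, &bad);
	printf("bad link at position %d, value %#jx, one bit from NULL: %d\n",
	    pos, (uintmax_t)bad, one_bit_from_null(bad));
	return (0);
}
#endif
```

Position 0 would mean the head entry itself is damaged; any later position points at the node whose link was overwritten. The alignment test is only a heuristic: it catches a value like 0x1, but a flipped high bit would yield an aligned, still-invalid pointer, so in kgdb each link still has to be inspected by hand.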