From: Ali Bahar
To: freebsd-hackers@freebsd.org
Date: Wed, 7 May 2003 13:47:34 -0400
Subject: Re: cache_purge > cache_zap segmentation fault
Message-ID: <20030507134734.A12455@internetDog.org>
In-Reply-To: <20030504113221.A27756@internetDog.org>

On Sun, May 04, 2003 at 11:32:21AM -0400, Ali wrote:
> this post may be of interest to people familiar with the filesystem code.

> syscall2 > open > vn_open > namei > lookup > ufs_vnoperate >
> vfs_cache_lookup > ufs_vnoperate > ufs_lookup > ffs_vget > getnewvnode >
> cache_purge > cache_zap

The name cache is corrupted. Most of the call chains involve
getnewvnode, so a new file is being opened. The only call chain
observed without getnewvnode used cache_enter, so a new cache entry
is being created.

I consider it a corruption because a namecache node has a junk value
for nc_src.le_next. This is then dereferenced as the next namecache
node, causing the segmentation fault.

    (gdb) p ncp
    $4 = (struct namecache *) 0xc0d62b40
    (gdb) p *ncp
    $5 = {
      nc_hash = {
        le_next = 0x0,
        le_prev = 0xc0cd2ae4
      },
      nc_src = {
        le_next = 0x117,
        le_prev = 0xc0002a48
      },
      nc_dst = {
        tqe_next = 0x0,
        tqe_prev = 0xc61f9940
      },
      nc_dvp = 0xc61f33c0,
      nc_vp = 0xc61f98c0,
      nc_flag = 0 '\0',
      nc_nlen = 7 '\a',
      nc_name = 0xc0d62b62 "time.el\t\b[\t\bX\t\bM\t\bJ\t\b;\t\b"
    }

As 'cache_purge > cache_zap' is involved, it may be that namecache
node deletions have left a deleted node dangling.

What I do not know is whether there is a single system-wide name
cache or a per-directory cache linked list (LL). Neither the beastie
book (McKusick et al.) nor the FreeBSD Developers' Handbook seems to
cover this. Knowing the answer would help me determine what the LLs
are supposed to look like, and so help diagnose where an LL begins to
go wrong.

> P.P.S. It's been occurring intermittently, and increasingly,
> recently. (Due to its increased prevalence, I even suspected that the
> frequency of kernel crashes, might have corrupted the filesystem in a
> way ignorable/imperceptible by fsck/me!)

I no longer think so. A 'typical' filesystem corruption would surely
lead to all sorts of random faults, not the consistent call chains
noted above. This is closer to a 'bug' than to a 'corruption'.
Nonetheless, it may still be (somehow!) caused by me, rather than
being a bug in the generic kernel.

regards,
ali

-- 
Jesus was an Arab.
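
For illustration only, the failure mode described above (a list walk that follows a junk le_next such as 0x117) can be sketched in user space. This is not the kernel code: struct nc_node, plausible() and first_bad_link() are hypothetical names, the node models only the le_next/le_prev linkage, and the "valid address" test assumes all nodes live in one known array, which the real name cache does not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct namecache: only the list linkage
 * (le_next/le_prev, as in a BSD LIST entry) is modelled. */
struct nc_node {
    struct nc_node *le_next;    /* next entry on the same list */
    struct nc_node **le_prev;   /* address of the previous le_next */
    const char *name;
};

/* Assumption for this sketch: every valid node lies in [lo, hi) and is
 * aligned on a node boundary. Anything else is treated as junk. */
static int plausible(const struct nc_node *p,
                     const struct nc_node *lo, const struct nc_node *hi)
{
    uintptr_t a = (uintptr_t)p;

    return a >= (uintptr_t)lo && a < (uintptr_t)hi &&
           (a - (uintptr_t)lo) % sizeof(*p) == 0;
}

/* Walk the list without ever dereferencing an unchecked pointer.
 * Returns the 0-based position of the first node whose le_next is
 * implausible, or -1 if no bad link is seen within max steps. */
static int first_bad_link(const struct nc_node *head,
                          const struct nc_node *lo,
                          const struct nc_node *hi, int max)
{
    const struct nc_node *p = head;
    int i;

    for (i = 0; p != NULL && i < max; i++) {
        if (p->le_next != NULL && !plausible(p->le_next, lo, hi))
            return i;
        p = p->le_next;
    }
    return -1;
}
```

A check of this kind only locates the damaged link; it says nothing about which code path wrote the bad value. The max bound is there so that a list corrupted into a cycle does not loop forever.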