From owner-freebsd-hackers@FreeBSD.ORG Thu May 8 16:53:53 2003
Date: Thu, 8 May 2003 19:54:10 -0400
From: Ali Bahar
To: freebsd-hackers@freebsd.org
Message-ID: <20030508195410.A670@internetDog.org>
Mail-Followup-To: freebsd-hackers@freebsd.org
References: <20030508150341.B28906@internetDog.org> <1789.1052421172@critter.freebsd.dk>
In-Reply-To: <1789.1052421172@critter.freebsd.dk>; from phk@phk.freebsd.dk on Thu, May 08, 2003 at 09:12:52PM +0200
Subject: Re: cache_purge > cache_zap segmentation fault
Reply-To: alih@internetDog.org
List-Id: Technical Discussions relating to FreeBSD

On Thu, May 08, 2003 at 09:12:52PM +0200, Poul-Henning Kamp wrote:
> In message <20030508150341.B28906@internetDog.org>, Ali Bahar writes:

> If you look at the definition in sys/sys/vnode.h, it actually is pretty

I have.
It's the "from us"/"to us" which were unclear:

> LIST_HEAD(, namecache) v_cache_src; /* c Cache entries from us */
> TAILQ_HEAD(, namecache) v_cache_dst; /* c Cache entries to us */

> So imagine this name cache entry:
>
> {
>     directory vnode: vnode of "/usr/src" [vnode#1]
>     name "sys"
>     destination vnode: vnode of "/usr/src/sys" [vnode#2]
> }

So this must be a namecache for /usr/src/sys.

> This name cache entry will be a member of the "v_cache_src" LIST from
> vnode#1, and of the v_cache_dst TAILQ for vnode#2
>
> Other entries on vnode#1's ..._src LIST will be for /usr/src/bin,
> /usr/src/etc and so on.

> The thing you have to remember is that one vnode can have multiple names
> due to hard links. If it could have only one, the TAILQ would not
> be necessary.

I believe I understand. So a 'destination chain' (i.e. TAILQ/v_cache_dst)
lists all the names by which a file (or dir) may be known.

Then, judging by the definition of 'struct namecache' and cache_purge,
there are 3 name cache chains: hash, source and destination.
I don't understand why separate source and destination system-wide
chains are needed. I'd have expected that a purge/reclaim would only
need to delete the namecache from a single global list. That is, for each
destination namecache I find for the recycled vnode, I'll be deleting
only one namecache node in a global chain.

> >Would you know if the 5.0 modifications could fix this problem?
>
> I'm not sure I know what the problem is, I just stumbled on your
> email midthread I think...

The relevant posts were made this past Sunday and Wednesday.
In brief, I get a seg fault in

    getnewvnode > cache_purge(v_cache_dst) > cache_zap(nc_src)

because nc_src.le_next has a junk value, which gets dereferenced.

The crash happens in various processes at various times. It started
about 3 weeks ago, and has been increasing in frequency.
We're adding some networking modules to the kernel, and there have been
quite a few resulting crashes of the box!
:-)

Considering its increasing frequency, I even suspected that the
filesystem had been corrupted -- in a way undetected by fsck. But a
'normal' filesystem corruption exhibits _random_ crashes, not ones
consistently following the above execution thread.

However, your mention of hard links makes me wonder. I thought hard
links were rare. Are they prevalent in a FreeBSD/unix OS tree? (I'll
look this up.) If so, then maybe the underlying filesystem objects have
been corrupted by all the crashes? ...
Nah, the files accessed when the seg fault occurs are often temporary
files (e.g. .o files during compilation).

Thanks much for your help. Much appreciated.

regards,
ali

-- 
Jesus was an Arab.
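
For readers following the thread, here is a minimal userland sketch of the
two per-vnode chains discussed above. It is not the kernel code: the
structures are cut down to the fields named in the thread (plus nc_dvp,
nc_vp and nc_name), the hash chain is left out, and every function with a
_sketch suffix, along with vnode_init, count_src, count_dst and demo_purge,
is a hypothetical helper written for illustration. It assumes a system
that ships <sys/queue.h>. The point it tries to show is that each entry is
linked twice, once on its directory's v_cache_src and once on its target's
v_cache_dst, so purging a vnode can reach every entry that mentions it
without searching a global list.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct vnode;

/* Simplified stand-in for struct namecache (hash chain omitted). */
struct namecache {
    LIST_ENTRY(namecache)  nc_src;  /* link on the directory's v_cache_src */
    TAILQ_ENTRY(namecache) nc_dst;  /* link on the target's v_cache_dst */
    struct vnode *nc_dvp;           /* directory holding the name */
    struct vnode *nc_vp;            /* vnode the name resolves to */
    const char   *nc_name;
};

/* Simplified stand-in for struct vnode: only the two cache chains. */
struct vnode {
    LIST_HEAD(, namecache)  v_cache_src;  /* cache entries from us */
    TAILQ_HEAD(, namecache) v_cache_dst;  /* cache entries to us */
};

static void vnode_init(struct vnode *vp)
{
    LIST_INIT(&vp->v_cache_src);
    TAILQ_INIT(&vp->v_cache_dst);
}

/* Hypothetical helper: link one entry onto both per-vnode chains. */
static void cache_enter_sketch(struct namecache *ncp, struct vnode *dvp,
                               struct vnode *vp, const char *name)
{
    ncp->nc_dvp = dvp;
    ncp->nc_vp = vp;
    ncp->nc_name = name;
    LIST_INSERT_HEAD(&dvp->v_cache_src, ncp, nc_src);
    TAILQ_INSERT_TAIL(&vp->v_cache_dst, ncp, nc_dst);
}

/* Hypothetical helper: unlink one entry from both chains.
 * LIST_REMOVE follows nc_src.le_next when it is non-NULL, so a
 * junk value there is dereferenced at this step. */
static void cache_zap_sketch(struct namecache *ncp)
{
    assert(ncp->nc_dvp != NULL && ncp->nc_vp != NULL);
    LIST_REMOVE(ncp, nc_src);
    TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst);
}

/* Hypothetical helper: drop every entry that mentions vp, either
 * as the directory (src chain) or as the target (dst chain). */
static void cache_purge_sketch(struct vnode *vp)
{
    while (!LIST_EMPTY(&vp->v_cache_src))
        cache_zap_sketch(LIST_FIRST(&vp->v_cache_src));
    while (!TAILQ_EMPTY(&vp->v_cache_dst))
        cache_zap_sketch(TAILQ_FIRST(&vp->v_cache_dst));
}

static int count_src(struct vnode *vp)
{
    struct namecache *ncp;
    int n = 0;
    LIST_FOREACH(ncp, &vp->v_cache_src, nc_src)
        n++;
    return n;
}

static int count_dst(struct vnode *vp)
{
    struct namecache *ncp;
    int n = 0;
    TAILQ_FOREACH(ncp, &vp->v_cache_dst, nc_dst)
        n++;
    return n;
}

/* Caches /usr/src/sys and /usr/src/bin, purges the "sys" vnode, and
 * returns how many entries remain on /usr/src's source chain. */
static int demo_purge(void)
{
    struct vnode usr_src, sys, bin;
    struct namecache nc_sys, nc_bin;

    vnode_init(&usr_src);
    vnode_init(&sys);
    vnode_init(&bin);
    cache_enter_sketch(&nc_sys, &usr_src, &sys, "sys");
    cache_enter_sketch(&nc_bin, &usr_src, &bin, "bin");
    cache_purge_sketch(&sys);
    return count_src(&usr_src);
}
```

In this sketch, purging the "sys" vnode walks only that vnode's own
chains, yet the entry also disappears from /usr/src's source chain because
the same node is linked on both. With hard links, one target vnode would
simply carry several entries on its v_cache_dst, each hanging off a
different directory's v_cache_src.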