From owner-freebsd-current Wed Feb 19 20:44:41 2003
Message-ID: <3E545CD8.184323A9@mindspring.com>
Date: Wed, 19 Feb 2003 20:43:04 -0800
From: Terry Lambert
To: Craig Boston
Cc: Lars Eggert, current@freebsd.org, Poul-Henning Kamp
Subject: Re: panic starting gnome
References: <3E52BB14.2040309@isi.edu> <3E532F61.653A09B0@mindspring.com> <3E5408B0.9030300@isi.edu> <1045713737.612.22.camel@localhost>
Sender: owner-freebsd-current@FreeBSD.ORG

Craig Boston wrote:
> Well, I haven't had much luck tracking down the exact cause. For some
> reason I haven't been able to figure out, all of my crash dumps jump
> directly from vn_open_cred (line 185 of vfs_vnops.c) to calltrap(). The
> namei call doesn't show up in the stack at all, almost like the function
> is being inlined. I'm only using -O, which shouldn't inline anything
> not explicitly declared as such.

Nope.
The problem is a NULL pointer dereference: apparently a reference
into the proc structure through a proc pointer that is NULL.

> Anyway, using a cvsup binary search I've managed to narrow it down
> some. The problem did not exist before midnight UTC on 2003-04-15. It
> does exist on midnight UTC 2003-04-16. I've been digging through the
> commit logs for that day, but it seems it was a busy day for the VFS
> code with lots of commits. Since it always happens after an fdfree(),
> I'm leaning toward a large (number of files) commit by alfred@ having to
> do with a lock order reversal and adding a mutex associated with freeing
> filedesc structures. Just a guess, though.

FWIW, I arrived at the same place, given Lars' debugging information,
though it was only my most likely suspect. There are some changes
that went in for KSE, as well, but I'm pretty sure they were after
last Wednesday.

> Reproducing the problem seems to be as simple as killing any process
> that has an open, locked file on an NFS volume. A simple
>
> gconfd-1 &
> sleep 5; killall -9 gconfd-1
>
> does it every time for me. I assume this would also happen if a process
> calls exit() without closing all of its fds first; probably why
> starting GNOME or booting diskless is enough to tickle it.

Yes, this is most likely.

-- Terry
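
The reproduction condition Craig describes (a process killed while it
still holds a lock on an open file) can be sketched in user space
without gconfd-1. This is only a minimal sketch, not a test anyone on
the thread ran. The helper name kill_lock_holder and the child script
are hypothetical, and the use of an fcntl-style lock (Python's
fcntl.lockf) is an assumption, since the thread does not say which
locking call gconfd-1 makes. To exercise the reported case the path
would have to live on an NFS mount under an affected kernel; on a local
file it merely walks through the same sequence.

```python
import fcntl
import signal
import subprocess
import sys

# Hypothetical child script: open the file, take an exclusive lock,
# report readiness, then block until it is killed. It never unlocks
# or closes the file itself.
CHILD = """
import fcntl, sys, time
f = open(sys.argv[1], "a")
fcntl.lockf(f, fcntl.LOCK_EX)
print("locked", flush=True)
time.sleep(60)
"""


def kill_lock_holder(path):
    """Start a child that locks `path`, SIGKILL it, return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, path],
        stdout=subprocess.PIPE,
        text=True,
    )
    # Wait until the child confirms that it holds the lock.
    line = child.stdout.readline().strip()
    if line != "locked":
        child.kill()
        child.wait()
        child.stdout.close()
        raise RuntimeError("child failed to take the lock")
    # Equivalent of `killall -9`: the kernel must now release the
    # descriptor and the lock on the dead process's behalf.
    child.send_signal(signal.SIGKILL)
    status = child.wait()
    child.stdout.close()
    return status
```

Calling kill_lock_holder() with a path on the NFS volume in question
would be the analogue of the two-line shell recipe quoted above; a
negative return value is how subprocess reports that the child died
from a signal.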