Date: Thu, 10 Sep 2009 21:34:30 -0400
Message-ID: <237c27100909101834g49438707l96fa58df5f717945@mail.gmail.com>
From: Linda Messerschmidt
To: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1

Just to follow up, I've been doing some testing with masking for KTR_LOCK rather than KTR_SCHED.

I'm having trouble with this because I have the KTR buffer size set to 1048576 entries, and with only KTR_LOCK enabled, this isn't enough for even a full second of tracing; the sample I'm working with now is just under 0.9s. It's an average of one entry every 2001 TSC ticks. That *seems* like a lot of locking activity, but some of the lock points are only a couple of lines apart, so maybe it's just incredibly verbose.

Since it's so much data and I'm still working on a way to correlate it (lockgraph.py?), all I've got so far is a list of what trace points are coming up the most:

51927 src/sys/kern/kern_lock.c:215 (_lockmgr UNLOCK mtx_unlock() when flags & LK_INTERLOCK)
48033 src/sys/kern/vfs_subr.c:2284 (vdropl UNLOCK)
41548 src/sys/kern/vfs_subr.c:2187 (vput VI_LOCK)
29359 src/sys/kern/vfs_subr.c:2067 (vget VI_LOCK)
29358 src/sys/kern/vfs_subr.c:2079 (vget VI_UNLOCK)
23799 src/sys/nfsclient/nfs_subs.c:755 (nfs_getattrcache mtx_lock)
23460 src/sys/nfsclient/nfs_vnops.c:645 (nfs_getattr mtx_unlock)
23460 src/sys/nfsclient/nfs_vnops.c:642 (nfs_getattr mtx_lock)
23460 src/sys/nfsclient/nfs_subs.c:815 (nfs_getattrcache mtx_unlock)
23138 src/sys/kern/vfs_cache.c:345 (cache_lookup CACHE_LOCK)

Unfortunately, it kind of sounds like I'm on my way to answering "why is this system slow?" even though it really isn't slow.
(And I rush to point out that the Apache process in question doesn't at any point in its life touch NFS, though some of the other ones on the machine do.)

In order to be the cause of my Apache problem, all this goobering around with NFS would have to be relatively infrequent but so intense that it shoves everything else out of the way. I'm skeptical, but I'm sure one of you guys can offer a more informed opinion.

The only other thing I can think of is maybe all this is running me out of something I need (vnodes?) so everybody else blocks until it finishes and lets go of whatever finite resource it's using up? But that doesn't make a ton of sense either, because why would a lack of vnodes cause stalls in accept() or select() in unrelated processes?

Not sure if I'm going in the right direction here or not.
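
For anyone who wants to produce the same kind of "most frequent trace points" tally, here is a rough Python sketch. It is not the method used for the numbers above. It assumes the trace has been dumped to text with a file:line field on each entry, and it makes no other assumption about the column layout. The names `LOCATION`, `tally_trace_points`, and `format_top` are made up for illustration.

```python
import re
from collections import Counter

# Matches a source location such as "src/sys/kern/vfs_subr.c:2284".
# Assumption: every trace entry of interest carries a file:line field
# somewhere on its line; lines without one are skipped.
LOCATION = re.compile(r"[\w./+-]+\.[ch]:\d+")


def tally_trace_points(lines):
    """Count how often each file:line location appears in the dump."""
    counts = Counter()
    for line in lines:
        match = LOCATION.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


def format_top(counts, limit=10):
    """Return 'count location' strings, most frequent location first."""
    return [f"{n} {loc}" for loc, n in counts.most_common(limit)]
```

To use it, pass any iterable of text lines (an open dump file, or `sys.stdin`) to `tally_trace_points` and print the rows that `format_top` returns. This only counts entries; it does nothing to correlate lock and unlock pairs.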