From: Linda Messerschmidt
To: John Baldwin
Date: Fri, 11 Sep 2009 23:55:35 -0400
Cc: freebsd-hackers@freebsd.org
Subject: Re: Intermittent system hangs on 7.2-RELEASE-p1

OK, I have learned that ktrdump looks up the name of the process associated with a particular KSE at the time of the dump, so if it's changed since tracing stopped, it will blissfully blame the wrong process. I understand why that's the case, but it still sucks for troubleshooting. :(

This time, "pf task mtx" and "vnode_free_list" are the locks getting the blame. The processes fingered are an httpd (the root "parent" of the one doing the work, which does nothing but select() for 1s and wait to see if its children died) and vnlru. There is no correlation at all to the previous results, and this machine is now utterly quiescent except for the httpd process and the PHP exerciser. It's hard to imagine vnlru has 1s worth of running to do on a machine with 949 total vnodes in use.

A third run produced a 997ms "lock acquire" for "buffer daemon lock," a 497ms one for ip6qlock (no, there's no IPv6 in use on this machine), and an 8s (!!!) one on unp_mtx. bufdaemon had a 997ms "running" bar, but according to the raw TSC values, that happened on the same CPU 1.999s *after* the 997ms buffer daemon lock acquire.

I really don't know where to go from here. There's so little consistency that I'm just not sure if the data is bad, the tool is bad, the operator is bad, or there's some problem so fundamentally horrible that all I'm seeing is random side effects.
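
For anyone who wants to redo the raw TSC comparison, here is a minimal sketch of the arithmetic. It assumes the TSC frequency is known (on FreeBSD it can typically be read from the machdep.tsc_freq sysctl) and that both readings come from the same CPU, since TSCs on different CPUs are not guaranteed to be in sync. The function name and every number below are hypothetical, chosen only to reproduce a 1.999s gap; they are not values from the actual trace.

```python
def tsc_delta_seconds(tsc_earlier, tsc_later, tsc_freq_hz):
    """Elapsed seconds between two raw TSC readings taken on the same CPU."""
    if tsc_freq_hz <= 0:
        raise ValueError("TSC frequency must be positive")
    return (tsc_later - tsc_earlier) / tsc_freq_hz


# Hypothetical values, not taken from the real ktrdump output.
TSC_FREQ_HZ = 2_000_000_000            # assumed 2 GHz TSC
lock_acquire_tsc = 10_000_000_000      # assumed timestamp of the lock acquire event
running_tsc = 13_998_000_000           # assumed timestamp of the "running" event

gap = tsc_delta_seconds(lock_acquire_tsc, running_tsc, TSC_FREQ_HZ)
print(f"running started {gap:.3f}s after the lock acquire")
```

A positive result means the second event really did come later on that CPU, which is the kind of check that shows two bars in the graph cannot be the same stall.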