From: Poul-Henning Kamp
To: threads@freebsd.org
Date: Wed, 06 Oct 2010 17:56:09 +0000
Message-ID: <51322.1286387769@critter.freebsd.dk>
Subject: suspect problems on -current with pthread_cond_*()

Hi Guys,

I updated my machine to -current (9.0-CURRENT #0 r213377M: Mon Oct 4;
the previous version was from sometime in April) and have started to
see weird new problems with the Varnish regression tests.

It's pretty hard to get a trace on the problem, but from what I have
found out so far, it is related to the very first operation(s) on a
pthread_cond_t, and the typical indication is a 100% CPU spin inside
libthr.

I can reproduce the problem in approx. 5 minutes by running the
automated Varnish regression tests in >=8 parallel streams
repeatedly[1], but due to the nature/complexity of Varnish, I have not
yet been able to get a debugger to give me a useful backtrace.

I only use pthread_cond_t's in two isolated places, and I am going to
muck about with them now to see if I can affect the issue in any way
(higher/lower failure rate etc.).

Any insights?

Poul-Henning

PS: I'll arrive in Karlsruhe Friday morning...

[1] It is an easy test to set up:

	svn co http://www.varnish-cache.org/svn/trunk
	cd trunk/varnish-cache
	sh autogen.des
	make
	cd varnish-cache/bin/varnishtest
	while gmake -j 12 -f Makefile.kristian check
	do
		true
	done

Look for test failures with "HTTP rx failed (poll: Unknown error: 0)".

A couple of the test cases may fail under high load for other
reasons, in particular m00001.vtc and c00002.vtc.

The varnishtest driver program can also be hit, but this happens much
more seldom; that usually leaves a core dump with a useless backtrace.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.