From: Jimmy Olgeni
Date: Wed, 11 Sep 2013 17:07:10 +0200 (CEST)
To: freebsd-stable@freebsd.org
Subject: Possible kqueue related issue on STABLE/RC.

Hello,

I may have found something weird while running 9.2-RC3 (FreeBSD 9.2-RC3 #0 r255393, ZFS-only setup).

Quick history of the problem:

- Lately, using a very recent -STABLE, the host would hang randomly while building ports with poudriere (-J2) under X11, without producing a core dump (a solid deadlock, apparently). It works perfectly when using the console only, and it can run a large build overnight without hanging. Being in X11, I could not find out what was happening on the console; the desktop PC does not have a proper serial port, so there's not much I can see. In any case, it does not reboot automatically.
- To rule out recent -STABLE changes I moved to 9.2-RC3 using SVN, but the system kept hanging under the same conditions.

- I also enabled DDB to get a minidump, but I still got only solid lockups.

- I downgraded the nvidia-driver port, just in case it had something to do with the crashes, but the crashes continued.

- I downgraded to a known-safe -STABLE from July, then June, but the host would still crash. The very weird thing is that I have always been building stuff while using X11, and it never hung. After downgrading both the OS and nvidia-driver I was effectively back on a configuration that did not hang at the time, but the issue persisted.

- However, this time I managed to get a minidump from the old -STABLE. I saved it here: http://olgeni.olgeni.com/~olgeni/core.txt.0

- After seeing the reference to kqueue, I remembered another thing that changed when the crashes started: gio-fam-backend went away, and glib20 now uses kqueue (r324037).

- I tried the same workload while using X11 with openbox only, and it worked fine.

- Then I went back to Gnome but made sure that anything related to gvfsd was periodically killed by a script, and the system returned to normal (i.e. flawless builds).

- I remember that the gamin implementation used to open and poll a lot of files, even files that were not used by the X11 environment or Nautilus specifically, and the gamin daemon could steal a good 5% of CPU for polling; restarting it brought it back to 0%.

- Not sure if it is related in any way, but running a standard "buildworld" does not crash the host. The only difference I can think of is that poudriere uses jails.

Unfortunately I'm not able to get a minidump for the latest RC, but at this point I suspect that something is going on with glib20 and kqueue on both -STABLE and -RC.

If anybody has any idea, I can test it easily, as it usually takes only a few minutes to hang everything.

-- 
jimmy