Date: Wed, 11 Sep 2013 17:07:10 +0200 (CEST)
From: Jimmy Olgeni <olgeni@olgeni.com>
To: freebsd-stable@freebsd.org
Subject: Possible kqueue related issue on STABLE/RC.
Message-ID: <alpine.BSF.2.00.1309111705460.89324@olgeni.olgeni>

Hello,

Perhaps I found something weird while running 9.2-RC3:

  FreeBSD 9.2-RC3 #0 r255393

(ZFS-only setup).

Quick history of the problem:

- Lately, using a very recent -STABLE, the host would hang randomly
  while building ports with poudriere (-J2) and using X11, without
  producing a core dump (solid deadlock, apparently). It works
  perfectly when using the console only, and it can run a large build
  overnight without hanging. Being on X11, I could not find out what
  was happening on the console; the desktop PC does not have a proper
  serial port, so there's not much I can see. In any case it does not
  reboot automatically.

- To rule out recent -STABLE changes I moved to 9.2-RC3 using SVN, but
  the system kept hanging under the same conditions.

- I also enabled DDB to get a minidump, but still I could only get
  solid locks.

- I downgraded the nvidia-driver port, just in case it had something
  to do with the crashes, but the crashes continued.

- I downgraded to a known-safe -STABLE from July, then June, but the
  host would still crash. The very weird thing is that I have always
  been building stuff while using X11, and it never hung. After
  downgrading both the OS and nvidia-driver I effectively got back a
  configuration that did not hang at the time, but the issue
  persisted.

- However, this time I managed to get a minidump from the old -STABLE.
  I saved it here:

  http://olgeni.olgeni.com/~olgeni/core.txt.0

- After seeing the reference to kqueue, I remembered another thing
  that changed when the crashes started: gio-fam-backend went away,
  and glib20 now uses kqueue (r324037).

- I tried the same workload while using X11 with openbox only, and it
  worked fine.

- Then I came back to Gnome but made sure that anything related to
  gvfsd was periodically killed by a script, and the system returned
  to normal (i.e. flawless builds).

- I remember that the gamin implementation used to open and poll a lot
  of files, even files that were not used by the X11 environment or
  Nautilus specifically, and the gamin daemon could steal a good 5% of
  CPU for polling; restarting it brought it back to 0%.

- Not sure if it is related in any way, but running a standard
  "buildworld" does not crash the host. The only difference that I can
  think of is that poudriere uses jails.

Unfortunately I'm not able to get a minidump for the latest RC, but at
this point I suspect that something is going on with glib20 and kqueue
on both -STABLE and -RC.

If anybody has any ideas, I can test them easily, as it usually takes
only a few minutes to hang everything.

--
jimmy