Date: Sun, 05 Oct 2008 17:34:22 -0700
From: Tim Kientzle <kientzle@freebsd.org>
To: jos@catnook.com
Cc: Andrey Chernov <ache@nagual.pp.ru>, freebsd-current@freebsd.org
Subject: Re: firefox3-bin crashes near arc4random_buf()
Message-ID: <48E95D0E.50202@freebsd.org>
In-Reply-To: <20081005233256.GB8507@lizzy.catnook.local>
References: <20081004080511.GA72641@lizzy.catnook.local> <20081004161024.GA67323@nagual.pp.ru> <20081004222249.GA48928@lizzy.catnook.local> <48E80F02.4070309@freebsd.org> <20081005233256.GB8507@lizzy.catnook.local>
> I watched it crash a bunch more times and the backtraces are the same. That's
> good, right? :-)

Yes. For a suitable definition of "good." ;-)

>>It might also be worth running it under ktrace,
>>forcing the crash, then sharing the last few dozen
>>lines from kdump output.
>
> Also attached is firefox3.kdump. The last few lines look like:
>
>   6855 firefox-bin RET   clock_gettime 0
>   6855 firefox-bin CALL  _umtx_op(0x8179760,0x8,0x1,0x8179740,0xbf8fdddc)
>   6855 firefox-bin PSIG  SIGSEGV caught handler=0x28237290 mask=0x0 code=0x1
>   6855 firefox-bin CALL  unlink(0x8179600)
>   6855 firefox-bin NAMI  "/home/jos/.mozilla/firefox/tosfxhak.default/lock"
>   6855 firefox-bin RET   unlink 0
>   6855 firefox-bin CALL  sigaction(SIGSEGV,0x2978dfb4,0)
>   6855 firefox-bin RET   sigaction 0
>   6855 firefox-bin CALL  sigprocmask(SIG_UNBLOCK,0xbf4f906c,0)
>   6855 firefox-bin RET   sigprocmask 0
>   6855 firefox-bin CALL  thr_kill(0x1878c,SIGSEGV)
>   6855 firefox-bin RET   thr_kill 0
>   6855 firefox-bin PSIG  SIGSEGV SIG_DFL
>
> This to me suggests that the segfault happens inside _umtx_op. Am I reading
> that correctly?

Not necessarily. Firefox is multi-threaded. The thread
that called _umtx_op() is not the thread that crashed
(_umtx_op() hadn't returned to userspace, so that thread
was still in the kernel).

This does, however, answer one puzzle: Firefox appears
to have a signal handler that catches SEGV, releases
the lock file, then re-throws SEGV to actually kill
the program. That explains stack frames #0-#4 in your
backtrace; that's the signal handler executing after
the segfault but before the program is terminated.

Something is still screwy about the backtrace.
dbopen() doesn't call arc4random_buf. However,
it does call mkstemp() which does call arc4random_uniform,
which should be right next to arc4random_buf in memory.
GCC optimizations could be obscuring the call stack here.
It's certainly possible that arc4random is involved somehow
but I don't yet see it.
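
As a rough illustration of the kdump reading above, here is a minimal
Python sketch that lists system calls that had been entered (CALL) but
had not yet returned (RET) when the first SIGSEGV was delivered. It is
not part of the original exchange; the function name
pending_calls_before_signal and the regular expression are made up for
this example, and it assumes default kdump output with no per-thread
IDs, as in the excerpt. Because the threads cannot be told apart in
that output, a pending call only shows that some thread was in the
kernel, not that it was the thread that faulted.

```python
import re

# Matches kdump lines of the form "<pid> <command> <TYPE> <rest>".
# Assumption: default kdump output without per-thread IDs.
LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(CALL|RET|PSIG)\s+(.*)$")


def pending_calls_before_signal(lines, signal_name="SIGSEGV"):
    """Return syscalls that were CALLed but had not RETurned when the
    first PSIG record for signal_name appears (hypothetical helper)."""
    pending = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # NAMI and other record types are ignored here
        kind, rest = m.group(3), m.group(4)
        if kind == "CALL":
            # "name(args...)" -> "name"
            pending.append(rest.split("(", 1)[0])
        elif kind == "RET":
            parts = rest.split()
            if not parts:
                continue
            # Drop the most recent matching CALL, if there is one.
            for i in range(len(pending) - 1, -1, -1):
                if pending[i] == parts[0]:
                    del pending[i]
                    break
        elif kind == "PSIG" and rest.startswith(signal_name):
            return pending
    return []


# The first three records of the excerpt quoted above.
TRACE = """\
 6855 firefox-bin RET   clock_gettime 0
 6855 firefox-bin CALL  _umtx_op(0x8179760,0x8,0x1,0x8179740,0xbf8fdddc)
 6855 firefox-bin PSIG  SIGSEGV caught handler=0x28237290 mask=0x0 code=0x1
""".splitlines()

print(pending_calls_before_signal(TRACE))  # ['_umtx_op']
```

On the excerpt this reports _umtx_op as still pending, which matches
the reading that the calling thread was still in the kernel when the
signal arrived.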

It does seem likely that we're looking at a libc problem, so a debug
version of libc might help. Replacing libc on a running system is
a little tricky. I believe the following works, though I've not tried it:

  % cd /usr/src/lib/libc
  % make clean
  % make DEBUG_FLAGS=-g
  % cp /lib/libc.so.7 /lib/libc.so.7-backup
  ... reboot to single user, use /rescue/sh as your shell ...
  % cp /usr/src/lib/libc/libc.so.7 /lib/libc.so.7
  ... reboot ...

This should give you a standard libc with full debugging symbols.
Hopefully, the backtrace will now give more details.

I think we're getting closer.

Tim