Date:      Sun, 05 Oct 2008 17:34:22 -0700
From:      Tim Kientzle <kientzle@freebsd.org>
To:        jos@catnook.com
Cc:        Andrey Chernov <ache@nagual.pp.ru>, freebsd-current@freebsd.org
Subject:   Re: firefox3-bin crashes near arc4random_buf()
Message-ID:  <48E95D0E.50202@freebsd.org>
In-Reply-To: <20081005233256.GB8507@lizzy.catnook.local>
References:  <20081004080511.GA72641@lizzy.catnook.local> <20081004161024.GA67323@nagual.pp.ru> <20081004222249.GA48928@lizzy.catnook.local> <48E80F02.4070309@freebsd.org> <20081005233256.GB8507@lizzy.catnook.local>

> I watched it crash a bunch more times and the backtraces are the same. That's
> good, right? :-)

Yes.  For a suitable definition of "good."  ;-)

>>It might also be worth running it under ktrace,
>>forcing the crash, then sharing the last few dozen
>>lines from kdump output.
> 
> Also attached is firefox3.kdump. The last few lines look like:
> 
>   6855 firefox-bin RET   clock_gettime 0
>   6855 firefox-bin CALL  _umtx_op(0x8179760,0x8,0x1,0x8179740,0xbf8fdddc)
>   6855 firefox-bin PSIG  SIGSEGV caught handler=0x28237290 mask=0x0 code=0x1
>   6855 firefox-bin CALL  unlink(0x8179600)
>   6855 firefox-bin NAMI  "/home/jos/.mozilla/firefox/tosfxhak.default/lock"
>   6855 firefox-bin RET   unlink 0
>   6855 firefox-bin CALL  sigaction(SIGSEGV,0x2978dfb4,0)
>   6855 firefox-bin RET   sigaction 0
>   6855 firefox-bin CALL  sigprocmask(SIG_UNBLOCK,0xbf4f906c,0)
>   6855 firefox-bin RET   sigprocmask 0
>   6855 firefox-bin CALL  thr_kill(0x1878c,SIGSEGV)
>   6855 firefox-bin RET   thr_kill 0
>   6855 firefox-bin PSIG  SIGSEGV SIG_DFL
> 
> This to me suggests that the segfault happens inside _umtx_op. Am I reading
> that correctly?

Not necessarily.  Firefox is multi-threaded, and the kdump
output interleaves events from every thread in the process.
The thread that called _umtx_op() is not the thread that
crashed (_umtx_op() hadn't returned to userspace, so that
thread was still in the kernel).

This does, however, answer one puzzle:  Firefox appears to
have a signal handler that catches SIGSEGV, removes the lock
file, then re-raises SIGSEGV with the default action to
actually kill the program.  That explains stack frames #0-#4
in your backtrace; that's the signal handler executing after
the segfault but before the program is terminated.

Something is still screwy about the backtrace.  dbopen()
doesn't call arc4random_buf().  However, it does call
mkstemp(), which in turn calls arc4random_uniform(), which
should be right next to arc4random_buf() in memory.  GCC
optimizations could be obscuring the call stack here.
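
One way to check whether the mkstemp() path misbehaves on its
own, outside Firefox, would be a tiny standalone test like the
sketch below.  This is my assumption about a useful experiment,
not something from the backtrace: the function name
exercise_mkstemp, the template path, and the iteration count are
all invented for illustration, and a clean run would not rule
out a problem that depends on state inside Firefox.

```c
#include <stdlib.h>
#include <unistd.h>

/* Call mkstemp() repeatedly, removing each file as we go.
 * Returns 0 if every call succeeded, -1 on the first failure.
 * (Hypothetical helper, named for this example only.) */
int exercise_mkstemp(int iterations)
{
    int i;

    for (i = 0; i < iterations; i++) {
        /* mkstemp() rewrites the trailing XXXXXX in place, so the
         * template must be a fresh writable array each time. */
        char tmpl[] = "/tmp/mkstemp-test.XXXXXX";
        int fd = mkstemp(tmpl);

        if (fd < 0)
            return -1;
        close(fd);
        unlink(tmpl);
    }
    return 0;
}
```

If a small program calling exercise_mkstemp() a few hundred times
crashes the same way, the problem can be debugged without Firefox
in the picture; if it runs cleanly, that points back toward
something specific to the Firefox process.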

It's certainly possible that arc4random is involved
somehow, but I don't yet see it.  It does seem likely
that we're looking at a libc problem, so a debug
version of libc might help.  Replacing libc on a
running system is a little tricky.  I believe the
following works, though I've not tried it:

% cd /usr/src/lib/libc
% make clean
% make DEBUG_FLAGS=-g
% cp /lib/libc.so.7 /lib/libc.so.7-backup
... reboot to single user, use /rescue/sh as your shell ...
% cp /usr/src/lib/libc/libc.so.7 /lib/libc.so.7
... reboot ...

This should give you a standard libc with full
debugging symbols.  Hopefully, the backtrace will
now give more details.

I think we're getting closer.

Tim


