Date:      Fri, 8 Jul 2011 20:11:02 +0200
From:      Marius Strobl <marius@alchemy.franken.de>
To:        KOT MATPOCKuH <matpockuh@gmail.com>
Cc:        Doug Barton <dougb@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: named crashes on assertion in rbtdb.c on sparc64/SMP
Message-ID:  <20110708181102.GA95673@alchemy.franken.de>
In-Reply-To: <CALmdT0V_MG7abrGyp-JodsP3Bun-C863VGqTkSAdewnFbiA-%2Bg@mail.gmail.com>
References:  <BANLkTinMimzfdK0Y4otZegGvztwo-yA4EA@mail.gmail.com> <BANLkTi=_PP=tvLUbXiTE1DT4tMirLRUY%2Bw@mail.gmail.com> <20110629134140.GF14797@alchemy.franken.de> <4E0B8F25.7090107@FreeBSD.org> <CALmdT0V2Fzf8mcYRzYzsu5XkauuxGaF4dxFPR7ZHr-FH2a_5bQ@mail.gmail.com> <20110707100446.GJ14797@alchemy.franken.de> <CALmdT0VFC7kBxaEqLuFVWkLk3o2hLe29tsx3dgn17tuTNaTRLA@mail.gmail.com> <20110707154958.GK14797@alchemy.franken.de> <CALmdT0V_MG7abrGyp-JodsP3Bun-C863VGqTkSAdewnFbiA-%2Bg@mail.gmail.com>

On Fri, Jul 08, 2011 at 03:47:08PM +0400, KOT MATPOCKuH wrote:
> 2011/7/7 Marius Strobl <marius@alchemy.franken.de>:
> > That's not the patch I was referring to. I did a second one which just
> > entirely disables the use of atomic operations on sparc64:
> > http://people.freebsd.org/~marius/sparc64_isc_disable_atomic.diff
> Omg. I'm sorry.
> I applied this patch and restarted named, but named crashed immediately
> after start:
> 08-Jul-2011 15:29:54.631 found 2 CPUs, using 2 worker threads
> 08-Jul-2011 15:29:54.633 using up to 4096 sockets
> Segmentation fault (core dumped)
> 
> core's backtrace:
> #0  0x0000000040953ba8 in __sparc_utrap_install () from /lib/libc.so.7
> (gdb) bt
> #0  0x0000000040953ba8 in __sparc_utrap_install () from /lib/libc.so.7
> #1  0x0000000040953ccc in __sparc_utrap_install () from /lib/libc.so.7
> #2  0x0000000040953f70 in __sparc_utrap_install () from /lib/libc.so.7
> #3  0x00000000409537ac in __sparc_utrap_install () from /lib/libc.so.7
> #4  0x00000000407c2d54 in pthread_mutex_lock () from /lib/libthr.so.3
> #5  0x0000000000228dcc in ?? ()
> Previous frame identical to this frame (corrupt stack?)
> 
> Could this be a sign of a problem in libthr?

Could be, but IMO that's unlikely; if there were a bug affecting
pthread_mutex_lock() there should be more fallout from it. I'm probably
missing something about how to properly disable the use of the ISC atomic
implementation and enable the alternative locking.
Please try the following:
a) Instead of the base BIND use the dns/bind96 port. The native build
   of the latter defaults to not using the ISC atomic implementation
   on sparc64 (and arm) and should properly enable the alternative. I
   can at least start named from bind96-9.6.3.1.ESV.R4.3 with the default
   configuration on -CURRENT without problems.
b) Revert the above patch and try the base BIND with the following
   (third) patch:
   http://people.freebsd.org/~marius/sparc64_isc_atomic.h.diff2
   That one adds the memory barriers required for reference counting,
   albeit in a sledgehammer-like fashion, as the ISC atomic API doesn't
   allow distinguishing between acquire and release semantics.
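
To illustrate the acquire/release distinction mentioned in b), here is a
minimal sketch of reference counting written against C11 <stdatomic.h>.
This is not the ISC atomic API and not what the patch does; the struct and
function names (refobj, refobj_attach, refobj_detach) are hypothetical and
exist only for this example. An API that cannot express the distinction
would roughly have to use a full barrier on every operation instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not a BIND/ISC structure. */
struct refobj {
    atomic_uint refs;   /* number of live references */
    int payload;        /* data whose lifetime the count protects */
};

static void refobj_init(struct refobj *o, int payload)
{
    o->payload = payload;
    atomic_init(&o->refs, 1);   /* the creator holds the first reference */
}

/*
 * Taking another reference needs no ordering: the caller already holds
 * a reference, so the object cannot go away underneath it.
 */
static void refobj_attach(struct refobj *o)
{
    unsigned int prev =
        atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
    assert(prev > 0);
    (void)prev;
}

/*
 * Drop a reference. Returns true if this was the last one, in which
 * case the caller may destroy the object.
 */
static bool refobj_detach(struct refobj *o)
{
    /* Release: our earlier writes to *o become visible before the drop. */
    unsigned int prev =
        atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release);
    assert(prev > 0);
    if (prev != 1)
        return false;
    /* Acquire: see all writes other threads made before their drops. */
    atomic_thread_fence(memory_order_acquire);
    return true;
}
```

The increment can be relaxed, the decrement needs release semantics, and
only the thread that drops the last reference needs the acquire side
before it tears the object down.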

> 
> PS.
> Also, a month ago I had problems with another multithreaded
> application from ports (www/oops). oops crashed with this stack
> backtrace:
> #0  0x0000000040d8fc88 in __sparc_utrap_install () from /lib/libc.so.7
> #1  0x0000000040d8fdac in __sparc_utrap_install () from /lib/libc.so.7
> #2  0x0000000040d90050 in __sparc_utrap_install () from /lib/libc.so.7
> #3  0x0000000040d8f88c in __sparc_utrap_install () from /lib/libc.so.7
> #4  0x0000000040d64044 in _malloc_thread_cleanup () from /lib/libc.so.7
> #5  0x0000000040c039b8 in fork () from /lib/libthr.so.3
> #6  0x0000000040c03d38 in fork () from /lib/libthr.so.3
> #7  0x0000000040c03f50 in pthread_exit () from /lib/libthr.so.3
> #8  0x0000000040c04414 in pthread_detach () from /lib/libthr.so.3
> #9  0x0000000040c04710 in pthread_create () from /lib/libthr.so.3
> 
> But with yesterday's world build oops works properly. I think it may
> be related to r223228 (?)

Unlikely; the crash caused by the assertion in _malloc_thread_cleanup()
was fixed in r223369.

Marius



