From: Jason Evans
Date: Mon, 30 Apr 2012 12:34:32 -0700
To: Gustau Pérez i Querol
Cc: avilla@freebsd.org, FreeBSD current
Subject: Re: RFC: jemalloc: qdbus sigsegv in malloc_init
Message-Id: <2D080258-652B-4EFA-8F6F-6ECA3CA4404B@freebsd.org>
In-Reply-To: <4F9E9E06.4070004@entel.upc.edu>
List-Id: Discussions about the use of FreeBSD-current

On Apr 30, 2012, at 7:13 AM, Gustau Pérez i Querol wrote:

> the kde team is seeing some strange problems with the new version
> (4.8.1) of devel/dbus-qt4 with current. It does work with stable. I
> also suspect that the problem described below is affecting the
> experimental cinnamon port (an alternative to gnome3, possible
> replacement of gnome2).
>
> The problem happens with both i386 and amd64 with empty
> /etc/malloc.conf and simple /etc/make.conf.
> Everything was compiled with base gcc (no clang). The kernel was
> compiled with no debug support, but it can be enabled if needed.
> There are reports from avilla@freebsd.org of the same behavior with
> a clang-compiled world and kernel and with MALLOC_PRODUCTION=yes.
>
> When qdbus starts, it segfaults. The backtrace of the problem with
> r234769 can be found here: http://pastebin.com/ryBXtqGF. When
> starting the qdbus daemon by hand in an X+twm session, we see it call
> calloc many times and, after a fixed number of calls, segfault. We
> see it segfault at rb_gen (a quite large macro defined at
> $SRC_BASE/contrib/jemalloc/include/jemalloc/internal/rb.h).
>
> If the daemon is started by hand, I'm able to skip all the calls
> qdbus makes to calloc until the one causing the segfault. At that
> point, at rb_gen, we don't exactly know what is going on or how to
> debug the macro. ktrace dumps are available, but we were unable to
> find anything new in them.
>
> With old versions of current before the jemalloc imports (as of
> March 30th) the daemon segfaulted at malloc.c:2426. With revisions
> from April 20th to 24th (I can be more precise, it was during the
> jemalloc imports) the daemon segfaulted at malloc_init. Backtraces
> are available if needed, and if necessary I can go back to those
> revisions and recompile world+kernel to see its behavior.
>
> Any help from freebsd-current@ (perhaps Jason Evans can help us)
> will be appreciated. Any additional info, like source revisions, can
> be provided. I would like to stress that the experimental
> devel/dbus-qt4 works fine with recent stable.

The crash is happening in page run management, so there is some pretty
bad memory corruption going on by the time of the crash. If I
understand you correctly, you have reproduced the crash on a system
that does *not* have MALLOC_PRODUCTION defined, which means that none
of the assertions in jemalloc caught the problem.
Adrian Chadd made the excellent suggestion of trying valgrind; it's
likely to point out the problem almost immediately. If that doesn't
work, the utrace functionality in malloc may help you figure out what
activity has occurred by the time of the crash, and give you a better
understanding of what happened to memory around the address that is
involved in the crash.

Jason
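
For readers unfamiliar with this failure class, here is a minimal sketch in C of why corruption like the one discussed above tends to surface far from its cause. It is not taken from qdbus or jemalloc: guarded_alloc, guard_intact, demo_overrun, GUARD_LEN and GUARD_BYTE are all hypothetical names made up for illustration. A write that runs a few bytes past a buffer is silent when it happens and is noticed only when something later inspects the neighbouring bytes. A tool such as valgrind aims to report the offending write itself rather than the later symptom.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  16    /* hypothetical guard size, for illustration only */
#define GUARD_BYTE 0xA5  /* hypothetical fill pattern */

/* Allocate n usable bytes followed by GUARD_LEN bytes of a known pattern. */
static void *guarded_alloc(size_t n)
{
    unsigned char *p = malloc(n + GUARD_LEN);
    if (p == NULL)
        return NULL;
    memset(p + n, GUARD_BYTE, GUARD_LEN);
    return p;
}

/* Return 1 if the guard after the n usable bytes is untouched, else 0. */
static int guard_intact(const void *ptr, size_t n)
{
    assert(ptr != NULL);
    const unsigned char *g = (const unsigned char *)ptr + n;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (g[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/*
 * Simulate a caller that believes the buffer holds n + extra bytes.
 * The write stays inside the padded allocation, so this demo is well
 * defined, but any extra > 0 lands in the guard region. Nothing fails
 * at the time of the write; the damage shows up only at the later check.
 * Returns 1 if the guard survived, 0 if it was overwritten, -1 on error.
 */
static int demo_overrun(size_t n, size_t extra)
{
    if (extra > GUARD_LEN)
        return -1;              /* refuse to write outside the allocation */
    unsigned char *buf = guarded_alloc(n);
    if (buf == NULL)
        return -1;
    memset(buf, 0, n + extra);  /* the "buggy" write */
    int intact = guard_intact(buf, n);
    free(buf);
    return intact;
}
```

In a real allocator the bytes after a buffer may be allocator metadata rather than a guard pattern, which is one way a crash could end up inside the allocator's own bookkeeping. For the utrace route, the jemalloc option is presumably spelled MALLOC_CONF=utrace:true and the records collected with ktrace -t u and read with kdump; treat those spellings as an assumption and check jemalloc(3) and ktrace(1) on the system in question.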