From owner-freebsd-bugs@freebsd.org Sun Feb 19 22:20:10 2017
From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 217239] head (e.g.:) -r313864 arm64 vs. jemalloc without
 MALLOC_PRODUCTION: various examples of tbin->avail being zero lead to
 SIGSEGV's
Date: Sun, 19 Feb 2017 22:20:10 +0000
List-Id: Bug reports <freebsd-bugs@freebsd.org>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217239

            Bug ID: 217239
           Summary: head (e.g.:) -r313864 arm64 vs. jemalloc without
                    MALLOC_PRODUCTION: various examples of tbin->avail
                    being zero lead to SIGSEGV's
           Product: Base System
           Version: CURRENT
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: markmi@dsl-only.net

Now that the fork trampoline for arm64 no longer allows interrupts to mess
up the stack pointer, things run longer and other issues show up.

This report is for a world built without MALLOC_PRODUCTION defined during
buildworld. The kernel build is production style.

[I've not tried the contrasting case of having MALLOC_PRODUCTION defined.
I'll also note that I tried powerpc64 without MALLOC_PRODUCTION and had no
problems: this seems arm64 (aarch64) specific.]
I've accumulated examples of each of the following getting SIGSEGV in
jemalloc code and producing core files:

script
powerd
su

(Note: I'm primarily building things from the console, so the variety of
activity is fairly limited.)

From register values it appears that in each case tbin->avail == 0 and the
calculations subtract a positive number from that.

All the script examples look like (e.g.):

(lldb) bt
* thread #1: tid = 100143, 0x00000000404e9f08
libc.so.7`__je_tcache_dalloc_large(tsd=0x00000000405fd010,
tcache=0x0000000040a0d000, ptr=0x0000000040a25600, size=<unavailable>,
slow_path=<unavailable>) + 228 at tcache.h:451, name = 'script', stop
reason = signal SIGSEGV
  * frame #0: 0x00000000404e9f08
libc.so.7`__je_tcache_dalloc_large(tsd=0x00000000405fd010,
tcache=0x0000000040a0d000, ptr=0x0000000040a25600, size=<unavailable>,
slow_path=<unavailable>) + 228 at tcache.h:451
    frame #1: 0x0000000040510cfc libc.so.7`__free(ptr=0x0000000040a25600)
+ 124 at jemalloc_jemalloc.c:2016
    frame #2: 0x000000004058c5d8
libc.so.7`cleanfile(fp=0x00000000405e4cf0, c=<unavailable>) + 96 at
fclose.c:62
    frame #3: 0x000000004058c69c libc.so.7`fclose(fp=0x00000000405e4cf0)
+ 60 at fclose.c:134
    frame #4: 0x000000000040255c script`done(eno=0) + 268 at script.c:375
    frame #5: 0x000000000040218c script`main [inlined] finish + 2772 at
script.c:323
    frame #6: 0x0000000000402154 script`main(argc=<unavailable>,
argv=<unavailable>) + 2716 at script.c:299
    frame #7: 0x0000000000401610 script`__start + 360
    frame #8: 0x0000000040414658 ld-elf.so.1`.rtld_start + 24 at
rtld_start.S:41

(lldb) down
frame #0: 0x00000000404e9f08
libc.so.7`__je_tcache_dalloc_large(tsd=0x00000000405fd010,
tcache=0x0000000040a0d000, ptr=0x0000000040a25600, size=<unavailable>,
slow_path=<unavailable>) + 228 at tcache.h:451
   448          }
   449          assert(tbin->ncached < tbin_info->ncached_max);
   450          tbin->ncached++;
-> 451          *(tbin->avail - tbin->ncached) = ptr;
   452
   453          tcache_event(tsd, tcache);
   454  }

They are from long-running builds with lots of output in the generated
typescript, and the crash happens
during the cleanup at the end.

All the powerd examples look like (e.g.):

(lldb) bt
* thread #1: tid = 100099, 0x00000000404eaa10
libc.so.7`__je_tcache_dalloc_small(tsd=0x00000000405fe010,
tcache=0x0000000040a0d000, ptr=0x0000000040a1e000, binind=2,
slow_path=<unavailable>) + 164 at tcache.h:421, name = 'powerd', stop
reason = signal SIGSEGV
  * frame #0: 0x00000000404eaa10
libc.so.7`__je_tcache_dalloc_small(tsd=0x00000000405fe010,
tcache=0x0000000040a0d000, ptr=0x0000000040a1e000, binind=2,
slow_path=<unavailable>) + 164 at tcache.h:421
    frame #1: 0x0000000040511cfc libc.so.7`__free(ptr=0x0000000040a1e000)
+ 124 at jemalloc_jemalloc.c:2016
    frame #2: 0x000000000040201c powerd`main(argc=<unavailable>,
argv=<unavailable>) + 3332 at powerd.c:786
    frame #3: 0x0000000000401270 powerd`__start + 360
    frame #4: 0x0000000040415658 ld-elf.so.1`.rtld_start + 24 at
rtld_start.S:41

(lldb) down
frame #0: 0x00000000404eaa10
libc.so.7`__je_tcache_dalloc_small(tsd=0x00000000405fe010,
tcache=0x0000000040a0d000, ptr=0x0000000040a1e000, binind=2,
slow_path=<unavailable>) + 164 at tcache.h:421
   418          }
   419          assert(tbin->ncached < tbin_info->ncached_max);
   420          tbin->ncached++;
-> 421          *(tbin->avail - tbin->ncached) = ptr;
   422
   423          tcache_event(tsd, tcache);
   424  }

So very similar to the script failures: these are during the cleanup at
the end (but dalloc large vs. small).
All the su examples look like (e.g.):

(lldb) bt
* thread #1: tid = 100156, 0x000000004054b1dc
libc.so.7`__je_arena_tcache_fill_small(tsdn=<unavailable>,
arena=<unavailable>, tbin=<unavailable>, binind=<unavailable>,
prof_accumbytes=<unavailable>) + 212 at jemalloc_arena.c:2442, name =
'su', stop reason = signal SIGSEGV
  * frame #0: 0x000000004054b1dc
libc.so.7`__je_arena_tcache_fill_small(tsdn=<unavailable>,
arena=<unavailable>, tbin=<unavailable>, binind=<unavailable>,
prof_accumbytes=<unavailable>) + 212 at jemalloc_arena.c:2442
    frame #1: 0x000000004052e5a0 libc.so.7`__je_tcache_alloc_small
[inlined] __je_tcache_alloc_small_hard(tsdn=<unavailable>,
arena=0x0000000040800140, tbin=0x0000000040a0d0a8, binind=4) + 20 at
jemalloc_tcache.c:79
    frame #2: 0x000000004052e58c
libc.so.7`__je_tcache_alloc_small(tsd=0x0000000040647010,
arena=0x0000000040800140, tcache=0x0000000040a0d000, size=<unavailable>,
binind=4, zero=false, slow_path=true) + 332 at tcache.h:298
    frame #3: 0x0000000040555184 libc.so.7`__malloc(size=1) + 184 at
jemalloc_jemalloc.c:1645
    frame #4: 0x000000004046979c
libpam.so.6`openpam_vasprintf(str=0x0000ffffffffe520, fmt="",
ap=<unavailable>) + 92 at openpam_vasprintf.c:53
    frame #5: 0x0000000040469714
libpam.so.6`openpam_asprintf(str=<unavailable>, fmt=<unavailable>) + 120
at openpam_asprintf.c:52
    frame #6: 0x000000004046960c
libpam.so.6`_openpam_log(level=<unavailable>, func="", fmt="") + 224 at
openpam_log.c:125
    frame #7: 0x0000000040466914
libpam.so.6`openpam_dispatch(pamh=<unavailable>, primitive=<unavailable>,
flags=<unavailable>) + 1256 at openpam_dispatch.c:182
    frame #8: 0x0000000040463b54
libpam.so.6`pam_setcred(pamh=0x0000000040a44000, flags=2) + 112 at
pam_setcred.c:66
    frame #9: 0x0000000040b77730 su`main(argc=<unavailable>,
argv=<unavailable>) + 2280 at su.c:475
    frame #10: 0x0000000040b76da0 su`__start + 360
    frame #11: 0x0000000040415658 ld-elf.so.1`.rtld_start + 24 at
rtld_start.S:41

(lldb) down
frame #0: 0x000000004054b1dc
libc.so.7`__je_arena_tcache_fill_small(tsdn=<unavailable>,
arena=<unavailable>, tbin=<unavailable>, binind=<unavailable>,
prof_accumbytes=<unavailable>) + 212 at jemalloc_arena.c:2442
   2439                          true);
   2440          }
   2441          /* Insert such that low regions get used first. */
-> 2442          *(tbin->avail - nfill + i) = ptr;
   2443  }
   2444  if (config_stats) {
   2445          bin->stats.nmalloc += i;

So not as close, but also during cleanup (of the parent process of the
fork) during PAM_END() before exit.

See also bugzilla 217138.

-- 
You are receiving this mail because:
You are the assignee for the bug.