Date: Mon, 24 Nov 2025 00:19:56 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 290958] ctfmerge: random Segmentation fault: 11 for `make buildkernel' on macOS
Message-ID: <bug-290958-227-TG82JOfBqB@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-290958-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=290958

Mark Peek <mp@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|bugs@FreeBSD.org            |mp@FreeBSD.org
                 CC|                            |mp@FreeBSD.org

--- Comment #2 from Mark Peek <mp@FreeBSD.org> ---
Created attachment 265610
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=265610&action=edit
Patch for missing locking around ctfmerge fifo operations

I was able to reproduce this issue when running in a loop, and then
simplified the reproduction to running just the ctfmerge command from the
last crash in a loop. This would fail fairly quickly in a loop to 100.

(lldb) bt all
  thread #1
    frame #0: 0x00000001978ca4f8 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019790a0dc libsystem_pthread.dylib`_pthread_cond_wait + 984
    frame #2: 0x0000000104eefca0 ctfmerge`main + 1736
    frame #3: 0x0000000197541d54 dyld`start + 7184
  thread #2
    frame #0: 0x00000001978c99c8 libsystem_kernel.dylib`__psynch_mutexwait + 8
    frame #1: 0x0000000197906e3c libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 84
    frame #2: 0x0000000197904868 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 220
    frame #3: 0x0000000104ef05dc ctfmerge`worker_thread + 980
    frame #4: 0x0000000197909c08 libsystem_pthread.dylib`_pthread_start + 136
* thread #3, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x17f5)
  * frame #0: 0x0000000104ef093c ctfmerge`fifo_len + 16
    frame #1: 0x0000000104ef06d4 ctfmerge`worker_thread + 1228
    frame #2: 0x0000000197909c08 libsystem_pthread.dylib`_pthread_start + 136
  thread #4
    frame #0: 0x00000001978ca4f8 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019790a0dc libsystem_pthread.dylib`_pthread_cond_wait + 984
    frame #2: 0x0000000104ef06e8 ctfmerge`worker_thread + 1248
    frame #3: 0x0000000197909c08 libsystem_pthread.dylib`_pthread_start + 136

I fixed the above occurrence by locking around the fifo_len() call and then
hit the same fault at another fifo_len() call:

(lldb) bt all
  thread #1
    frame #0: 0x00000001978ca4f8 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019790a0dc libsystem_pthread.dylib`_pthread_cond_wait + 984
    frame #2: 0x0000000102317ca0 ctfmerge`main(argc=<unavailable>, argv=<unavailable>) at ctfmerge.c:928:3 [opt]
    frame #3: 0x0000000197541d54 dyld`start + 7184
  thread #2
    frame #0: 0x00000001978c99c8 libsystem_kernel.dylib`__psynch_mutexwait + 8
    frame #1: 0x0000000197906e3c libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 84
    frame #2: 0x0000000197904868 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 220
    frame #3: 0x000000019790a168 libsystem_pthread.dylib`_pthread_cond_wait + 1124
    frame #4: 0x00000001023186f8 ctfmerge`worker_runphase2(wq=0x0000000102344968) at ctfmerge.c:472:4 [opt] [inlined]
    frame #5: 0x0000000102318624 ctfmerge`worker_thread(wq=0x0000000102344968) at ctfmerge.c:544:2 [opt]
    frame #6: 0x0000000197909c08 libsystem_pthread.dylib`_pthread_start + 136
  thread #3
    frame #0: 0x00000001978c99c8 libsystem_kernel.dylib`__psynch_mutexwait + 8
    frame #1: 0x0000000197906e3c libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 84
    frame #2: 0x0000000197904868 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 220
    frame #3: 0x000000019790a168 libsystem_pthread.dylib`_pthread_cond_wait + 1124
    frame #4: 0x00000001023186f8 ctfmerge`worker_runphase2(wq=0x0000000102344968) at ctfmerge.c:472:4 [opt] [inlined]
    frame #5: 0x0000000102318624 ctfmerge`worker_thread(wq=0x0000000102344968) at ctfmerge.c:544:2 [opt]
    frame #6: 0x0000000197909c08 libsystem_pthread.dylib`_pthread_start + 136
* thread #4, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x2176)
  * frame #0: 0x000000010231894c ctfmerge`fifo_len + 16
    frame #1: 0x00000001023186e4 ctfmerge`worker_runphase2(wq=0x0000000102344968) at ctfmerge.c:471:7 [opt] [inlined]
    frame #2: 0x0000000102318624 ctfmerge`worker_thread(wq=0x0000000102344968) at ctfmerge.c:544:2 [opt]
    frame #3: 0x0000000197909c08 libsystem_pthread.dylib`_pthread_start + 136
  thread #5
    frame #0: 0x00000001978c99c8 libsystem_kernel.dylib`__psynch_mutexwait + 8
    frame #1: 0x0000000197906e3c libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_wait + 84
    frame #2: 0x0000000197904868 libsystem_pthread.dylib`_pthread_mutex_firstfit_lock_slow + 220
    frame #3: 0x0000000102318578 ctfmerge`worker_thread(wq=0x0000000102344968) at ctfmerge.c:532:3 [opt]
    frame #4: 0x0000000197909c08 libsystem_pthread.dylib`_pthread_start + 136
  thread #6
    frame #0: 0x00000001978ca4f8 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019790a0dc libsystem_pthread.dylib`_pthread_cond_wait + 984
    frame #2: 0x00000001023186f8 ctfmerge`worker_runphase2(wq=0x0000000102344968) at ctfmerge.c:472:4 [opt] [inlined]
    frame #3: 0x0000000102318624 ctfmerge`worker_thread(wq=0x0000000102344968) at ctfmerge.c:544:2 [opt]
    frame #4: 0x0000000197909c08 libsystem_pthread.dylib`_pthread_start + 136

I fixed the second one and then found another while reviewing all the
fifo_*() calls for the attached patch. I ran this twice in a loop to 10000
without an issue.

Note: to get a core dump and an lldb backtrace on macOS:

1. Make /cores writable by the user:
     chmod 777 /cores
2. Set the core size limit:
     ulimit -c unlimited
3. Codesign the ctfmerge binary to give it a core dump entitlement:
     /usr/libexec/PlistBuddy -c "Add :com.apple.security.get-task-allow bool true" tmp.entitlements
     codesign -s - -f --entitlements tmp.entitlements /path/to/ctfmerge

Then run lldb:
     lldb -c /cores/core.<pid> -f /path/to/ctfmerge
     (lldb) bt all

-- 
You are receiving this mail because:
You are the assignee for the bug.
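
For readers who have not opened the attachment, here is a minimal sketch of
the general pattern described above: every fifo_*() call, including the
fifo_len() check that guards a condition wait, is made with the queue mutex
held. This is not the attached patch and not the ctfmerge source. The
fifo_t and workq_t layouts, consumer(), and demo_run() are hypothetical
names invented for illustration. It assumes fifo_len() walks a linked list,
so an unlocked call can follow a node that another thread has just freed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical singly linked FIFO; NOT the ctfmerge implementation. */
typedef struct node { struct node *next; int value; } node_t;
typedef struct { node_t *head, *tail; } fifo_t;

typedef struct {
    fifo_t fifo;
    pthread_mutex_t lock;      /* protects fifo, done, consumed */
    pthread_cond_t nonempty;   /* signalled when fifo or done changes */
    int done;
    long consumed;
} workq_t;

static void fifo_add(fifo_t *f, int value)
{
    node_t *n = malloc(sizeof(*n));
    if (n == NULL)
        abort();
    n->next = NULL;
    n->value = value;
    if (f->tail != NULL)
        f->tail->next = n;
    else
        f->head = n;
    f->tail = n;
}

static int fifo_remove(fifo_t *f, int *out)
{
    node_t *n = f->head;
    if (n == NULL)
        return 0;
    f->head = n->next;
    if (f->head == NULL)
        f->tail = NULL;
    *out = n->value;
    free(n);                   /* an unlocked reader could still hold n */
    return 1;
}

/* Walks the list, so the caller must hold the lock. */
static int fifo_len(const fifo_t *f)
{
    int len = 0;
    for (const node_t *n = f->head; n != NULL; n = n->next)
        len++;
    return len;
}

static void *consumer(void *arg)
{
    workq_t *wq = arg;
    int v;

    pthread_mutex_lock(&wq->lock);
    for (;;) {
        /* The length check and the wait both happen under the lock. */
        while (fifo_len(&wq->fifo) == 0 && !wq->done)
            pthread_cond_wait(&wq->nonempty, &wq->lock);
        if (!fifo_remove(&wq->fifo, &v))
            break;             /* done was set and the queue is drained */
        wq->consumed++;
    }
    pthread_mutex_unlock(&wq->lock);
    return NULL;
}

/* Starts nthreads consumers, queues nitems items, returns items consumed. */
long demo_run(int nthreads, int nitems)
{
    workq_t wq = { .fifo = { NULL, NULL }, .done = 0, .consumed = 0 };
    pthread_t *tids = malloc(sizeof(*tids) * (size_t)nthreads);
    if (tids == NULL)
        abort();
    pthread_mutex_init(&wq.lock, NULL);
    pthread_cond_init(&wq.nonempty, NULL);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, consumer, &wq);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nitems; i++) {
        pthread_mutex_lock(&wq.lock);
        fifo_add(&wq.fifo, i);
        pthread_cond_signal(&wq.nonempty);
        pthread_mutex_unlock(&wq.lock);
    }
    pthread_mutex_lock(&wq.lock);
    wq.done = 1;
    pthread_cond_broadcast(&wq.nonempty);
    pthread_mutex_unlock(&wq.lock);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&wq.nonempty);
    pthread_mutex_destroy(&wq.lock);
    free(tids);
    return wq.consumed;
}
```

Calling demo_run(4, 2000) from a main() should return 2000. In this sketch,
moving the fifo_len() check outside the mutex reintroduces a data race of
the same kind as the one in the backtraces, although whether and how it
faults depends on timing and on the allocator.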
