From: bugzilla-noreply@freebsd.org
To: python@FreeBSD.org
Subject: maintainer-feedback requested: [Bug 255445] lang/python 3.8/3.9 SIGSEV core dumps in libthr TrueNAS
Date: Tue, 27 Apr 2021 18:41:24 +0000
X-Bugzilla-Type: request
X-Bugzilla-Product: Ports & Packages
X-Bugzilla-Component: Individual Port(s)
X-Bugzilla-Version: Latest
X-Bugzilla-Keywords: crash
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: python@FreeBSD.org
X-Bugzilla-Flags: maintainer-feedback?
List-Id: FreeBSD-specific Python issues

Bugzilla Automation has asked freebsd-python (Nobody) for maintainer-feedback:
Bug 255445: lang/python 3.8/3.9 SIGSEV core dumps in libthr TrueNAS
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255445

--- Description ---
Seeing many TrueNAS (previously FreeNAS) users dump core on the main
middlewared process (python) starting with our version 12.0 release.

Relevant OS information:
12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 f2858df162b(HEAD) TRUENAS amd64

Python versions that experience the core dump:
Python 3.8.7
Python 3.9.4

When initially researching this, I found a regression with threading and
Python 3.8 on FreeBSD and was able to resolve that particular problem by
backporting these commits:
https://github.com/python/cpython/commit/4d96b4635aeff1b8ad41d41422ce808ce0b971c8
and
https://github.com/python/cpython/commit/9ad58acbe8b90b4d0f2d2e139e38bb5aa32b7fb6

I backported those commits because all of the core dumps that I've analyzed
crash in the same spot (or very close to it). For example, here are 2
backtraces showing a NULL pointer dereference.

Core was generated by `python3.8: middlewared'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  cond_signal_common (cond=) at /truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:457
warning: Source file is more recent than executable.
457             mp = td->mutex_obj;
[Current thread is 1 (LWP 100733)]
(gdb) list
452                     _sleepq_unlock(cvp);
453                     return (0);
454             }
455
456             td = _sleepq_first(sq);
457             mp = td->mutex_obj;
458             cvp->__has_user_waiters = _sleepq_remove(sq, td);
459             if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
460                     if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
461                             _thr_wake_all(curthread->defer_waiters,
(gdb) p *td
Cannot access memory at address 0x0

and another one:

Core was generated by `python3.8: middlewared'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  cond_signal_common (cond=) at /truenas-releng/freenas/_BE/os/lib/libthr/thread/thr_cond.c:459
warning: Source file is more recent than executable.
459             if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
[Current thread is 1 (LWP 101105)]
(gdb) list
454             }
455
456             td = _sleepq_first(sq);
457             mp = td->mutex_obj;
458             cvp->__has_user_waiters = _sleepq_remove(sq, td);
459             if (PMUTEX_OWNER_ID(mp) == TID(curthread)) {
460                     if (curthread->nwaiter_defer >= MAX_DEFER_WAITERS) {
461                             _thr_wake_all(curthread->defer_waiters,
462                                 curthread->nwaiter_defer);
463                             curthread->nwaiter_defer = 0;
(gdb) p *mp
Cannot access memory at address 0x0

I'm trying to write a program to "stress" test threading (tearing down and
recreating threads, etc.) but I've been unsuccessful at triggering this
particular problem. The end users who have seen this core dump sometimes go
a month or more without a problem. Hoping someone more knowledgeable can at
least give me a pointer or help me figure this one out. I have a VM with all
the relevant core dumps available, so if someone needs remote access to it to
"poke" around, please let me know.

You can reach me at caleb [at] ixsystems.com
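
For anyone who wants to try something similar, here is a minimal sketch of the kind of thread tear-down/re-create stress loop described above. It is not the reporter's program, and it is not known to reproduce the crash; the function name `churn` and its parameters are hypothetical, and whether Python's `threading.Condition` exercises the same libthr `cond_signal_common` path as the failing process is an assumption that has not been verified.

```python
import threading


def churn(rounds=200, workers=8):
    """Repeatedly create threads that block on a Condition, wake them, and join them.

    Returns the number of rounds completed. Hypothetical helper for illustration.
    """
    completed = 0
    for _ in range(rounds):
        cond = threading.Condition()
        state = {"go": False, "woken": 0}

        def waiter():
            # Block until the main thread flips the flag and notifies.
            with cond:
                while not state["go"]:
                    cond.wait()
                state["woken"] += 1

        threads = [threading.Thread(target=waiter) for _ in range(workers)]
        for t in threads:
            t.start()

        # Wake every waiter, then tear the threads down before the next round.
        with cond:
            state["go"] = True
            cond.notify_all()
        for t in threads:
            t.join()

        # Every worker must have been woken exactly once in this round.
        assert state["woken"] == workers
        completed += 1
    return completed


if __name__ == "__main__":
    print(churn())
```

Given that affected systems reportedly run for a month or more between crashes, a loop like this would likely need to run for a long time, and under load comparable to middlewared, before a negative result means much.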