From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 237470] [ppc][pseries] panic: Error evicting page: -7
Date: Mon, 22 Apr 2019 13:46:01 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237470

Bug ID: 237470
Summary: [ppc][pseries] panic: Error evicting page: -7
Product: Base System
Version: CURRENT
Hardware: powerpc
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: luporl@FreeBSD.org

I have seen this issue happen a couple of times. It is difficult to reproduce. In my case, it started happening more often when using a clang built with debug info to perform large parallel builds.
This is the panic message observed:

panic: Error evicting page: -7
cpuid = 10
time = 1555531979
KDB: stack backtrace:
0xe000000033634910: at .kdb_backtrace+0x5c
0xe000000033634a40: at .vpanic+0x1b4
0xe000000033634b00: at .panic+0x38
0xe000000033634b90: at .mphyp_pte_insert+0x304
0xe000000033634cb0: at .moea64_pvo_enter+0x164
0xe000000033634d40: at .moea64_enter+0x520
0xe000000033634e40: at .moea64_enter_object+0xa8
0xe000000033634ef0: at .pmap_enter_object+0xa8
0xe000000033634fa0: at .vm_map_pmap_enter+0x2d0
0xe000000033635070: at .vm_map_insert+0x550
0xe000000033635170: at .vm_map_fixed+0x134
0xe000000033635240: at .vm_mmap_object+0x484
0xe000000033635350: at .vn_mmap+0x190
0xe000000033635430: at .kern_mmap+0x474
0xe000000033635550: at .sys_mmap+0x30
0xe0000000336355d0: at .trap+0x654
0xe000000033635770: at .powerpc_interrupt+0x290
0xe000000033635810: user SC trap by 0x81004e768: srr1=0x800000000000d032
            r1=0x3fffffffffffb790 cr=0x22024024 xer=0 ctr=0x81004e940 r2=0x810075d80 frame=0xe000000033635840
KDB: enter: panic

This seems to indicate that the PTE to be evicted was not found. After some debugging, it appears that there is a race condition between mphyp_pte_unset() and mphyp_pte_insert() that may cause the page chosen for eviction to be removed by mphyp_pte_unset() before mphyp_pte_insert() gets to evict it.
This can be explained as follows:

- mphyp_pte_insert() locks the pvo to be inserted
- mphyp_pte_insert() obtains read access to mphyp_eviction_lock
- mphyp_pte_insert() tries to insert the corresponding pte but fails
- mphyp_pte_insert() releases mphyp_eviction_lock
- mphyp_pte_insert() acquires mphyp_eviction_lock for write
- mphyp_pte_insert() chooses a pte to evict; let's call it p
- mphyp_pte_unset(), on another thread, locks the pvo that corresponds to pte p
- mphyp_pte_unset() removes p (without holding mphyp_eviction_lock)
- mphyp_pte_insert() tries to remove p, but fails, because it was already removed by another thread
- the system panics

KDB's acttrace on this panic supports the hypothesis above:

Tracing command clang-8 pid 44763 tid 100504 td 0xc00000000f220000 (CPU 10)
0xe000000033634a40: at .vpanic+0x1d4
0xe000000033634b00: at .panic+0x38
0xe000000033634b90: at .mphyp_pte_insert+0x304
0xe000000033634cb0: at .moea64_pvo_enter+0x164
0xe000000033634d40: at .moea64_enter+0x520
0xe000000033634e40: at .moea64_enter_object+0xa8
0xe000000033634ef0: at .pmap_enter_object+0xa8
0xe000000033634fa0: at .vm_map_pmap_enter+0x2d0
0xe000000033635070: at .vm_map_insert+0x550
0xe000000033635170: at .vm_map_fixed+0x134
0xe000000033635240: at .vm_mmap_object+0x484
0xe000000033635350: at .vn_mmap+0x190
0xe000000033635430: at .kern_mmap+0x474
0xe000000033635550: at .sys_mmap+0x30
0xe0000000336355d0: at .trap+0x654
0xe000000033635770: at .powerpc_interrupt+0x290
0xe000000033635810: user SC trap by 0x81004e768: srr1=0x800000000000d032
            r1=0x3fffffffffffb790 cr=0x22024024 xer=0 ctr=0x81004e940 r2=0x810075d80 frame=0xe000000033635840

Tracing command clang-8 pid 44640 tid 100690 td 0xc00000004913b000 (CPU 15)
0xe00000003404fad0: at .intr_event_handle+0xf0
0xe00000003404fb90: at .powerpc_dispatch_intr+0xf0
0xe00000003404fc40: at .xicp_dispatch+0x274
0xe00000003404fd00: at .powerpc_interrupt+0xc8
0xe00000003404fda0: kernel EXI trap by .lock_delay+0x5c: srr1=0x8000000000009032
            r1=0xe000000034050050 cr=0x2000f084 xer=0 ctr=0x1b78 r2=0xc00000000155e890 frame=0xe00000003404fdd0
0xe000000034050050: at 0xc0000000fa530c1c
0xe0000000340500e0: at .__mtx_lock_sleep+0x238
0xe0000000340501d0: at .__mtx_lock_flags+0x160
0xe000000034050280: at .moea64_remove_pages+0x134
0xe000000034050340: at .pmap_remove_pages+0x78
0xe0000000340503d0: at .vmspace_exit+0xf8
0xe000000034050480: at .exit1+0x6d8
0xe000000034050550: at .sys_sys_exit+0x1c
0xe0000000340505d0: at .trap+0x654
0xe000000034050770: at .powerpc_interrupt+0x290
0xe000000034050810: user SC trap by 0x8133eafd8: srr1=0x800000000000f032
            r1=0x3fffffffffffd890 cr=0x22000242 xer=0x20000000 ctr=0x8133eafd0 r2=0x8134bdbf0 frame=0xe000000034050840

Process 44640 is exiting and removing its pages while process 44763 is mapping pages into memory.

It seems to me this could be fixed in two ways:

1- make mphyp_pte_unset() acquire the eviction lock before removing ptes
2- change the KASSERT in mphyp_pte_insert() to tolerate H_NOT_FOUND, as mphyp_pte_unset() already does

If correct, 2 is preferable, as it avoids adding extra locking points.