From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 238817] g_raid3_access race on destruction
Date: Wed, 26 Jun 2019 05:55:34 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238817

            Bug ID: 238817
           Summary: g_raid3_access race on destruction
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: rlibby@freebsd.org
                CC: markj@FreeBSD.org

It seems like the g_raid3_softc can be destroyed in between
g_topology_unlock() and sx_xlock(&sc->sc_lock) in g_raid3_access().
There may need to be a flag on the softc, protected by the topology lock,
that marks an in-progress relock from the topology lock to the softc lock
and prevents destruction in the meantime (perhaps by checking for it in
g_raid3_can_destroy()).

I am not sure whether this pattern also affects other sites that do the
same relocking, such as g_raid3_destroy_geom(). I am also not sure whether
it affects other GEOM classes, such as gmirror, or whether they avoid the
problem because their softcs are destroyed under different conditions.

With the same setup as in bug 238814:

sysctl kern.geom.raid3.debug=4
sysctl debug.fail_point.mnowait=1%return
while true; do kyua test -k /usr/tests/sys/geom/class/raid3/Kyuafile; done

[...]
GEOM_RAID3[0]: Request failed (error=28). md1[WRITE(offset=1024, length=1024)]
GEOM_RAID3[4]: g_raid3_event_send: Sending event 0xfffff800163e2a80.
GEOM_RAID3[4]: g_raid3_event_send: Waking up 0xfffff800048bd400.
GEOM_RAID3[0]: Request failed (error=28). md2[WRITE(offset=1024, length=1024)]
GEOM_RAID3[4]: g_raid3_event_send: Sending event 0xfffff800163e2ac0.
GEOM_RAID3[4]: g_raid3_event_send: Waking up 0xfffff800048bd400.
GEOM_RAID3[0]: Request failed. raid3/graid3.rlWY7w[WRITE(offset=2048, length=2048)]
GEOM_RAID3[3]: Running event for disk md1.
GEOM_RAID3[3]: Changing disk md1 state from ACTIVE to DISCONNECTED.
GEOM_RAID3[1]: Disk md1 state changed from ACTIVE to DISCONNECTED (device graid3.rlWY7w).
GEOM_RAID3[0]: Device graid3.rlWY7w: provider md1 disconnected.
GEOM_RAID3[2]: Access request for raid3/graid3.rlWY7w: r-1w-1e0.
GEOM_RAID3[2]: Consumer md1 destroyed.
GEOM_RAID3[2]: Access md1 r-1w-1e-1 = 0
GEOM_RAID3[1]: Device graid3.rlWY7w: genid bumped to 1.
GEOM_RAID3[2]: Metadata on md0 updated.
GEOM_RAID3[2]: Metadata on md2 updated.
GEOM_RAID3[1]: Device graid3.rlWY7w state changed from COMPLETE to DEGRADED.
GEOM_RAID3[3]: Running event for disk md2.
GEOM_RAID3[3]: Changing disk md2 state from ACTIVE to DISCONNECTED.
GEOM_RAID3[1]: Disk md2 state changed from ACTIVE to DISCONNECTED (device graid3.rlWY7w).
GEOM_RAID3[0]: Device graid3.rlWY7w: provider md2 disconnected.
GEOM_RAID3[1]: Consumer md1 destroyed.
GEOM_RAID3[2]: Consumer md2 destroyed.
GEOM_RAID3[2]: Access md2 r-1w-1e-1 = 0
GEOM_RAID3[0]: Device graid3.rlWY7w: provider raid3/graid3.rlWY7w destroyed.
GEOM_RAID3[2]: No I/O requests for graid3.rlWY7w, it can be destroyed.
GEOM_RAID3[2]: Metadata on md0 updated.
GEOM_RAID3[2]: Consumer md0 destroyed.
GEOM_RAID3[2]: Access md0 r-1w-1e-1 = 0
GEOM_RAID3[0]: Device graid3.rlWY7w destroyed.
GEOM_RAID3[1]: Thread exiting.


Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0xffffffff80ba77b4
stack pointer           = 0x28:0xfffffe00512813b0
frame pointer           = 0x28:0xfffffe0051281450
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1137 (dd)
trap number             = 9
panic: general protection fault
cpuid = 2
time = 1561520628
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00512810c0
vpanic() at vpanic+0x19d/frame 0xfffffe0051281110
panic() at panic+0x43/frame 0xfffffe0051281170
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00512811d0
trap() at trap+0x6c/frame 0xfffffe00512812e0
calltrap() at calltrap+0x8/frame 0xfffffe00512812e0
--- trap 0x9, rip = 0xffffffff80ba77b4, rsp = 0xfffffe00512813b0, rbp = 0xfffffe0051281450 ---
_sx_xlock_hard() at _sx_xlock_hard+0x274/frame 0xfffffe0051281450
_sx_xlock() at _sx_xlock+0xc1/frame 0xfffffe0051281490
g_raid3_access() at g_raid3_access+0x11c/frame 0xfffffe00512814e0
g_access() at g_access+0x28e/frame 0xfffffe0051281550
g_dev_close() at g_dev_close+0x158/frame 0xfffffe00512815a0
devfs_close() at devfs_close+0x2e4/frame 0xfffffe0051281610
VOP_CLOSE_APV() at VOP_CLOSE_APV+0x60/frame 0xfffffe0051281630
vn_close1() at vn_close1+0xe3/frame 0xfffffe00512816a0
vn_closefile() at vn_closefile+0x4c/frame 0xfffffe0051281720
devfs_close_f() at devfs_close_f+0x2c/frame 0xfffffe0051281750
_fdrop() at _fdrop+0x1a/frame 0xfffffe0051281770
closef() at closef+0x1ec/frame 0xfffffe0051281800
fdescfree_fds() at fdescfree_fds+0x8c/frame 0xfffffe0051281850
fdescfree() at fdescfree+0x37a/frame 0xfffffe0051281910
exit1() at exit1+0x4fe/frame 0xfffffe0051281980
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe0051281990
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0051281ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0051281ab0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8003c892a, rsp = 0x7fffffffd908, rbp = 0x7fffffffd920 ---
KDB: enter: panic
[ thread pid 1137 tid 100201 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/x ticks
ticks:  7fff7475
db> x/x g_udnf_last_ticks
g_udnf_last_ticks:      7fff7472
db> x/s g_udnf_last_name
g_udnf_last_name:       md2
db> x/d g_udnf_last_tid
g_udnf_last_tid:        100193
db> x/aS g_udnf_last_stack+0x8,0x12
g_udnf_last_stack+0x8:  uma_dbg_nowait_fail_record+0x31
g_udnf_last_stack+0x10: zalloc_inject_failure+0x4c
g_udnf_last_stack+0x18: uma_zalloc_arg+0xa98
g_udnf_last_stack+0x20: mdstart_malloc+0x81d
g_udnf_last_stack+0x28: md_kthread+0x20c
g_udnf_last_stack+0x30: fork_exit+0x84
g_udnf_last_stack+0x38: fork_trampoline+0xe
g_udnf_last_stack+0x40: 0
g_udnf_last_stack+0x48: 0
g_udnf_last_stack+0x50: 0
g_udnf_last_stack+0x58: 0
g_udnf_last_stack+0x60: 0
g_udnf_last_stack+0x68: 0
g_udnf_last_stack+0x70: 0
g_udnf_last_stack+0x78: 0
g_udnf_last_stack+0x80: 0
g_udnf_last_stack+0x88: 0
g_udnf_last_stack+0x90: 0
db> x/s version
version:        FreeBSD 13.0-CURRENT #42 r349025+3bdd0fc24f5b(mnowait-dbg)-dirty: Tue Jun 25 20:34:27 PDT 2019\012    root@vali.kishkinda.net:/usr/obj/usr/src/freebsd/amd64.amd64/sys/GENERIC\012

-- 
You are receiving this mail because:
You are the assignee for the bug.