FreeBSD Mail Archives

Date:      Wed, 26 Jun 2019 05:55:34 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 238817] g_raid3_access race on destruction
Message-ID:  <bug-238817-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238817

            Bug ID: 238817
           Summary: g_raid3_access race on destruction
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: rlibby@freebsd.org
                CC: markj@FreeBSD.org

It seems like the g_raid3_softc can be destroyed in between
g_topology_unlock() and sx_xlock(&sc->sc_lock) in g_raid3_access().

There may need to be some kind of flag on the softc, protected by the
topology lock, to indicate an in-progress topology-to-softc relock and
prevent destruction in the meantime (perhaps by checking for it in
g_raid3_can_destroy()).  I am not sure whether this pattern also affects
other sites that do the same relocking, like g_raid3_destroy_geom().  I
am also not sure whether it affects other geom classes, like gmirror, or
whether they avoid the problem because their softcs are destroyed under
different conditions.
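
To make the suggested flag concrete, here is a minimal userspace sketch
of the idea.  It is a model only, not kernel code and not a proposed
patch: pthread mutexes stand in for the topology lock and sc->sc_lock,
and every name in it (struct model_softc, sc_relocking, sc_idle, and the
model_* functions) is hypothetical.  The point it illustrates is that
the counter is raised while the topology lock is still held, so a
destroyer that checks it under the topology lock cannot free the softc
during the unlock/lock window.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the GEOM topology lock. */
static pthread_mutex_t topology_lock = PTHREAD_MUTEX_INITIALIZER;

/* Simplified model of a softc; only the fields relevant to the race. */
struct model_softc {
	pthread_mutex_t sc_lock;  /* stands in for sc->sc_lock */
	int sc_relocking;         /* hypothetical; protected by topology_lock */
	bool sc_idle;             /* stands in for the existing destroy checks */
};

/* Called with topology_lock held; refuses while a relock is in flight. */
static bool
model_can_destroy(const struct model_softc *sc)
{
	return (sc->sc_idle && sc->sc_relocking == 0);
}

/*
 * Called with topology_lock held.  Marks the softc busy before dropping
 * the topology lock, then takes sc_lock.  Returns with only sc_lock held.
 */
static void
model_relock_begin(struct model_softc *sc)
{
	sc->sc_relocking++;
	pthread_mutex_unlock(&topology_lock);
	/* This gap is the window described in the report. */
	pthread_mutex_lock(&sc->sc_lock);
}

/* Called with sc_lock held.  Returns with topology_lock held. */
static void
model_relock_end(struct model_softc *sc)
{
	pthread_mutex_unlock(&sc->sc_lock);
	pthread_mutex_lock(&topology_lock);
	assert(sc->sc_relocking > 0);
	sc->sc_relocking--;
}

/*
 * Single-threaded walk through the interleaving: a destroyer checks
 * model_can_destroy() while the access path is between the two locks.
 * Returns 0 if destruction was refused during the window and allowed after.
 */
int
model_relock_demo(void)
{
	struct model_softc sc = { .sc_relocking = 0, .sc_idle = true };
	bool during, after;

	pthread_mutex_init(&sc.sc_lock, NULL);

	pthread_mutex_lock(&topology_lock);  /* access path enters locked */
	model_relock_begin(&sc);             /* topology dropped, sc_lock held */

	/* The would-be destroyer runs here, under the topology lock. */
	pthread_mutex_lock(&topology_lock);
	during = model_can_destroy(&sc);
	pthread_mutex_unlock(&topology_lock);

	model_relock_end(&sc);               /* topology lock held again */
	after = model_can_destroy(&sc);
	pthread_mutex_unlock(&topology_lock);

	pthread_mutex_destroy(&sc.sc_lock);
	return ((!during && after) ? 0 : 1);
}
```

In this model a counter is used rather than a boolean so that
overlapping relocks would not clear each other's mark.  Whether the real
fix should look like this, and whether refusing destruction is
preferable to deferring it, is left open here as it is in the report.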

With the same setup as in bug 238814:

sysctl kern.geom.raid3.debug=4
sysctl debug.fail_point.mnowait=1%return
while true; do kyua test -k /usr/tests/sys/geom/class/raid3/Kyuafile; done
[...]
GEOM_RAID3[0]: Request failed (error=28). md1[WRITE(offset=1024, length=1024)]
GEOM_RAID3[4]: g_raid3_event_send: Sending event 0xfffff800163e2a80.
GEOM_RAID3[4]: g_raid3_event_send: Waking up 0xfffff800048bd400.
GEOM_RAID3[0]: Request failed (error=28). md2[WRITE(offset=1024, length=1024)]
GEOM_RAID3[4]: g_raid3_event_send: Sending event 0xfffff800163e2ac0.
GEOM_RAID3[4]: g_raid3_event_send: Waking up 0xfffff800048bd400.
GEOM_RAID3[0]: Request failed. raid3/graid3.rlWY7w[WRITE(offset=2048,
length=2048)]
GEOM_RAID3[3]: Running event for disk md1.
GEOM_RAID3[3]: Changing disk md1 state from ACTIVE to DISCONNECTED.
GEOM_RAID3[1]: Disk md1 state changed from ACTIVE to DISCONNECTED (device
graid3.rlWY7w).
GEOM_RAID3[0]: Device graid3.rlWY7w: provider md1 disconnected.
GEOM_RAID3[2]: Access request for raid3/graid3.rlWY7w: r-1w-1e0.
GEOM_RAID3[2]: Consumer md1 destroyed.
GEOM_RAID3[2]: Access md1 r-1w-1e-1 = 0
GEOM_RAID3[1]: Device graid3.rlWY7w: genid bumped to 1.
GEOM_RAID3[2]: Metadata on md0 updated.
GEOM_RAID3[2]: Metadata on md2 updated.
GEOM_RAID3[1]: Device graid3.rlWY7w state changed from COMPLETE to DEGRADED.
GEOM_RAID3[3]: Running event for disk md2.
GEOM_RAID3[3]: Changing disk md2 state from ACTIVE to DISCONNECTED.
GEOM_RAID3[1]: Disk md2 state changed from ACTIVE to DISCONNECTED (device
graid3.rlWY7w).
GEOM_RAID3[0]: Device graid3.rlWY7w: provider md2 disconnected.
GEOM_RAID3[1]: Consumer md1 destroyed.
GEOM_RAID3[2]: Consumer md2 destroyed.
GEOM_RAID3[2]: Access md2 r-1w-1e-1 = 0
GEOM_RAID3[0]: Device graid3.rlWY7w: provider raid3/graid3.rlWY7w destroyed.
GEOM_RAID3[2]: No I/O requests for graid3.rlWY7w, it can be destroyed.
GEOM_RAID3[2]: Metadata on md0 updated.
GEOM_RAID3[2]: Consumer md0 destroyed.
GEOM_RAID3[2]: Access md0 r-1w-1e-1 = 0
GEOM_RAID3[0]: Device graid3.rlWY7w destroyed.
GEOM_RAID3[1]: Thread exiting.


Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x20:0xffffffff80ba77b4
stack pointer           = 0x28:0xfffffe00512813b0
frame pointer           = 0x28:0xfffffe0051281450
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1137 (dd)
trap number             = 9
panic: general protection fault
cpuid = 2
time = 1561520628
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00512810c0
vpanic() at vpanic+0x19d/frame 0xfffffe0051281110
panic() at panic+0x43/frame 0xfffffe0051281170
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe00512811d0
trap() at trap+0x6c/frame 0xfffffe00512812e0
calltrap() at calltrap+0x8/frame 0xfffffe00512812e0
--- trap 0x9, rip = 0xffffffff80ba77b4, rsp = 0xfffffe00512813b0, rbp =
0xfffffe0051281450 ---
_sx_xlock_hard() at _sx_xlock_hard+0x274/frame 0xfffffe0051281450
_sx_xlock() at _sx_xlock+0xc1/frame 0xfffffe0051281490
g_raid3_access() at g_raid3_access+0x11c/frame 0xfffffe00512814e0
g_access() at g_access+0x28e/frame 0xfffffe0051281550
g_dev_close() at g_dev_close+0x158/frame 0xfffffe00512815a0
devfs_close() at devfs_close+0x2e4/frame 0xfffffe0051281610
VOP_CLOSE_APV() at VOP_CLOSE_APV+0x60/frame 0xfffffe0051281630
vn_close1() at vn_close1+0xe3/frame 0xfffffe00512816a0
vn_closefile() at vn_closefile+0x4c/frame 0xfffffe0051281720
devfs_close_f() at devfs_close_f+0x2c/frame 0xfffffe0051281750
_fdrop() at _fdrop+0x1a/frame 0xfffffe0051281770
closef() at closef+0x1ec/frame 0xfffffe0051281800
fdescfree_fds() at fdescfree_fds+0x8c/frame 0xfffffe0051281850
fdescfree() at fdescfree+0x37a/frame 0xfffffe0051281910
exit1() at exit1+0x4fe/frame 0xfffffe0051281980
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe0051281990
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0051281ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0051281ab0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8003c892a, rsp =
0x7fffffffd908, rbp = 0x7fffffffd920 ---
KDB: enter: panic
[ thread pid 1137 tid 100201 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db> x/x ticks
ticks:  7fff7475
db> x/x g_udnf_last_ticks
g_udnf_last_ticks:      7fff7472
db> x/s g_udnf_last_name
g_udnf_last_name:       md2
db> x/d g_udnf_last_tid
g_udnf_last_tid:        100193
db> x/aS g_udnf_last_stack+0x8,0x12
g_udnf_last_stack+0x8:  uma_dbg_nowait_fail_record+0x31
g_udnf_last_stack+0x10: zalloc_inject_failure+0x4c
g_udnf_last_stack+0x18: uma_zalloc_arg+0xa98
g_udnf_last_stack+0x20: mdstart_malloc+0x81d
g_udnf_last_stack+0x28: md_kthread+0x20c
g_udnf_last_stack+0x30: fork_exit+0x84
g_udnf_last_stack+0x38: fork_trampoline+0xe
g_udnf_last_stack+0x40: 0
g_udnf_last_stack+0x48: 0
g_udnf_last_stack+0x50: 0
g_udnf_last_stack+0x58: 0
g_udnf_last_stack+0x60: 0
g_udnf_last_stack+0x68: 0
g_udnf_last_stack+0x70: 0
g_udnf_last_stack+0x78: 0
g_udnf_last_stack+0x80: 0
g_udnf_last_stack+0x88: 0
g_udnf_last_stack+0x90: 0
db> x/s version
version:        FreeBSD 13.0-CURRENT #42
r349025+3bdd0fc24f5b(mnowait-dbg)-dirty: Tue Jun 25 20:34:27 PDT 2019\012
root@vali.kishkinda.net:/usr/obj/usr/src/freebsd/amd64.amd64/sys/GENERIC\012

-- 
You are receiving this mail because:
You are the assignee for the bug.
