Date: Wed, 13 May 2020 10:45:24 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: Bryan Drewery <bdrewery@FreeBSD.org>, freebsd-current@FreeBSD.org
Subject: Re: zfs deadlock on r360452 relating to busy vm page
Message-ID: <889cb93b-85c7-3ec4-4ccf-5fb56ec38fa5@FreeBSD.org>
In-Reply-To: <0e9cceba-84d0-ec4f-8784-36703452201d@FreeBSD.org>
References: <2bdc8563-283b-32cc-8a1a-85ff52aca99e@FreeBSD.org>
 <e10d3c61-0b47-55f2-0fe1-9fafaafe7799@FreeBSD.org>
 <0e9cceba-84d0-ec4f-8784-36703452201d@FreeBSD.org>
On 13/05/2020 10:35, Andriy Gapon wrote:
> On 13/05/2020 01:47, Bryan Drewery wrote:
>> Trivial repro:
>>
>> dd if=/dev/zero of=blah & tail -F blah
>> ^C
>> load: 0.21 cmd: tail 72381 [prev->lr_read_cv] 2.17r 0.00u 0.01s 0% 2600k
>> #0 0xffffffff80bce615 at mi_switch+0x155
>> #1 0xffffffff80c1cfea at sleepq_switch+0x11a
>> #2 0xffffffff80b57f0a at _cv_wait+0x15a
>> #3 0xffffffff829ddab6 at rangelock_enter+0x306
>> #4 0xffffffff829ecd3f at zfs_freebsd_getpages+0x14f
>> #5 0xffffffff810e3ab9 at VOP_GETPAGES_APV+0x59
>> #6 0xffffffff80f349e7 at vnode_pager_getpages+0x37
>> #7 0xffffffff80f2a93f at vm_pager_get_pages+0x4f
>> #8 0xffffffff80f054b0 at vm_fault+0x780
>> #9 0xffffffff80f04bde at vm_fault_trap+0x6e
>> #10 0xffffffff8106544e at trap_pfault+0x1ee
>> #11 0xffffffff81064a9c at trap+0x44c
>> #12 0xffffffff8103a978 at calltrap+0x8
> In r329363 I re-worked zfs_getpages and introduced range locking to it.
> At the time I believed that it was safe and maybe it was; please see the commit
> message.
> There, indeed, have been many performance / concurrency improvements to the VM
> system and r358443 is one of them.

Thinking more about it, it could be r352176.
I think that vm_page_grab_valid (and later vm_page_grab_valid_unlocked) are not
equivalent to the code that they replaced.
The original code would check the valid field before any locking and it would
attempt any locking / busying only if the page was invalid.
The object was required to be locked, though.
The new code tries to busy the page in any case.
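To make that difference concrete, here is a rough userland sketch of the two
patterns.  The fields and function names are made up for the illustration;
this is not the actual vm_page code:

/*
 * Simplified userland model of the behavioral difference described above.
 * "struct page" stands in for vm_page_t; only the relevant fields are kept.
 */
#include <stdbool.h>
#include <stdio.h>

struct page {
	bool valid;	/* page contents are valid */
	bool busy;	/* some thread holds the page busy */
};

/*
 * Old pattern (object lock held by the caller): look at the valid field
 * first and attempt to busy the page only when it is invalid and has to be
 * paged in.
 */
static bool
old_style_update(struct page *m)
{
	if (m->valid)
		return (true);		/* no busying at all */
	if (m->busy)
		return (false);		/* would sleep waiting for the busy */
	m->busy = true;			/* busy only the invalid page */
	/* ... page it in ... */
	m->valid = true;
	m->busy = false;
	return (true);
}

/*
 * Grab-style pattern: busy the page first, regardless of whether it is
 * already valid, sleeping if another thread holds it busy.
 */
static bool
grab_style_update(struct page *m)
{
	if (m->busy)
		return (false);		/* sleeps even for a valid page */
	m->busy = true;
	if (!m->valid) {
		/* ... page it in ... */
		m->valid = true;
	}
	m->busy = false;
	return (true);
}

int
main(void)
{
	/* A valid page that another thread has busied (e.g. for a fault). */
	struct page m = { .valid = true, .busy = true };

	printf("old style proceeds: %d\n", old_style_update(&m));  /* 1 */
	printf("grab style proceeds: %d\n", grab_style_update(&m)); /* 0 */
	return (0);
}

The point is that with the grab-style path even an already valid page has to
be busied first, so the caller can end up sleeping on a page that some other
thread is holding busy.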
> I am not sure how to resolve the problem best.  Maybe someone who knows the
> latest VM code better than me can comment on my assumptions stated in the commit
> message.
>
> In illumos (and, I think, in OpenZFS/ZoL) they don't have the range locking in
> this corner of the code because of a similar deadlock a long time ago.
>
>> On 5/12/2020 3:13 PM, Bryan Drewery wrote:
>>>> panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffe25eefa2e00 (find), blocked for 1802392 ticks
> ...
>>>> (kgdb) backtrace
>>>> #0 sched_switch (td=0xfffffe255eac0000, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2147
>>>> #1 0xffffffff80bce615 in mi_switch (flags=260) at /usr/src/sys/kern/kern_synch.c:542
>>>> #2 0xffffffff80c1cfea in sleepq_switch (wchan=0xfffff810fb57dd48, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:625
>>>> #3 0xffffffff80b57f0a in _cv_wait (cvp=0xfffff810fb57dd48, lock=0xfffff80049a99040) at /usr/src/sys/kern/kern_condvar.c:146
>>>> #4 0xffffffff82434ab6 in rangelock_enter_reader (rl=0xfffff80049a99018, new=0xfffff8022cadb100) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c:429
>>>> #5 rangelock_enter (rl=0xfffff80049a99018, off=<optimized out>, len=<optimized out>, type=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_rlock.c:477
>>>> #6 0xffffffff82443d3f in zfs_getpages (vp=<optimized out>, ma=0xfffffe259f204b18, count=<optimized out>, rbehind=0xfffffe259f204ac4, rahead=0xfffffe259f204ad0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4500
>>>> #7 zfs_freebsd_getpages (ap=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4567
>>>> #8 0xffffffff810e3ab9 in VOP_GETPAGES_APV (vop=0xffffffff8250a1e0 <zfs_vnodeops>, a=0xfffffe259f2049f0) at vnode_if.c:2644
>>>> #9 0xffffffff80f349e7 in VOP_GETPAGES (vp=<unavailable>, m=<optimized out>, count=<unavailable>, rbehind=<unavailable>, rahead=<unavailable>) at ./vnode_if.h:1171
>>>> #10 vnode_pager_getpages (object=<optimized out>, m=<optimized out>, count=<unavailable>, rbehind=<unavailable>, rahead=<unavailable>) at /usr/src/sys/vm/vnode_pager.c:743
>>>> #11 0xffffffff80f2a93f in vm_pager_get_pages (object=0xfffff806cb637c60, m=0xfffffe259f204b18, count=1, rbehind=<unavailable>, rahead=<unavailable>) at /usr/src/sys/vm/vm_pager.c:305
>>>> #12 0xffffffff80f054b0 in vm_fault_getpages (fs=<optimized out>, nera=0, behindp=<optimized out>, aheadp=<optimized out>) at /usr/src/sys/vm/vm_fault.c:1163
>>>> #13 vm_fault (map=<optimized out>, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=<optimized out>, m_hold=<optimized out>) at /usr/src/sys/vm/vm_fault.c:1394
>>>> #14 0xffffffff80f04bde in vm_fault_trap (map=0xfffffe25653949e8, vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0, signo=0xfffffe259f204d04, ucode=0xfffffe259f204d00) at /usr/src/sys/vm/vm_fault.c:589
>>>> #15 0xffffffff8106544e in trap_pfault (frame=0xfffffe259f204d40, usermode=<optimized out>, signo=<optimized out>, ucode=<unavailable>) at /usr/src/sys/amd64/amd64/trap.c:821
>>>> #16 0xffffffff81064a9c in trap (frame=0xfffffe259f204d40) at /usr/src/sys/amd64/amd64/trap.c:340
>>>> #17 <signal handler called>
>>>> #18 0x00000000002034fc in ?? ()
> ...
>>>> (kgdb) thread
>>>> [Current thread is 8 (Thread 101255)]
>>>> (kgdb) backtrace
>>>> #0 sched_switch (td=0xfffffe25c8e9bc00, flags=<optimized out>) at /usr/src/sys/kern/sched_ule.c:2147
>>>> #1 0xffffffff80bce615 in mi_switch (flags=260) at /usr/src/sys/kern/kern_synch.c:542
>>>> #2 0xffffffff80c1cfea in sleepq_switch (wchan=0xfffffe001cbca850, pri=84) at /usr/src/sys/kern/subr_sleepqueue.c:625
>>>> #3 0xffffffff80f1de50 in _vm_page_busy_sleep (obj=<optimized out>, m=0xfffffe001cbca850, pindex=<optimized out>, wmesg=<optimized out>, allocflags=21504, locked=false) at /usr/src/sys/vm/vm_page.c:1094
>>>> #4 0xffffffff80f241f7 in vm_page_grab_sleep (object=0xfffff806cb637c60, m=<optimized out>, pindex=<optimized out>, wmesg=<optimized out>, allocflags=21504, locked=<error reading variable: Cannot access memory at address 0x0>) at /usr/src/sys/vm/vm_page.c:4326
>>>> #5 vm_page_acquire_unlocked (object=0xfffff806cb637c60, pindex=1098, prev=<optimized out>, mp=0xfffffe2717fc6730, allocflags=21504) at /usr/src/sys/vm/vm_page.c:4469
>>>> #6 0xffffffff80f24c61 in vm_page_grab_valid_unlocked (mp=0xfffffe2717fc6730, object=0xfffff806cb637c60, pindex=1098, allocflags=21504) at /usr/src/sys/vm/vm_page.c:4645
>>>> #7 0xffffffff82440246 in page_busy (vp=0xfffff80571f29500, start=4497408, off=<optimized out>, nbytes=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:414
>>>> #8 update_pages (vp=0xfffff80571f29500, start=4497408, len=32, os=0xfffff8096a277400, oid=2209520, segflg=<optimized out>, tx=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:482
>>>> #9 zfs_write (vp=<optimized out>, uio=<optimized out>, ioflag=0, cr=<optimized out>, ct=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1071
>>>> #10 zfs_freebsd_write (ap=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4838
>>>> #11 0xffffffff810e0eaf in VOP_WRITE_APV (vop=0xffffffff8250a1e0 <zfs_vnodeops>, a=0xfffffe2717fc68c8) at vnode_if.c:925
>>>> #12 0xffffffff80cb574c in VOP_WRITE (vp=0xfffff80571f29500, uio=0xfffffe2717fc6bb0, ioflag=8323073, cred=<optimized out>) at ./vnode_if.h:413
>>>> #13 vn_write (fp=0xfffff8048195e8c0, uio=<optimized out>, active_cred=<optimized out>, flags=<optimized out>, td=<optimized out>) at /usr/src/sys/kern/vfs_vnops.c:894
>>>> #14 0xffffffff80cb50c3 in vn_io_fault_doio (args=0xfffffe2717fc6af0, uio=0xfffffe2717fc6bb0, td=0xfffffe25c8e9bc00) at /usr/src/sys/kern/vfs_vnops.c:959
>>>> #15 0xffffffff80cb1c8c in vn_io_fault1 (vp=<optimized out>, uio=0xfffffe2717fc6bb0, args=0xfffffe2717fc6af0, td=0xfffffe25c8e9bc00) at /usr/src/sys/kern/vfs_vnops.c:1077
>>>> #16 0xffffffff80cafa32 in vn_io_fault (fp=0xfffff8048195e8c0, uio=0xfffffe2717fc6bb0, active_cred=0xfffff80f2cc12708, flags=0, td=<optimized out>) at /usr/src/sys/kern/vfs_vnops.c:1181
>>>> #17 0xffffffff80c34331 in fo_write (fp=0xfffff8048195e8c0, uio=0xfffffe2717fc6bb0, active_cred=<unavailable>, flags=<unavailable>, td=0xfffffe25c8e9bc00) at /usr/src/sys/sys/file.h:326
>>>> #18 dofilewrite (td=0xfffffe25c8e9bc00, fd=2, fp=0xfffff8048195e8c0, auio=0xfffffe2717fc6bb0, offset=<optimized out>, flags=<optimized out>) at /usr/src/sys/kern/sys_generic.c:564
>>>> #19 0xffffffff80c33eb0 in kern_writev (td=0xfffffe25c8e9bc00, fd=2, auio=<optimized out>) at /usr/src/sys/kern/sys_generic.c:491
>>>> #20 sys_write (td=0xfffffe25c8e9bc00, uap=<optimized out>) at /usr/src/sys/kern/sys_generic.c:406
>>>> #21 0xffffffff8106623d in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:150
>>>> #22 amd64_syscall (td=0xfffffe25c8e9bc00, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1161
>>>> #23 <signal handler called>
>>>> #24 0x000000080043d53a in ?? ()
>>>
>>> Maybe r358443 is related?
>>>
>>>
>>>> (kgdb) frame 4
>>>> #4 0xffffffff80f241f7 in vm_page_grab_sleep (object=0xfffff806cb637c60, m=<optimized out>, pindex=<optimized out>, wmesg=<optimized out>, allocflags=21504, locked=<error reading variable: Cannot access memory at address 0x0>) at /usr/src/sys/vm/vm_page.c:4326
>>>> 4326            if (_vm_page_busy_sleep(object, m, m->pindex, wmesg, allocflags,
>>>> (kgdb) p *object
>>>> $8 = {lock = {lock_object = {lo_name = 0xffffffff8114fa30 "vm object", lo_flags = 627245056, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list = {tqe_next = 0xfffff806cb637d68, tqe_prev = 0xfffff806cb637b78}, shadow_head = {lh_first = 0x0}, shadow_list = {le_next = 0xffffffffffffffff,
>>>>     le_prev = 0xffffffffffffffff}, memq = {tqh_first = 0xfffffe001cbca850, tqh_last = 0xfffffe001cbca860}, rtree = {rt_root = 18446741875168421969}, size = 1099, domain = {dr_policy = 0x0, dr_iter = 0}, generation = 1, cleangeneration = 1, ref_count = 2, shadow_count = 0, memattr = 6 '\006', type = 2 '\002',
>>>>     flags = 4096, pg_color = 0, paging_in_progress = {__count = 2}, busy = {__count = 0}, resident_page_count = 1, backing_object = 0x0, backing_object_offset = 0, pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, handle = 0xfffff80571f29500, un_pager = {vnp = {vnp_size = 4499568,
>>>>     writemappings = 0}, devp = {devp_pglist = {tqh_first = 0x44a870, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp = {sgp_pglist = {tqh_first = 0x44a870, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x44a870, swp_blks = {pt_root = 0}, writemappings = 0}}, cred = 0x0, charge = 0, umtx_data = 0x0}
>>>> (kgdb) frame 5
>>>> #5 vm_page_acquire_unlocked (object=0xfffff806cb637c60, pindex=1098, prev=<optimized out>, mp=0xfffffe2717fc6730, allocflags=21504) at /usr/src/sys/vm/vm_page.c:4469
>>>> 4469                    if (!vm_page_grab_sleep(object, m, pindex, "pgnslp",
>>>> (kgdb) p *m
>>>> $9 = {plinks = {q = {tqe_next = 0xffffffffffffffff, tqe_prev = 0xffffffffffffffff}, s = {ss = {sle_next = 0xffffffffffffffff}}, memguard = {p = 18446744073709551615, v = 18446744073709551615}, uma = {slab = 0xffffffffffffffff, zone = 0xffffffffffffffff}}, listq = {tqe_next = 0x0, tqe_prev = 0xfffff806cb637ca8},
>>>>     object = 0xfffff806cb637c60, pindex = 1098, phys_addr = 18988408832, md = {pv_list = {tqh_first = 0x0, tqh_last = 0xfffffe001cbca888}, pv_gen = 44682, pat_mode = 6}, ref_count = 2147483648, busy_lock = 1588330502, a = {{flags = 0, queue = 255 '\377', act_count = 0 '\000'}, _bits = 16711680}, order = 13 '\r',
>>>>     pool = 0 '\000', flags = 1 '\001', oflags = 0 '\000', psind = 0 '\000', segind = 6 '\006', valid = 0 '\000', dirty = 0 '\000'}
>>>
>>> Pretty sure this thread is holding the rangelock from zfs_write() that
>>> tail is waiting on. So what exactly is this thread (101255) waiting on?
>>> I'm not sure how to track down what is using vm object
>>> 0xfffff806cb637c60. If the tail thread busied the page then they are
>>> waiting on each other, I guess. If that's true, then r358443 removing the
>>> write lock on the object in update_pages() could be a problem.
>>>
>>>
>>> Not sure the rest is interesting. I think they are just waiting on the
>>> locked vnode, but I give it here in case I missed something.
>
>
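If that reading is right, then the inversion reduces to something like the toy
pthread model below.  The rangelock is modelled as an rwlock and the page busy
state as a flag plus a condvar; the names are made up and this is obviously
not the kernel code.  It hangs the same way the two threads above do:

/*
 * Toy model of the suspected deadlock.  Build with: cc model.c -lpthread
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t rangelock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t busy_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t busy_cv = PTHREAD_COND_INITIALIZER;
static bool page_busy = false;

static void
page_busy_acquire(void)
{
	pthread_mutex_lock(&busy_mtx);
	while (page_busy)
		pthread_cond_wait(&busy_cv, &busy_mtx);	/* "busy sleep" */
	page_busy = true;
	pthread_mutex_unlock(&busy_mtx);
}

static void
page_busy_release(void)
{
	pthread_mutex_lock(&busy_mtx);
	page_busy = false;
	pthread_cond_broadcast(&busy_cv);
	pthread_mutex_unlock(&busy_mtx);
}

/* tail's page fault: the fault code busies the page, then the getpages
 * path takes the rangelock as a reader. */
static void *
fault_thread(void *arg)
{
	(void)arg;
	page_busy_acquire();			/* fault busies the page */
	sleep(1);				/* force the bad interleaving */
	printf("fault: waiting for rangelock (reader)\n");
	pthread_rwlock_rdlock(&rangelock);	/* blocks: writer holds it */
	pthread_rwlock_unlock(&rangelock);	/* never reached */
	page_busy_release();
	return (NULL);
}

/* dd's write: takes the rangelock as a writer, then tries to busy the
 * same page to update the cached copy. */
static void *
write_thread(void *arg)
{
	(void)arg;
	pthread_rwlock_wrlock(&rangelock);	/* writer range lock */
	sleep(1);				/* let the fault busy the page */
	printf("write: waiting for the busy page\n");
	page_busy_acquire();			/* blocks: fault holds the busy */
	page_busy_release();			/* never reached */
	pthread_rwlock_unlock(&rangelock);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, fault_thread, NULL);
	pthread_create(&t2, NULL, write_thread, NULL);
	pthread_join(t1, NULL);		/* never returns */
	pthread_join(t2, NULL);
	return (0);
}

That is, the fault path holds the page busy while it sleeps on the rangelock,
and the write path holds the rangelock while it sleeps on the busy page, which
is the same cycle as in the two backtraces.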
-- 
Andriy Gapon