From owner-freebsd-stable@freebsd.org Mon Mar 28 15:58:31 2016 Return-Path: Delivered-To: freebsd-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 33224AE031E for ; Mon, 28 Mar 2016 15:58:31 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 100D51B47 for ; Mon, 28 Mar 2016 15:58:31 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: by mailman.ysv.freebsd.org (Postfix) id 0B782AE031B; Mon, 28 Mar 2016 15:58:31 +0000 (UTC) Delivered-To: stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B0E6AE0319 for ; Mon, 28 Mar 2016 15:58:31 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: from mail-ob0-x229.google.com (mail-ob0-x229.google.com [IPv6:2607:f8b0:4003:c01::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C22611B43 for ; Mon, 28 Mar 2016 15:58:30 +0000 (UTC) (envelope-from sobomax@sippysoft.com) Received: by mail-ob0-x229.google.com with SMTP id fp4so102848237obb.2 for ; Mon, 28 Mar 2016 08:58:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sippysoft-com.20150623.gappssmtp.com; s=20150623; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc; bh=lMdhA966Ck33pg7Pw25CvG+6Xr1DUjkVogO2iUcdE/4=; b=NELwqPa9ggOYVbTvWs1yi7fx7TikFBqo8ASy9+/ShrJ2JAooVhFddKhveVmC2JtK7m jC211IH7vDF1mshR+rDUISCqA0c08zCAFYL4VbPrbeIEytb8JlRGhLgSmQM1m91HGAGJ hoN20CyGZ/1NH5yGG6V9nyfsgt9jCbLBH7HupoM4gybhGFlo1DGXin4bXCUhdjAyMUmW vDuLrbMRyyaBqT7Y6rthotKCycmc7dqfIU6H+p7NACaCNSPapkWBupgweZbDikkoi94Z o835x8qW+ceWcPR//vuOfWI+jrpQMEVyfe0zAJno+f9tWaj+CJxM5xrIuJy4zVQqi6Vd HpXw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:date :message-id:subject:from:to:cc; bh=lMdhA966Ck33pg7Pw25CvG+6Xr1DUjkVogO2iUcdE/4=; b=i8DwmxRhoaNbWfK5AItf4u63hmYp0qsyopMoi5oIUOn4XNoYPrEViB2QMl253C1ORS a30ajXQTUDMbLdEOQSzWgX/8XU+af3oy+JCgKUX/pUax8zxWF0TaYRPtKZZtjGtL2j0L v9LZeEKNmoUh34Ij7tIxXB2kBcNVRrqRuDoMvuBhuC9rlzBMGWSm8MbLqaF2GXbBhVrS qLtTP0Yki7Gud6aykxpC4i5nqS6FPOjcRJioWijpp0dJrOkSA/NK2a1yZFAc4HkswEyX U7rz6Z+fdT97CFYq56H5UOHWtscxEykW3F3Dd6OgZZEPg9RhxZZjpZaLpiZdzLpczzxc /w9Q== X-Gm-Message-State: AD7BkJJ0ZKthDMZrIzT7t07hgeoOTu+rmpZuCB1oEijRPTRIamhNvusHJ6KqeQNXMUDTwLRyXSgLm7dR1vsuamZf MIME-Version: 1.0 X-Received: by 10.60.60.3 with SMTP id d3mr8771616oer.66.1459180709858; Mon, 28 Mar 2016 08:58:29 -0700 (PDT) Sender: sobomax@sippysoft.com Received: by 10.157.11.143 with HTTP; Mon, 28 Mar 2016 08:58:29 -0700 (PDT) In-Reply-To: References: Date: Mon, 28 Mar 2016 08:58:29 -0700 X-Google-Sender-Auth: Vq_9D_7iSv26-dxjhMqJjv_-5tI Message-ID: Subject: Re: Process stuck in "vnread" From: Maxim Sobolev To: stable@freebsd.org, freebsd-fs@freebsd.org, Pawel Jakub Dawidek Cc: Kirk McKusick , kib@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.21 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Mar 2016 15:58:31 -0000 P.S. That being said, I am not sure if that write operation on md(4) happen before or after the offending lockup in the zfs_freebsd_write(), so it might be just as well be the result of that, not cause. On Mon, Mar 28, 2016 at 8:52 AM, Maxim Sobolev wrote: > Done some head scratching, it looks like it's got page fault in the > copyin() (cp(1) AFAIK mmaps source file). There might be some interlock > issue between competing write to the same ZFS, the md0 device is locked > forever waiting for the write operation to complete at the very same time. > I am curious as to whether we are allowed to sleep in the dmu_write_uio_dbuf(), > AFAIK dmu is ZFS's transaction layer, so maybe copyin() should be done > earlier to avoid possible page fault in there? > > (kgdb) thread 465 > [Switching to thread 465 (Thread 100530)]#0 sched_switch > (td=0xfffff8037b8ff960, newtd=, flags= out>) at /usr/home/sobomax/projects/freebsd103/sys/kern/sched_ule.c:1945 > 1945 cpuid = PCPU_GET(cpuid); > (kgdb) bt > #0 sched_switch (td=0xfffff8037b8ff960, newtd=, > flags=) at > /usr/home/sobomax/projects/freebsd103/sys/kern/sched_ule.c:1945 > #1 0xffffffff809364d1 in mi_switch (flags=260, newtd=0x0) at > /usr/home/sobomax/projects/freebsd103/sys/kern/kern_synch.c:491 > #2 0xffffffff809761ca in sleepq_wait (wchan=0x0, pri=0) at > /usr/home/sobomax/projects/freebsd103/sys/kern/subr_sleepqueue.c:618 > #3 0xffffffff808d843d in _cv_wait (cvp=0xfffff8001041b220, > lock=0xfffff8001041b1b8) at > /usr/home/sobomax/projects/freebsd103/sys/kern/kern_condvar.c:151 > #4 0xffffffff81a431e5 in txg_wait_synced () from /boot/kernel/zfs.ko > #5 0xffffffff81a62fa9 in zil_commit () from /boot/kernel/zfs.ko > #6 0xffffffff81a9216b in zfs_freebsd_write () from /boot/kernel/zfs.ko > #7 0xffffffff80dd6219 in VOP_WRITE_APV (vop=, > a=) at vnode_if.c:999 > #8 0xffffffff8060b69a in mdstart_vnode (sc=0xfffff80057f78000, > bp=0xfffff8036d4660f8) at vnode_if.h:413 > #9 0xffffffff8060caad in md_kthread (arg=0xfffff80057f78000) at > /usr/home/sobomax/projects/freebsd103/sys/dev/md/md.c:1147 > #10 0xffffffff808f6e6a in fork_exit (callout=0xffffffff8060c900 > , arg=0xfffff80057f78000, frame=0xfffffe04689b2ac0) at > /usr/home/sobomax/projects/freebsd103/sys/kern/kern_fork.c:1027 > #11 0xffffffff80ca0a5e in fork_trampoline () at > /usr/home/sobomax/projects/freebsd103/sys/amd64/amd64/exception.S:611 > #12 0x0000000000000000 in ?? () > > > On Mon, Mar 28, 2016 at 7:31 AM, Maxim Sobolev > wrote: > >> OK, that happened again. Now it's 10.3-RC3, funny enough, it's the same >> "cp" command. Any ideas about what can be wrong here? The box is still up, >> so if you need me to do something specific inside kgdb please let me know. >> Thanks! >> >> $ uname -a >> FreeBSD abc01.sippysoft.com 10.3-RC3 FreeBSD 10.3-RC3 #0 >> bedc749(master): Wed Mar 23 14:39:07 PDT 2016 >> sobomax@abc01.sippysoft.com:/usr/obj/usr/home/sobomax/projects/freebsd103/sys/ABC01 >> amd64 >> >> >> $ ps -xaHl -O lwp >> UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME >> COMMAND PID LWP TT STAT TIME COMMAND >> 0 39143 38949 0 52 0 12368 1064 vnread DL+ 2 0:00.00 >> cp /usr/local/sh 39143 101176 2 DL+ 0:00.00 cp >> /usr/local/share/automake-1.15/compile ./compile >> >> >> (kgdb) thread 473 >> [Switching to thread 473 (Thread 101176)]#0 sched_switch >> (td=0xfffff803320b84b0, newtd=, flags=> out>) >> at /usr/home/sobomax/projects/freebsd103/sys/kern/sched_ule.c:1945 >> 1945 cpuid = PCPU_GET(cpuid); >> (kgdb) bt >> #0 sched_switch (td=0xfffff803320b84b0, newtd=, >> flags=) >> at /usr/home/sobomax/projects/freebsd103/sys/kern/sched_ule.c:1945 >> #1 0xffffffff809364d1 in mi_switch (flags=260, newtd=0x0) at >> /usr/home/sobomax/projects/freebsd103/sys/kern/kern_synch.c:491 >> #2 0xffffffff809761ca in sleepq_wait (wchan=0x0, pri=0) at >> /usr/home/sobomax/projects/freebsd103/sys/kern/subr_sleepqueue.c:618 >> #3 0xffffffff80935ef7 in _sleep (ident=, >> lock=, priority=, wmesg=> optimized out>, >> sbt=, pr=, flags=> optimized out>) at >> /usr/home/sobomax/projects/freebsd103/sys/kern/kern_synch.c:255 >> #4 0xffffffff809b7c6e in bwait (bp=0xfffffe03e1012048, pri=> optimized out>, wchan=0xffffffff81069471 "vnread") >> at /usr/home/sobomax/projects/freebsd103/sys/kern/vfs_bio.c:4431 >> #5 0xffffffff80b56663 in vnode_pager_generic_getpages >> (vp=0xfffff802b7e73588, m=0xfffffe0468c4ebe0, bytecount=> out>, >> reqpage=) at >> /usr/home/sobomax/projects/freebsd103/sys/vm/vnode_pager.c:907 >> #6 0xffffffff80dd7fa7 in VOP_GETPAGES_APV (vop=, >> a=) at vnode_if.c:2795 >> #7 0xffffffff80b54a0a in vnode_pager_getpages (object=> out>, m=0xfffffe0468c4ebe0, count=, reqpage=0) >> at vnode_if.h:1154 >> #8 0xffffffff80b354a1 in vm_fault_hold (map=0xfffff800294c98c0, >> vaddr=34366300160, fault_type=, fault_flags=0, >> m_hold=0x0) >> at vm_pager.h:127 >> #9 0xffffffff80b34c07 in vm_fault (map=0xfffff800294c98c0, vaddr=> optimized out>, fault_type=1 '\001', fault_flags=0) >> at /usr/home/sobomax/projects/freebsd103/sys/vm/vm_fault.c:273 >> #10 0xffffffff80cbaef5 in trap_pfault (frame=0xfffffe0468c4f040, >> usermode=0) at >> /usr/home/sobomax/projects/freebsd103/sys/amd64/amd64/trap.c:747 >> #11 0xffffffff80cba65d in trap (frame=0xfffffe0468c4f040) at >> /usr/home/sobomax/projects/freebsd103/sys/amd64/amd64/trap.c:447 >> #12 0xffffffff80ca0592 in calltrap () at >> /usr/home/sobomax/projects/freebsd103/sys/amd64/amd64/exception.S:236 >> #13 0xffffffff80cb8b71 in copyin () at >> /usr/home/sobomax/projects/freebsd103/sys/amd64/amd64/support.S:311 >> #14 0xffffffff8097d70f in uiomove_faultflag (cp=, >> n=, uio=0xfffffe0468c4f960, nofault=> out>) >> at /usr/home/sobomax/projects/freebsd103/sys/kern/subr_uio.c:253 >> ---Type to continue, or q to quit--- >> #15 0xffffffff819f799c in dmu_write_uio_dnode () from /boot/kernel/zfs.ko >> #16 0xffffffff819f78b2 in dmu_write_uio_dbuf () from /boot/kernel/zfs.ko >> #17 0xffffffff81a91a07 in zfs_freebsd_write () from /boot/kernel/zfs.ko >> #18 0xffffffff80dd6219 in VOP_WRITE_APV (vop=, >> a=) at vnode_if.c:999 >> #19 0xffffffff808730b2 in null_bypass (ap=0xfffffe0468c4f7d0) at >> /usr/home/sobomax/projects/freebsd103/sys/fs/nullfs/null_vnops.c:280 >> #20 0xffffffff80dd621f in VOP_WRITE_APV (vop=, >> a=) at vnode_if.c:1001 >> #21 0xffffffff809dea28 in vn_write (fp=0xfffff8004bc0ddc0, >> uio=0xfffffe0468c4f960, active_cred=0x1ca5, flags=916, >> td=0xfffffe00006a60d0) >> at vnode_if.h:413 >> #22 0xffffffff809dad6b in vn_io_fault (fp=0xfffff8004bc0ddc0, >> uio=0xfffffe0468c4f960, active_cred=0x1ca5, flags=0, td=0xfffffe00006a60d0) >> at /usr/home/sobomax/projects/freebsd103/sys/kern/vfs_vnops.c:1225 >> #23 0xffffffff80981937 in dofilewrite (td=0xfffff803320b84b0, fd=4, >> fp=0xfffff8004bc0ddc0, auio=0xfffffe0468c4f960, offset=> out>, >> flags=0) at file.h:305 >> #24 0xffffffff80981668 in kern_writev (td=0xfffff803320b84b0, fd=4, >> auio=0xfffffe0468c4f960) >> at /usr/home/sobomax/projects/freebsd103/sys/kern/sys_generic.c:481 >> #25 0xffffffff809815f3 in sys_write (td=, uap=> optimized out>) >> at /usr/home/sobomax/projects/freebsd103/sys/kern/sys_generic.c:396 >> #26 0xffffffff80cbb559 in amd64_syscall (td=0xfffff803320b84b0, traced=0) >> at subr_syscall.c:141 >> #27 0xffffffff80ca087b in Xfast_syscall () at >> /usr/home/sobomax/projects/freebsd103/sys/amd64/amd64/exception.S:396 >> #28 0x0000000800968f5a in ?? () >> >> >> On Wed, Mar 2, 2016 at 1:12 AM, Maxim Sobolev >> wrote: >> >>> Hi, I've encountered cp(1) process stuck in the vnread state on one of >>> my build machines that got recently upgraded to 10.3. >>> >>> 0 79596 1 0 20 0 17092 1396 wait I 1 >>> 0:00.00 /bin/sh /usr/local/bin/autoreconf -f -i >>> 0 79602 79596 0 52 0 41488 9036 wait I 1 >>> 0:00.07 /usr/local/bin/perl -w /usr/local/bin/autoreconf-2.69 -f -i >>> 0 79639 79602 0 72 0 0 0 - Z 1 >>> 0:00.27 >>> 0 79762 79602 0 20 0 17092 1396 wait I 1 >>> 0:00.00 /bin/sh /usr/local/bin/automake --add-missing --copy --force-missing >>> 0 79768 79762 0 52 0 49736 13936 wait I 1 >>> 0:00.11 /usr/local/bin/perl -w /usr/local/bin/automake-1.15 --add-missing >>> --copy --force-missing >>> 0 79962 79768 0 20 0 12368 1024 vnread DL 1 >>> 0:00.00 cp /usr/local/share/automake-1.15/compile ./compile >>> >>> I am not sure if it's related to that OS version upgrade, but I have not >>> seen any such issues on the same machine in 2-3 years running essentially >>> the same build process with version 9.x, 10.0, 10.1 and 10.2. >>> >>> $ uname -a >>> FreeBSD van01.sippysoft.com 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #1 >>> 80de3e2(master)-dirty: Tue Feb 2 12:19:57 PST 2016 >>> sobomax@abc.sippysoft.com:/usr/obj/usr/home/sobomax/projects/freebsd103/sys/ABC >>> amd64 >>> >>> The kernel stack trace is: >>> >>> (kgdb) thread 360 >>> [Switching to thread 360 (Thread 100515)]#0 0xffffffff8095244e in >>> sched_switch () >>> (kgdb) bt >>> #0 0xffffffff8095244e in sched_switch () >>> #1 0xffffffff809313b1 in mi_switch () >>> #2 0xffffffff8097089a in sleepq_wait () >>> #3 0xffffffff80930dd7 in _sleep () >>> #4 0xffffffff809b230e in bwait () >>> #5 0xffffffff80b511f3 in vnode_pager_generic_getpages () >>> #6 0xffffffff80dd1607 in VOP_GETPAGES_APV () >>> #7 0xffffffff80b4f59a in vnode_pager_getpages () >>> #8 0xffffffff80b30031 in vm_fault_hold () >>> #9 0xffffffff80b2f797 in vm_fault () >>> #10 0xffffffff80cb5a75 in trap_pfault () >>> #11 0xffffffff80cb51dd in trap () >>> #12 0xffffffff80c9b122 in calltrap () >>> #13 0xffffffff80cb36f1 in copyin () >>> #14 0xffffffff80977ddf in uiomove_faultflag () >>> >>> The FS stack configuration is somewhat unique, so I am not sure if I am >>> hitting some rare race condition or lock ordering issues specific to that. >>> It's basically ZFS (ZRAID) on top of pair or SATA SSDs with big file on >>> that FS attached via md(4) and UFS2 on that md(4). The build itself runs in >>> chroot with that UFS2 fs as its primary root. >>> >>> Just maybe additional bit of info, attempting to list the directory with >>> that UFS image also got my bash process stuck in "zfs" state, backtrace >>> from that is: >>> >>> (kgdb) thread 353 >>> [Switching to thread 353 (Thread 100508)]#0 0xffffffff8095244e in >>> sched_switch () >>> (kgdb) bt >>> #0 0xffffffff8095244e in sched_switch () >>> #1 0xffffffff809313b1 in mi_switch () >>> #2 0xffffffff8097089a in sleepq_wait () >>> #3 0xffffffff809069ad in sleeplk () >>> #4 0xffffffff809060e0 in __lockmgr_args () >>> #5 0xffffffff809b8b7c in vop_stdlock () >>> #6 0xffffffff80dd0a3b in VOP_LOCK1_APV () >>> #7 0xffffffff809d6d23 in _vn_lock () >>> #8 0xffffffff81a8c9cd in ?? () >>> #9 0x0000000000000000 in ?? () >>> >>> >>> >> >> >> -- >> Maksym Sobolyev >> Sippy Software, Inc. >> Internet Telephony (VoIP) Experts >> Tel (Canada): +1-778-783-0474 >> Tel (Toll-Free): +1-855-747-7779 >> Fax: +1-866-857-6942 >> Web: http://www.sippysoft.com >> MSN: sales@sippysoft.com >> Skype: SippySoft >> > > > > -- > Maksym Sobolyev > Sippy Software, Inc. > Internet Telephony (VoIP) Experts > Tel (Canada): +1-778-783-0474 > Tel (Toll-Free): +1-855-747-7779 > Fax: +1-866-857-6942 > Web: http://www.sippysoft.com > MSN: sales@sippysoft.com > Skype: SippySoft >