From owner-freebsd-fs@freebsd.org Wed Mar 2 17:06:39 2016
Date: Wed, 2 Mar 2016 09:06:37 -0800
From: Maxim Sobolev
To: Konstantin Belousov
Cc: Kirk McKusick, peter@holm.cc, fs@freebsd.org
Subject: Re: Process stuck in "vnread"
In-Reply-To: <20160302115707.GF67250@kib.kiev.ua>
References: <20160302095339.GB67250@kib.kiev.ua> <20160302115707.GF67250@kib.kiev.ua>

Konstantin, this is nullfs mounted over UFS, with the nullfs pointing at part of the ZFS tree. I am not sure whether that is what you were asking about.

storage/builder on /builder (zfs, local, nfsv4acls)
md0 vnode 3200M /builder/tmp/sspicd_tmp.ufs
/dev/md0 on /builder/mnt (ufs, asynchronous, local, noatime)
/builder/usr/ports-bitbucket on /builder/mnt/usr/ports (nullfs, local)

So the stuck process is effectively copying a file from /builder/mnt/usr/local/share/automake-1.15/compile to /builder/usr/ports-bitbucket/SOMETHING/./compile, and it runs chrooted into /builder/mnt. It could be stuck in either the read path or the write path. However, looking at the full kernel side of the stack trace of that cp(1), I'd say it is probably the latter: a write would have to traverse the top-level vfs/ufs first, then the nullfs layer, and then zfs. Neither of the last two is compiled into the kernel, which is why there is no proper traceback.

The nullfs mount is there to let the chrooted process reach the ZFS tree on the upper level, i.e. /builder/usr.

Unfortunately I cannot find a way to figure out which specific system call cp got stuck in. Attempting to attach gdb causes gdb to hang in turn. So unless somebody has other ideas on how to get some useful post-mortem debug information out of this situation, I'll have to restart the box soon to recover it.

I will put your patch in and see if it helps. I'll also compile nullfs statically, so that if it hits again we at least have some post-mortem evidence to work with.

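For readers following the read-path versus write-path question, here is a minimal user-space sketch of one pattern in which a single write(2) touches both files at once. It assumes, and this is only an assumption about how the copy is performed, that the source file is mapped with mmap(2) and the mapping is handed to write(2) as the buffer. In that case the kernel's copy from user memory can fault on a page of the source file while the destination is being written. The helper name copy_via_mmap is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src to dst by mapping src and passing the
 * mapping straight to write(2). The pages of the mapping are only read
 * in when the kernel copies from the buffer, i.e. inside the write.
 * Returns 0 on success, -1 on failure (an empty source is treated as
 * a failure in this sketch because there is nothing to map).
 */
static int
copy_via_mmap(const char *src, const char *dst)
{
	struct stat sb;
	int rc = -1;

	int from = open(src, O_RDONLY);
	if (from < 0)
		return (-1);
	int to = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (to < 0) {
		close(from);
		return (-1);
	}

	if (fstat(from, &sb) == 0 && sb.st_size > 0) {
		size_t len = (size_t)sb.st_size;
		char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, from, 0);

		if (p != MAP_FAILED) {
			size_t done = 0;

			/* Each write() reads its data from the file mapping. */
			while (done < len) {
				ssize_t n = write(to, p + done, len - done);
				if (n <= 0)
					break;
				done += (size_t)n;
			}
			if (done == len)
				rc = 0;
			munmap(p, len);
		}
	}

	close(from);
	if (close(to) != 0)
		rc = -1;
	return (rc);
}
```

Calling copy_via_mmap("/path/to/source", "/path/to/target") from a small main() is enough to exercise it; both paths are placeholders.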
----
(kgdb) thread 362
[Switching to thread 362 (Thread 100515)]#0 0xffffffff8095244e in sched_switch ()
(kgdb) bt
#0 0xffffffff8095244e in sched_switch ()
#1 0xffffffff809313b1 in mi_switch ()
#2 0xffffffff8097089a in sleepq_wait ()
#3 0xffffffff80930dd7 in _sleep ()
#4 0xffffffff809b230e in bwait ()
#5 0xffffffff80b511f3 in vnode_pager_generic_getpages ()
#6 0xffffffff80dd1607 in VOP_GETPAGES_APV ()
#7 0xffffffff80b4f59a in vnode_pager_getpages ()
#8 0xffffffff80b30031 in vm_fault_hold ()
#9 0xffffffff80b2f797 in vm_fault ()
#10 0xffffffff80cb5a75 in trap_pfault ()
#11 0xffffffff80cb51dd in trap ()
#12 0xffffffff80c9b122 in calltrap ()
#13 0xffffffff80cb36f1 in copyin ()
#14 0xffffffff80977ddf in uiomove_faultflag ()
#15 0xffffffff819f699c in ?? ()
#16 0xfffffe0468a861a0 in ?? ()
#17 0xfffff80000000000 in ?? ()
#18 0xfffffe0468a861a0 in ?? ()
#19 0xfffff80176b39420 in ?? ()
#20 0x0000000000000001 in ?? ()
#21 0xfffff801ee76f500 in ?? ()
#22 0xfffffe0468a86960 in ?? ()
#23 0x00000001e3a72d80 in ?? ()
#24 0xfffff80176b39420 in ?? ()
#25 0xfffff803e3a72d80 in ?? ()
#26 0xfffffe0468a86960 in ?? ()
#27 0xfffff801881130e8 in ?? ()
#28 0xfffff801ee76f500 in ?? ()
#29 0x0000000000001ca5 in ?? ()
#30 0xfffffe0468a86200 in ?? ()
#31 0xffffffff819f68b2 in ?? ()
#32 0x0000000000001ca5 in ?? ()
#33 0x0000000000001ca5 in ?? ()
#34 0xfffff80188113000 in ?? ()
#35 0xfffffe0468a86960 in ?? ()
#36 0xfffffe0468a86440 in ?? ()
#37 0xffffffff81a90a77 in ?? ()
#38 0xfffff80100000002 in ?? ()
#39 0x0000000181a6c5c2 in ?? ()
#40 0x0000000000000000 in ?? ()

On Wed, Mar 2, 2016 at 3:57 AM, Konstantin Belousov wrote:

> On Wed, Mar 02, 2016 at 03:02:02AM -0800, Maxim Sobolev wrote:
> > About the backtrace, indeed, looks like you are right and some portion of
> > it is not decoded properly, as it's loaded as a kernel module.
> > The setup is somewhat even more complicated, the /usr/ports is mounted
> > via NULLFS, so in this command:
> >
> > cp /usr/local/share/automake-1.15/compile ./compile
> >
> > The target (i.e. ./compile) here is a path on ZFS that is exported via
> > NULLFS, while the source is a file on UFS2->md->ZFS. This is probably the
> > reason stack trace is incomplete, both zfs.ko and nullfs.ko are loaded as
> > modules and the next few frames point towards those. Unfortunately I
> > cannot beat kgdb to read symbols from those .ko's and decode them.
>
> Is nullfs mount put over ZFS only ? The backtrace you shown cannot
> happen for ZFS, since ZFS has its own pager vop. In fact, I would
> agree that the backtrace is reasonable for nullfs over UFS upper vnode.
> The following patch should fix the 'paging while faulting on uiomove'
> issue for nullfs over UFS.
>
> Peter, could you, please, test the patch ? It is purely nullfs change,
> and the most interesting situation is the ups' deadlock, but the whole
> set of nullfs tests would be good to check.
>
> diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
> index 64e1e29..49bae28 100644
> --- a/sys/fs/nullfs/null_vfsops.c
> +++ b/sys/fs/nullfs/null_vfsops.c
> @@ -199,7 +199,7 @@ nullfs_mount(struct mount *mp)
>  	}
>  	mp->mnt_kern_flag |= MNTK_LOOKUP_EXCL_DOTDOT;
>  	mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
> -	    MNTK_USES_BCACHE;
> +	    (MNTK_USES_BCACHE | MNTK_NO_IOPF | MNTK_UNMAPPED_BUFS);
>  	MNT_IUNLOCK(mp);
>  	mp->mnt_data = xmp;
>  	vfs_getnewfsid(mp);
>
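
To make the effect of the quoted patch easier to see, here is a small user-space sketch of the flag inheritance it changes. It only mimics the masking expression from nullfs_mount(). The EX_ constants, their numeric values and the two function names are made up for illustration; the real MNTK_* definitions live in the kernel headers and differ from these.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's MNTK_* flags (values invented). */
#define EX_MNTK_USES_BCACHE		0x01u
#define EX_MNTK_NO_IOPF			0x02u
#define EX_MNTK_UNMAPPED_BUFS		0x04u
#define EX_MNTK_LOOKUP_EXCL_DOTDOT	0x08u
#define EX_MNTK_OTHER			0x10u	/* some flag that is never inherited */

/* Before the patch: only USES_BCACHE is copied from the lower mount. */
static uint32_t
inherit_before(uint32_t upper, uint32_t lower)
{
	upper |= EX_MNTK_LOOKUP_EXCL_DOTDOT;
	upper |= lower & EX_MNTK_USES_BCACHE;
	return (upper);
}

/* After the patch: NO_IOPF and UNMAPPED_BUFS are copied as well. */
static uint32_t
inherit_after(uint32_t upper, uint32_t lower)
{
	upper |= EX_MNTK_LOOKUP_EXCL_DOTDOT;
	upper |= lower & (EX_MNTK_USES_BCACHE | EX_MNTK_NO_IOPF |
	    EX_MNTK_UNMAPPED_BUFS);
	return (upper);
}
```

With a lower mount that sets the NO_IOPF stand-in, inherit_before() drops that bit on the nullfs side while inherit_after() keeps it, so the upper mount advertises the same "no page faults during I/O" behaviour as the filesystem underneath. Flags outside the mask, such as EX_MNTK_OTHER here, are not propagated in either version.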