From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 247668] Page fault in zfsctl_snapdir_getattr
Date: Tue, 30 Jun 2020 20:33:39 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247668

            Bug ID: 247668
           Summary: Page fault in zfsctl_snapdir_getattr
           Product: Base System
           Version: 12.1-STABLE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: asomers@FreeBSD.org

On a very heavily loaded server I observed the following kernel-mode page fault.
The offending process was a "procstat -af", which did VOP_GETATTR on every open
file descriptor on the whole system, including the .zfs/snapshot directories.
On one of those, it called dsl_dataset_phys, which tried to dereference a null
pointer. There were also 5 "zfs destroy" processes, and dozens of "zfs list"
and "zfs recv" running concurrently.

I suspect that zfsctl_snapdir_getattr is missing some lock when it checks
dsl_dataset_phys, while trying to calculate the directory's nlink attribute.
But it's not clear what lock it ought to hold. It's worth noting that ZoL
doesn't have this problem because it doesn't even try to calculate nlink;
instead it always returns "2".

Sadly, I haven't been able to reproduce the issue on any non-production
machine.

The server in question is running 12-STABLE at svn r346022.

#1  doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bbe655 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bbea96 in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:880
#4  0xffffffff80bbe8b3 in panic (fmt=)
    at /usr/src/sys/kern/kern_shutdown.c:807
#5  0xffffffff81090310 in trap_fatal (frame=0xfffffe04b95c08a0, eva=24)
    at /usr/src/sys/amd64/amd64/trap.c:925
#6  0xffffffff8109035f in trap_pfault (frame=0xfffffe04b95c08a0, usermode=,
    signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:743
#7  0xffffffff8108f9b8 in trap (frame=0xfffffe04b95c08a0)
    at /usr/src/sys/amd64/amd64/trap.c:407
#8
#9  0xffffffff825f4cbc in dsl_dataset_phys (ds=0xfffff86821e72e10)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dsl_dataset.h:257
#10 zfsctl_snapdir_getattr (ap=)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:1133
#11 0xffffffff81211315 in VOP_GETATTR_APV (vop=0xffffffff826be060,
    a=0xfffffe04b95c0a98) at vnode_if.c:733
#12 0xffffffff80c7bd29 in VOP_GETATTR (vp=0x1, vap=,
    cred=0xfffff88e58a45700) at ./vnode_if.h:309
#13 vop_stdvptocnp (ap=) at /usr/src/sys/kern/vfs_default.c:743
#14 0xffffffff8121495b in VOP_VPTOCNP_APV (vop=0xffffffff81b281b8,
    a=0xfffffe04b95c0d90) at vnode_if.c:3718
#15 0xffffffff80c78304 in VOP_VPTOCNP (vp=0x0, vpp=,
    cred=0xfffff88e58a45700, buf=0xfffff86ed5d7d400 "",
    buflen=0xfffffe04b95c0e34) at ./vnode_if.h:1599
#16 vn_vptocnp (vp=0xfffffe04b95c0e28, cred=, buf=, buflen=)
    at /usr/src/sys/kern/vfs_cache.c:2296
#17 0xffffffff80c77db7 in vn_fullpath1 (td=0xfffff865848d7000,
    vp=0xfffff80e4a8a53c0, rdir=0xfffff860440f0b40,
    buf=0xfffff86ed5d7d400 "", retbuf=0xfffffe04b95c0fa8, buflen=1023)
    at /usr/src/sys/kern/vfs_cache.c:2392
#18 0xffffffff80c780f8 in vn_fullpath (td=0xfffff865848d7000,
    vn=0xfffff80e4a8a53c0, retbuf=0xfffff865848d75a0,
    freebuf=0xfffffe04b95c0fb0) at /usr/src/sys/kern/vfs_cache.c:2221
#19 0xffffffff80ca0635 in vn_fill_kinfo_vnode (vp=0xfffff80e4a8a53c0,
    kif=0xfffff831bcf5e818) at /usr/src/sys/kern/vfs_vnops.c:2352
#20 0xffffffff80c9d3f6 in vn_fill_kinfo (fp=, kif=0xfffff831bcf5e818,
    fdp=) at /usr/src/sys/kern/vfs_vnops.c:2318
#21 0xffffffff80b6ca25 in fo_fill_kinfo (fp=, kif=, fdp=)
    at /usr/src/sys/sys/file.h:407
#22 export_file_to_kinfo (fp=, fd=, rightsp=, kif=,
    fdp=0xfffff86618252450, flags=1)
    at /usr/src/sys/kern/kern_descrip.c:3494
#23 export_file_to_sb (fp=0xfffff8210a788460, fd=4, rightsp=, efbuf=)
    at /usr/src/sys/kern/kern_descrip.c:3560
#24 kern_proc_filedesc_out (p=, sb=, maxlen=, flags=-1124734960)
    at /usr/src/sys/kern/kern_descrip.c:3667
#25 0xffffffff80b6dbbd in sysctl_kern_proc_filedesc (oidp=,
    arg1=0xfffffe04b95c12bc, arg2=, req=)
    at /usr/src/sys/kern/kern_descrip.c:3701
#26 0xffffffff80bcd639 in sysctl_root_handler_locked (
    oid=0xffffffff81b0a760, arg1=0xfffffe04b95c12bc, arg2=1,
    req=0xfffffe04b95c11f0, tracker=0xfffffe04b95c1168)
    at /usr/src/sys/kern/kern_sysctl.c:166
#27 0xffffffff80bcccf9 in sysctl_root (oidp=, arg1=0xfffffe04b95c12bc,
    arg2=1, req=0xfffffe04b95c11f0) at /usr/src/sys/kern/kern_sysctl.c:2062
#28 0xffffffff80bcd368 in userland_sysctl (td=0xfffff865848d7000,
    name=0xfffffe04b95c12b0, namelen=4, old=, oldlenp=, inkernel=,
    new=0x0, newlen=0, retval=0xfffffe04b95c1318, flags=0)
    at /usr/src/sys/kern/kern_sysctl.c:2157
#29 0xffffffff80bcd1af in sys___sysctl (td=0xfffff865848d7000,
    uap=0xfffff865848d73c0) at /usr/src/sys/kern/kern_sysctl.c:2092
#30 0xffffffff81090e87 in syscallenter (td=0xfffff865848d7000)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#31 amd64_syscall (td=0xfffff865848d7000, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1168
#32
#33 0x000000080045789a in ?? ()