Date: Fri, 17 Jun 2016 11:53:19 +0800 (AWST)
From: David Adam
To: freebsd-fs@freebsd.org
Subject: Processes wedging on ZFS accesses

Hi all,

We're still having trouble with our 10.3-RELEASE-p3 fileserver using a ZFS pool. After a certain amount of uptime (usually a week or so), a Samba process will get stuck in D-state:

max 2075 0.0 0.2 339928 26616 - D 26May16 0:19.59 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf

Running find(1) over the hierarchy that the smbd process has open will also wedge in a D-state. Our backups also seem to get stuck, presumably in the same spot.

`procstat -k` on the stuck processes (smbd and our stuck Python-based backup program) shows:

  PID    TID COMM      TDNAME KSTACK
 2075 100587 smbd      -      mi_switch+0xe1 sleepq_wait+0x3a _sx_slock_hard+0x31b namei+0x1c5 vn_open_cred+0x24d zfs_getextattr+0x1f2 VOP_GETEXTATTR_APV+0xa7 extattr_get_vp+0x15d sys_extattr_get_file+0xf4 amd64_syscall+0x40f Xfast_syscall+0xfb
 2075 100623 smbd      -      mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0xca0 vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 knlist_remove_kq+0x24 filt_vfsdetach+0x22 knote_fdclose+0xef closefp+0x42 amd64_syscall+0x40f Xfast_syscall+0xfb
21676 101572 python2.7 -      mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vget+0x73 cache_lookup+0x5d5 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_statat_vnhook+0xae sys_lstat+0x30 amd64_syscall+0x40f Xfast_syscall+0xfb
36144 101585 python2.7 -      mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vget+0x73 cache_lookup+0x5d5 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_statat_vnhook+0xae sys_lstat+0x30 amd64_syscall+0x40f Xfast_syscall+0xfb

Memory doesn't appear to be a problem, and we have the ARC capped at 10 GB:

Mem: 80M Active, 2127M Inact, 13G Wired, 36K Cache, 1643M Buf, 308M Free
ARC: 9921M Total, 7056M MFU, 2434M MRU, 977K Anon, 162M Header, 267M Other
Swap: 20G Total, 20G Free

I'm getting a DDB kernel built, but is there any other information that would be useful?

One of the problems is that Samba won't start a new process for the user whose process is already wedged, so eventually user logins stop working.

Thanks,

David Adam
zanchey@ucc.gu.uwa.edu.au
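
P.S. In case it helps with comparing stacks between hangs, here is a rough Python sketch that groups the threads in `procstat -k` output by where they are sleeping and which syscall they came in through. It is only an illustration, not something we run on the server: the function names (parse_procstat_k, wait_site, syscall_entry, summarize) are made up, it assumes COMM and TDNAME contain no spaces, it treats the first frame that is not mi_switch or sleepq_* as the wait site, and it treats the frame just before amd64_syscall as the syscall entry. The sample data is the output quoted above.

```python
from collections import defaultdict

# Sample input: the `procstat -k` lines quoted in this message.
SAMPLE = """\
PID TID COMM TDNAME KSTACK
2075 100587 smbd - mi_switch+0xe1 sleepq_wait+0x3a _sx_slock_hard+0x31b namei+0x1c5 vn_open_cred+0x24d zfs_getextattr+0x1f2 VOP_GETEXTATTR_APV+0xa7 extattr_get_vp+0x15d sys_extattr_get_file+0xf4 amd64_syscall+0x40f Xfast_syscall+0xfb
2075 100623 smbd - mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0xca0 vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 knlist_remove_kq+0x24 filt_vfsdetach+0x22 knote_fdclose+0xef closefp+0x42 amd64_syscall+0x40f Xfast_syscall+0xfb
21676 101572 python2.7 - mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vget+0x73 cache_lookup+0x5d5 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_statat_vnhook+0xae sys_lstat+0x30 amd64_syscall+0x40f Xfast_syscall+0xfb
36144 101585 python2.7 - mi_switch+0xe1 sleepq_wait+0x3a sleeplk+0x15d __lockmgr_args+0x91a vop_stdlock+0x3c VOP_LOCK1_APV+0xab _vn_lock+0x43 vget+0x73 cache_lookup+0x5d5 vfs_cache_lookup+0xac VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_statat_vnhook+0xae sys_lstat+0x30 amd64_syscall+0x40f Xfast_syscall+0xfb
"""


def parse_procstat_k(text):
    """Parse procstat -k style lines into a list of per-thread dicts.

    Assumption: COMM and TDNAME are single whitespace-free tokens.
    """
    threads = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row, blank lines, and anything without a stack.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # Drop the "+0x..." offsets so identical call paths compare equal.
        frames = [frame.split("+")[0] for frame in fields[4:]]
        threads.append({
            "pid": int(fields[0]),
            "tid": int(fields[1]),
            "comm": fields[2],
            "tdname": fields[3],
            "frames": frames,
        })
    return threads


def wait_site(frames):
    """Return the first frame that is not scheduler/sleepqueue plumbing."""
    for name in frames:
        if name != "mi_switch" and not name.startswith("sleepq_"):
            return name
    return None


def syscall_entry(frames):
    """Return the frame just before amd64_syscall, else the last frame."""
    if "amd64_syscall" in frames:
        idx = frames.index("amd64_syscall")
        if idx > 0:
            return frames[idx - 1]
    return frames[-1] if frames else None


def summarize(threads):
    """Group (pid, tid) pairs by (wait site, syscall entry)."""
    groups = defaultdict(list)
    for t in threads:
        key = (wait_site(t["frames"]), syscall_entry(t["frames"]))
        groups[key].append((t["pid"], t["tid"]))
    return dict(groups)


if __name__ == "__main__":
    for (site, entry), tids in summarize(parse_procstat_k(SAMPLE)).items():
        print(site, "via", entry, "->", tids)
```

For the four threads above this gives three groups: one smbd thread waiting in _sx_slock_hard via sys_extattr_get_file, one smbd thread in sleeplk via closefp, and both python2.7 threads in sleeplk via sys_lstat.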