From: Mikolaj Golub <to.my.trociny@gmail.com>
To: freebsd-fs@FreeBSD.org
Date: Sun, 10 Jan 2010 11:03:56 +0200
Subject: FreeBSD NFS client/Linux NFS server issue

Hi,

I wrote about this problem some time ago to freebsd-stable

http://lists.freebsd.org/pipermail/freebsd-stable/2009-December/053746.html

but there has been no response so far, and we have collected some additional
information since then, so I am trying again here :-)

We have a CentOS 5.3 NFS server and many FreeBSD clients. The problem has been
observed since we started upgrading the FreeBSD hosts from 6.3 to 7.1, so at
first we thought it was 7.1-related, but recently we had a similar issue on a
6.3 host, and it actually looks like the problem is rather on the NFS server
side. Still, I would like to describe it here and ask for some advice.

The "problem" FreeBSD clients have several NFS mounts from the CentOS 5.3
server. Mount options:

rw,-3,-T,-s,-i,-r=32768,-w=32768,-o=noinet6

At some point on one of the clients some of the mounts get stuck: we observe
one or two of our PHP applications hung writing to an NFS folder in
nfs_flush()->bufobj_wwait(), and all other processes that try to access this
NFS mount hang acquiring the NFS vn_lock held by one of those PHP processes.
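For reference, I believe the mount options above are roughly equivalent to a
manual mount of the form (server and paths below are just placeholders, not
our real ones):

  mount_nfs -3 -T -s -i -r 32768 -w 32768 -o noinet6 server:/export /mnt/point

i.e. NFSv3 over TCP, soft, interruptible, with 32k read and write sizes.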
For one of the incidents we were tcpdumping the "problem" NFS connection for
about 1 hour, and during this hour activity was observed only once:

08:20:38.281422 IP (tos 0x0, ttl 64, id 56110, offset 0, flags [DF], proto TCP (6), length 140)
    172.30.10.27.344496259 > 172.30.10.121.2049: 88 access fh[1:9300:10df8001] 003f
08:20:38.281554 IP (tos 0x0, ttl 64, id 26624, offset 0, flags [DF], proto TCP (6), length 52)
    172.30.10.121.2049 > 172.30.10.27.971: ., cksum 0xca5e (correct), 89408667:89408667(0) ack 1517941890 win 46

The client sent an RPC ACCESS request for the exported root inode and received
a TCP ACK in response (so the TCP connection was ok), but it did not receive
any RPC reply from the server. So it looks like a problem on the NFS server
side. Still, it looks a bit strange to me that the FreeBSD client sends RPC
packets so rarely. Shouldn't it retransmit them more frequently?

For another incident we monitored the TCP connection for 4 minutes and did not
see any packets at all. Unfortunately we can't run tcpdump for a long time, as
these are production servers and we need to reboot the hosts to restore normal
operation.

When the client was in such a state we generated a crash dump, so I have a
core file for the investigation. Below is some relevant data from the crash
dump. Could you please advise what other things would be interesting to look
at in the core for a better understanding of what is going on?

Backtraces of the two writing PHP processes hung in nfs_flush():

#0  sched_switch (td=0xc6befaf0, newtd=Variable "newtd" is not available.) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xc07cabe6 in mi_switch (flags=Variable "flags" is not available.) at /usr/src/sys/kern/kern_synch.c:440
#2  0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available.) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc07f460c in sleepq_catch_signals (wchan=0xc9cb21f8) at /usr/src/sys/kern/subr_sleepqueue.c:417
#4  0xc07f4ebd in sleepq_wait_sig (wchan=0xc9cb21f8) at /usr/src/sys/kern/subr_sleepqueue.c:594
#5  0xc07cb047 in _sleep (ident=0xc9cb21f8, lock=0xc9cb219c, priority=333, wmesg=0xc0b731ed "bo_wwait", timo=0) at /usr/src/sys/kern/kern_synch.c:224
#6  0xc0827295 in bufobj_wwait (bo=0xc9cb21d4, slpflag=256, timeo=0) at /usr/src/sys/kern/vfs_bio.c:3870
#7  0xc0966307 in nfs_flush (vp=0xc9cb2114, waitfor=1, td=0xc6befaf0, commit=1) at /usr/src/sys/nfsclient/nfs_vnops.c:2989
#8  0xc09667ce in nfs_fsync (ap=0xed2258ec) at /usr/src/sys/nfsclient/nfs_vnops.c:2725
#9  0xc0aee5d2 in VOP_FSYNC_APV (vop=0xc0c2b920, a=0xed2258ec) at vnode_if.c:1007
#10 0xc0827864 in bufsync (bo=0xc9cb21d4, waitfor=1, td=0xc6befaf0) at vnode_if.h:538
#11 0xc083f354 in bufobj_invalbuf (bo=0xc9cb21d4, flags=1, td=0xc6befaf0, slpflag=256, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:1066
#12 0xc083f6e2 in vinvalbuf (vp=0xc9cb2114, flags=1, td=0xc6befaf0, slpflag=256, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:1142
#13 0xc094f216 in nfs_vinvalbuf (vp=0xc9cb2114, flags=Variable "flags" is not available.) at /usr/src/sys/nfsclient/nfs_bio.c:1326
#14 0xc0951825 in nfs_write (ap=0xed225bc4) at /usr/src/sys/nfsclient/nfs_bio.c:918
#15 0xc0aef956 in VOP_WRITE_APV (vop=0xc0c2b920, a=0xed225bc4) at vnode_if.c:691
#16 0xc0850097 in vn_write (fp=0xc902aafc, uio=0xed225c60, active_cred=0xcb27b900, flags=0, td=0xc6befaf0) at vnode_if.h:373
#17 0xc07f9d17 in dofilewrite (td=0xc6befaf0, fd=6, fp=0xc902aafc, auio=0xed225c60, offset=-1, flags=0) at file.h:256
#18 0xc07f9ff8 in kern_writev (td=0xc6befaf0, fd=6, auio=0xed225c60) at /usr/src/sys/kern/sys_generic.c:401
#19 0xc07fa06f in write (td=0xc6befaf0, uap=0xed225cfc) at /usr/src/sys/kern/sys_generic.c:317
#20 0xc0ad9c75 in syscall (frame=0xed225d38) at /usr/src/sys/i386/i386/trap.c:1090
#21 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#22 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

#0  sched_switch (td=0xc731c8c0, newtd=Variable "newtd" is not available.) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xc07cabe6 in mi_switch (flags=Variable "flags" is not available.) at /usr/src/sys/kern/kern_synch.c:440
#2  0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available.) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc07f460c in sleepq_catch_signals (wchan=0xcafa375c) at /usr/src/sys/kern/subr_sleepqueue.c:417
#4  0xc07f4ebd in sleepq_wait_sig (wchan=0xcafa375c) at /usr/src/sys/kern/subr_sleepqueue.c:594
#5  0xc07cb047 in _sleep (ident=0xcafa375c, lock=0xcafa3700, priority=333, wmesg=0xc0b731ed "bo_wwait", timo=0) at /usr/src/sys/kern/kern_synch.c:224
#6  0xc0827295 in bufobj_wwait (bo=0xcafa3738, slpflag=256, timeo=0) at /usr/src/sys/kern/vfs_bio.c:3870
#7  0xc0966307 in nfs_flush (vp=0xcafa3678, waitfor=1, td=0xc731c8c0, commit=1) at /usr/src/sys/nfsclient/nfs_vnops.c:2989
#8  0xc09667ce in nfs_fsync (ap=0xed2578ec) at /usr/src/sys/nfsclient/nfs_vnops.c:2725
#9  0xc0aee5d2 in VOP_FSYNC_APV (vop=0xc0c2b920, a=0xed2578ec) at vnode_if.c:1007
#10 0xc0827864 in bufsync (bo=0xcafa3738, waitfor=1, td=0xc731c8c0) at vnode_if.h:538
#11 0xc083f354 in bufobj_invalbuf (bo=0xcafa3738, flags=1, td=0xc731c8c0, slpflag=256, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:1066
#12 0xc083f6e2 in vinvalbuf (vp=0xcafa3678, flags=1, td=0xc731c8c0, slpflag=256, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:1142
#13 0xc094f216 in nfs_vinvalbuf (vp=0xcafa3678, flags=Variable "flags" is not available.) at /usr/src/sys/nfsclient/nfs_bio.c:1326
#14 0xc0951825 in nfs_write (ap=0xed257bc4) at /usr/src/sys/nfsclient/nfs_bio.c:918
#15 0xc0aef956 in VOP_WRITE_APV (vop=0xc0c2b920, a=0xed257bc4) at vnode_if.c:691
#16 0xc0850097 in vn_write (fp=0xc8c232f8, uio=0xed257c60, active_cred=0xc89ee600, flags=0, td=0xc731c8c0) at vnode_if.h:373
#17 0xc07f9d17 in dofilewrite (td=0xc731c8c0, fd=6, fp=0xc8c232f8, auio=0xed257c60, offset=-1, flags=0) at file.h:256
#18 0xc07f9ff8 in kern_writev (td=0xc731c8c0, fd=6, auio=0xed257c60) at /usr/src/sys/kern/sys_generic.c:401
#19 0xc07fa06f in write (td=0xc731c8c0, uap=0xed257cfc) at /usr/src/sys/kern/sys_generic.c:317
#20 0xc0ad9c75 in syscall (frame=0xed257d38) at /usr/src/sys/i386/i386/trap.c:1090
#21 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#22 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
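Both processes are sleeping in bufobj_wwait() until bo->bo_numoutput drops to
zero. If I read sys/kern/vfs_bio.c correctly (I am paraphrasing the 7.1 sources
from memory here, so the details may be slightly off), the counter is bumped by
bufobj_wref() when bufwrite() hands the buffer to the strategy routine, and it
is the I/O completion path, bufdone() -> bufobj_wdrop(), that is supposed to
decrement it and wake the sleeper:

/*
 * Roughly what the write-completion side looks like to me (paraphrase
 * of vfs_bio.c, not a verbatim copy).
 */
void
bufobj_wdrop(struct bufobj *bo)
{
	BO_LOCK(bo);
	KASSERT(bo->bo_numoutput > 0, ("bufobj_wdrop non-positive count"));
	if ((--bo->bo_numoutput == 0) && (bo->bo_flag & BO_WWAIT) != 0) {
		bo->bo_flag &= ~BO_WWAIT;
		wakeup(&bo->bo_numoutput);
	}
	BO_UNLOCK(bo);
}

So, if that reading is right, somebody still has to complete the outstanding
write I/O before these processes can make progress.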
(kgdb) fr 6
#6  0xc0827295 in bufobj_wwait (bo=0xcafa3738, slpflag=256, timeo=0) at /usr/src/sys/kern/vfs_bio.c:3870
3870                    error = msleep(&bo->bo_numoutput, BO_MTX(bo),
(kgdb) list
3865            KASSERT(bo != NULL, ("NULL bo in bufobj_wwait"));
3866            ASSERT_BO_LOCKED(bo);
3867            error = 0;
3868            while (bo->bo_numoutput) {
3869                    bo->bo_flag |= BO_WWAIT;
3870                    error = msleep(&bo->bo_numoutput, BO_MTX(bo),
3871                        slpflag | (PRIBIO + 1), "bo_wwait", timeo);
3872                    if (error)
3873                            break;
3874            }
(kgdb) p *bo
$1 = {bo_mtx = 0xcafa3700, bo_clean = {bv_hd = {tqh_first = 0xda869ec0, tqh_last = 0xda869ef8},
    bv_root = 0xda869ec0, bv_cnt = 1}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xcafa374c},
    bv_root = 0x0, bv_cnt = 0}, bo_numoutput = 1, bo_flag = 2, bo_ops = 0xc0c2bae0, bo_bsize = 32768,
    bo_object = 0xca637744, bo_synclist = {le_next = 0xcb64d510, le_prev = 0xca129aac},
    bo_private = 0xcafa3678, __bo_vnode = 0xcafa3678}

The stack of one of the processes stuck acquiring the vnode lock in _vn_lock:

#0  sched_switch (td=0xc8bab460, newtd=Variable "newtd" is not available.) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xc07cabe6 in mi_switch (flags=Variable "flags" is not available.) at /usr/src/sys/kern/kern_synch.c:440
#2  0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available.) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc07f4946 in sleepq_wait (wchan=0xcafa36d0) at /usr/src/sys/kern/subr_sleepqueue.c:580
#4  0xc07cb056 in _sleep (ident=0xcafa36d0, lock=0xc0c77880, priority=80, wmesg=0xc0b80b92 "nfs", timo=0) at /usr/src/sys/kern/kern_synch.c:226
#5  0xc07adf5a in acquire (lkpp=0xed5fd7f0, extflags=Variable "extflags" is not available.) at /usr/src/sys/kern/kern_lock.c:151
#6  0xc07ae84c in _lockmgr (lkp=0xcafa36d0, flags=8194, interlkp=0xcafa3700, td=0xc8bab460, file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at /usr/src/sys/kern/kern_lock.c:384
#7  0xc0832470 in vop_stdlock (ap=0xed5fd840) at /usr/src/sys/kern/vfs_default.c:305
#8  0xc0aef4f6 in VOP_LOCK1_APV (vop=0xc0c1d5c0, a=0xed5fd840) at vnode_if.c:1618
#9  0xc084ed86 in _vn_lock (vp=0xcafa3678, flags=8194, td=0xc8bab460, file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at vnode_if.h:851
#10 0xc0841d84 in vget (vp=0xcafa3678, flags=8194, td=0xc8bab460) at /usr/src/sys/kern/vfs_subr.c:2061
#11 0xc08355b3 in vfs_hash_get (mp=0xc7232000, hash=3326873010, flags=Variable "flags" is not available.) at /usr/src/sys/kern/vfs_hash.c:81
#12 0xc09534d4 in nfs_nget (mntp=0xc7232000, fhp=0xcd5bd878, fhsize=20, npp=0xed5fd9f0, flags=2) at /usr/src/sys/nfsclient/nfs_node.c:120
#13 0xc0964a05 in nfs_lookup (ap=0xed5fda84) at /usr/src/sys/nfsclient/nfs_vnops.c:947
#14 0xc0aefbe6 in VOP_LOOKUP_APV (vop=0xc0c2b920, a=0xed5fda84) at vnode_if.c:99
#15 0xc0836841 in lookup (ndp=0xed5fdb48) at vnode_if.h:57
#16 0xc083756f in namei (ndp=0xed5fdb48) at /usr/src/sys/kern/vfs_lookup.c:219
#17 0xc0844fef in kern_lstat (td=0xc8bab460, path=0x486112b0, pathseg=UIO_USERSPACE, sbp=0xed5fdc18) at /usr/src/sys/kern/vfs_syscalls.c:2169
#18 0xc08451af in lstat (td=0xc8bab460, uap=0xed5fdcfc) at /usr/src/sys/kern/vfs_syscalls.c:2152
#19 0xc0ad9c75 in syscall (frame=0xed5fdd38) at /usr/src/sys/i386/i386/trap.c:1090
#20 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#21 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

(kgdb) p *lkp
$2 = {lk_object = {lo_name = 0xc0b80b92 "nfs", lo_type = 0xc0b80b92 "nfs", lo_flags = 70844416,
    lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xc0c77880,
    lk_flags = 33816640, lk_sharecount = 0, lk_waitcount = 1, lk_exclusivecount = 1, lk_prio = 80,
    lk_timo = 51, lk_lockholder = 0xc731c8c0, lk_newlock = 0x0}

Note that lk_lockholder here is one of the two PHP processes above.

The nfsmount structure of the "problem" mount:

$4 = {nm_mtx = {lock_object = {lo_name = 0xc0b808ee "NFSmount lock", lo_type = 0xc0b808ee "NFSmount lock",
    lo_flags = 16973824, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}},
    mtx_lock = 4, mtx_recurse = 0}, nm_flag = 35399, nm_state = 1310720, nm_mountp = 0xc7232000,
    nm_numgrps = 16, nm_fh = "\001\000\000\000\000\223\000\000\001@\003\n", '\0' , nm_fhsize = 12,
    nm_rpcclnt = {rc_flag = 0, rc_wsize = 0, rc_rsize = 0, rc_name = 0x0, rc_so = 0x0, rc_sotype = 0,
    rc_soproto = 0, rc_soflags = 0, rc_timeo = 0, rc_retry = 0, rc_srtt = {0, 0, 0, 0},
    rc_sdrtt = {0, 0, 0, 0}, rc_sent = 0, rc_cwnd = 0, rc_timeouts = 0, rc_deadthresh = 0,
    rc_authtype = 0, rc_auth = 0x0, rc_prog = 0x0, rc_proctlen = 0, rc_proct = 0x0}, nm_so = 0xc722e340,
    nm_sotype = 1, nm_soproto = 0, nm_soflags = 44, nm_nam = 0xc68eca00, nm_timeo = 6000, nm_retry = 2,
    nm_srtt = {15, 15, 17, 16}, nm_sdrtt = {3, 3, 4, 3}, nm_sent = 0, nm_cwnd = 4096, nm_timeouts = 0,
    nm_deadthresh = 9, nm_rsize = 32768, nm_wsize = 32768, nm_readdirsize = 4096, nm_readahead = 1,
    nm_wcommitsize = 1177026, nm_acdirmin = 30, nm_acdirmax = 60, nm_acregmin = 3, nm_acregmax = 60,
    nm_verf = "Jë¾W\000\004oí", nm_bufq = {tqh_first = 0xda93cc68, tqh_last = 0xda869f0c},
    nm_bufqlen = 2, nm_bufqwant = 0, nm_bufqiods = 1, nm_maxfilesize = 1099511627775,
    nm_rpcops = 0xc0c2b5bc, nm_tprintf_initial_delay = 12, nm_tprintf_delay = 30,
    nm_nfstcpstate = {rpcresid = 0, flags = 3, sock_send_inprog = 0},
    nm_hostname = "172.30.10.92\000/var/www/app31", '\0' , nm_clientid = 0, nm_fsid = {val = {0, 0}},
    nm_lease_time = 0, nm_last_renewal = 0}

(kgdb) p *nmp->nm_bufq.tqh_first
$9 = {b_bufobj = 0xc9cb21d4, b_bcount = 3545, b_caller1 = 0x0,
    b_data = 0xdd74d000 " 1641\n\n Warning summary:\n Empty Service Category:", ' ' ,
    "10\n\nUpload finished at 2010-01-08 20:17:10 GMT; Run time is 8.5791680812836 sec.\n\n\n\nUpdate upload started at 2010-01-08"...,
    b_error = 0, b_iocmd = 2 '\002', b_ioflags = 0 '\0', b_iooffset = 163840, b_resid = 0, b_iodone = 0,
    b_blkno = 320, b_offset = 163840, b_bobufs = {tqe_next = 0x0, tqe_prev = 0xc9cb21d8}, b_left = 0x0,
    b_right = 0x0, b_vflags = 0, b_freelist = {tqe_next = 0xda869ec0, tqe_prev = 0xc722a3c0}, b_qindex = 0,
    b_flags = 536870948, b_xflags = 2 '\002', b_lock = {lk_object = {lo_name = 0xc0b73635 "bufwait",
    lo_type = 0xc0b73635 "bufwait", lo_flags = 70844416, lo_witness_data = {lod_list = {stqe_next = 0x0},
    lod_witness = 0x0}}, lk_interlock = 0xc0c77fa0, lk_flags = 262144, lk_sharecount = 0, lk_waitcount = 0,
    lk_exclusivecount = 1, lk_prio = 80, lk_timo = 0, lk_lockholder = 0xfffffffe, lk_newlock = 0x0},
    b_bufsize = 3584, b_runningbufspace = 0,
    b_kvabase = 0xdd74d000 " 1641\n\n Warning summary:\n Empty Service Category:", ' ' ,
    "10\n\nUpload finished at 2010-01-08 20:17:10 GMT; Run time is 8.5791680812836 sec.\n\n\n\nUpdate upload started at 2010-01-08"...,
    b_kvasize = 32768, b_lblkno = 5, b_vp = 0xc9cb2114, b_dirtyoff = 3492, b_dirtyend = 3545,
    b_rcred = 0x0, b_wcred = 0xcb27b900, b_saveaddr = 0xdd74d000, b_pager = {pg_reqpage = 0},
    b_cluster = {cluster_head = {tqh_first = 0xda8f2e60, tqh_last = 0xda95fc34},
    cluster_entry = {tqe_next = 0xda8f2e60, tqe_prev = 0xda95fc34}}, b_pages = {0xc2e543a0, 0x0 },
    b_npages = 1, b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0,
    b_pin_count = 0}
(These are some bytes of the log the PHP process was trying to write to the
NFS share :-)

When the incident occurs we usually have 1-3 nfsiod processes serving the
other "good" shares. In this particular case, when the crash dump was
generated, we had one nfsiod process:

(kgdb) bt
#0  sched_switch (td=0xc724cd20, newtd=Variable "newtd" is not available.) at /usr/src/sys/kern/sched_ule.c:1944
#1  0xc07cabe6 in mi_switch (flags=Variable "flags" is not available.) at /usr/src/sys/kern/kern_synch.c:440
#2  0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available.) at /usr/src/sys/kern/subr_sleepqueue.c:497
#3  0xc07f460c in sleepq_catch_signals (wchan=0xc0c89f00) at /usr/src/sys/kern/subr_sleepqueue.c:417
#4  0xc07f4db9 in sleepq_timedwait_sig (wchan=0xc0c89f00) at /usr/src/sys/kern/subr_sleepqueue.c:631
#5  0xc07cb021 in _sleep (ident=0xc0c89f00, lock=0xc0c89ed0, priority=348, wmesg=0xc0b69679 "-", timo=120000) at /usr/src/sys/kern/kern_synch.c:220
#6  0xc095b46e in nfssvc_iod (instance=0xc0c89880) at /usr/src/sys/nfsclient/nfs_nfsiod.c:244
#7  0xc079e6c9 in fork_exit (callout=0xc095b380 , arg=0xc0c89880, frame=0xed27fd38) at /usr/src/sys/kern/kern_fork.c:804
#8  0xc0ac01c0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:264
(kgdb) fr 6
#6  0xc095b46e in nfssvc_iod (instance=0xc0c89880) at /usr/src/sys/nfsclient/nfs_nfsiod.c:244
244                     error = msleep(&nfs_iodwant[myiod], &nfs_iod_mtx, PWAIT | PCATCH,
(kgdb) list
239                     nfs_iodmount[myiod] = NULL;
240                     /*
241                      * Always keep at least nfs_iodmin kthreads.
242                      */
243                     timo = (myiod < nfs_iodmin) ? 0 : nfs_iodmaxidle * hz;
244                     error = msleep(&nfs_iodwant[myiod], &nfs_iod_mtx, PWAIT | PCATCH,
245                         "-", timo);
246                     if (error) {
247                             nmp = nfs_iodmount[myiod];
248                             /*
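For what it is worth, here is my (possibly wrong) understanding of how an
asynchronous write buffer is supposed to reach an nfsiod, heavily simplified
from my reading of 7.1's sys/nfsclient/nfs_bio.c and nfs_nfsiod.c -- this is a
paraphrase, not the real code, so please correct me if I got it wrong:

/* nfs_asyncio(), called in the context of the writing process: */
	if (an idle nfsiod was found, or nfs_nfsiodnew() could create one) {
		nfs_iodmount[i] = nmp;
		nmp->nm_bufqiods++;
		wakeup(&nfs_iodwant[i]);	/* wake the iod out of the msleep() above */
		TAILQ_INSERT_TAIL(&nmp->nm_bufq, bp, b_freelist);
		nmp->nm_bufqlen++;
		return (0);
	}
	return (EIO);	/* no iod available: the caller does the write synchronously */

/* nfssvc_iod(), after the kthread wakes up with a mount assigned: */
	while ((bp = TAILQ_FIRST(&nmp->nm_bufq)) != NULL) {
		TAILQ_REMOVE(&nmp->nm_bufq, bp, b_freelist);
		nmp->nm_bufqlen--;
		nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);	/* does the RPC, then bufdone() */
	}

(I am not sure this is the complete picture, e.g. whether the buffer can also
be written from some other path -- that is part of my question below.)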
(kgdb) p *nmp
$5 = {nm_mtx = {lock_object = {lo_name = 0xc0b808ee "NFSmount lock", lo_type = 0xc0b808ee "NFSmount lock",
    lo_flags = 16973824, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}},
    mtx_lock = 4, mtx_recurse = 0}, nm_flag = 35399, nm_state = 1310720, nm_mountp = 0xc6b46864,
    nm_numgrps = 16, nm_fh = "\001\000\000\000\000\223\000\000\001\200n\005", '\0' , nm_fhsize = 12,
    nm_rpcclnt = {rc_flag = 0, rc_wsize = 0, rc_rsize = 0, rc_name = 0x0, rc_so = 0x0, rc_sotype = 0,
    rc_soproto = 0, rc_soflags = 0, rc_timeo = 0, rc_retry = 0, rc_srtt = {0, 0, 0, 0},
    rc_sdrtt = {0, 0, 0, 0}, rc_sent = 0, rc_cwnd = 0, rc_timeouts = 0, rc_deadthresh = 0,
    rc_authtype = 0, rc_auth = 0x0, rc_prog = 0x0, rc_proctlen = 0, rc_proct = 0x0}, nm_so = 0xc722e000,
    nm_sotype = 1, nm_soproto = 0, nm_soflags = 44, nm_nam = 0xc68da280, nm_timeo = 6000, nm_retry = 2,
    nm_srtt = {15, 15, 15, 33}, nm_sdrtt = {3, 3, 3, 21}, nm_sent = 0, nm_cwnd = 4360, nm_timeouts = 0,
    nm_deadthresh = 9, nm_rsize = 32768, nm_wsize = 32768, nm_readdirsize = 4096, nm_readahead = 1,
    nm_wcommitsize = 1177026, nm_acdirmin = 30, nm_acdirmax = 60, nm_acregmin = 3, nm_acregmax = 60,
    nm_verf = "Jë¾W\000\004oí", nm_bufq = {tqh_first = 0x0, tqh_last = 0xc722aa50}, nm_bufqlen = 0,
    nm_bufqwant = 0, nm_bufqiods = 1, nm_maxfilesize = 1099511627775, nm_rpcops = 0xc0c2b5bc,
    nm_tprintf_initial_delay = 12, nm_tprintf_delay = 30,
    nm_nfstcpstate = {rpcresid = 0, flags = 1, sock_send_inprog = 0},
    nm_hostname = "172.30.10.75\000/var/www/app20", '\0' , nm_clientid = 0, nm_fsid = {val = {0, 0}},
    nm_lease_time = 0, nm_last_renewal = 0}

I have another crash dump and it looks similar.

Also, could someone please explain the following to me? It is unclear to me
because I lack the necessary knowledge of FreeBSD internals. Here we have
writing processes sleeping in bufobj_wwait. As I understand it, in this state
they are just waiting, checking bo->bo_numoutput, until some other kernel
thread finishes the I/O on the buffer and clears bo->bo_numoutput. But what is
the thread that should do the actual I/O? I can't find anything among the
running threads in the backtraces that would do this. The nfsiod process? Then
how is it triggered? Understanding how this works would be very helpful for
me :-)

It would be nice to be sure that the problem is not in the client, so that we
can concentrate the investigation on the server side.

Thanks in advance.

-- 
Mikolaj Golub