From owner-freebsd-fs@FreeBSD.ORG Tue Jan 19 08:03:02 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9AAF1065693; Tue, 19 Jan 2010 08:03:02 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bw0-f209.google.com (mail-bw0-f209.google.com [209.85.218.209]) by mx1.freebsd.org (Postfix) with ESMTP id 077588FC14; Tue, 19 Jan 2010 08:03:01 +0000 (UTC) Received: by bwz1 with SMTP id 1so1292638bwz.13 for ; Tue, 19 Jan 2010 00:03:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:to:cc:subject:references :organization:from:date:in-reply-to:message-id:user-agent :mime-version:content-type:content-transfer-encoding; bh=D7b1832tx6M4Fywxyifrqf+sABYej7JZs9JCYUNZnVk=; b=hFIcAapmcxMF6lGqBTXdoK1jvhptwpw24G+1A8gTCUbJVobbm5VGahKjr5Xolq3cbu 0L/TLgHrQUf9/T11gpYlt66qeIToYqt+D789KfYEG2BTWD2Tikes3KENVdQR0qeOb7to erFcKjd/fHlLTLEpolv7LZuS6T48tN5u8dpgs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=to:cc:subject:references:organization:from:date:in-reply-to :message-id:user-agent:mime-version:content-type :content-transfer-encoding; b=P0kBO18/bYHo8XUVNNXqQoOEMoWoxo7lcqr0ai2N5YmNzp5/NZEYuamJc3ji168cq/ gVo2K5K6Hh1eptRTGi2YiV0xK12LYa8dzeEa/n5RnZytMeTYA0ufBGk6Zq2YDPmIue+d oTIBPwzOeNjJ4FqgVjNcCAk9v+Xt5JpJnF0Js= Received: by 10.204.32.72 with SMTP id b8mr2583084bkd.203.1263888180556; Tue, 19 Jan 2010 00:03:00 -0800 (PST) Received: from localhost (ms.singlescrowd.net [80.85.90.67]) by mx.google.com with ESMTPS id 16sm883583bwz.11.2010.01.19.00.02.59 (version=TLSv1/SSLv3 cipher=RC4-MD5); Tue, 19 Jan 2010 00:02:59 -0800 (PST) To: freebsd-fs@FreeBSD.org References: <86ocl272mb.fsf@kopusha.onet> <86tyuqnz9x.fsf@zhuzha.ua1> <86zl4awmon.fsf@zhuzha.ua1> Organization: TOA Ukraine From: Mikolaj Golub Date: Tue, 19 Jan 2010 10:02:57 +0200 In-Reply-To: <86zl4awmon.fsf@zhuzha.ua1> (Mikolaj Golub's message of "Tue\, 19 Jan 2010 09\:58\:32 +0200") Message-ID: <86vdeywmha.fsf@zhuzha.ua1> User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 8bit Cc: freebsd-stable@FreeBSD.org Subject: Re: FreeBSD NFS client/Linux NFS server issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Jan 2010 08:03:02 -0000 Hi, In this thread I have posted to freebsd-fs@ several messages describing our problem with freebsd7.1 nfs clients. As with the time new info has appeared and having this all spread in several messages might be a bit confusing, I want to summarise here what we see and know. Also I cc to freebsd-stable@ hoping to draw more attention to this problem as it looks for me very interesting and challenging :-) I have found in the Internet that other people have been observed the similar problem with FreeBSD6.2 client: http://forums.freebsd.org/showthread.php?t=1697 So, on some of our freebsd7.1 nfs clients (and it looks like we have had similar case with 6.3), which have several nfs mounts to the same CentOS 5.3 NFS server (mount options: rw,-3,-T,-s,-i,-r=32768,-w=32768,-o=noinet6), at some moment the access to one of the NFS mount gets stuck, while the access to the other mounts works ok. In all cases we have been observed so far the first gotten stuck process was php script (or two) that was (were) writing to logs file (appending). In tcpdump we see that every write to the file causes the sequence of the following rpc: ACCESS - READ - WRITE - COMMIT. And at some moment this stops after READ rpc call and successful reply. After this in tcpdump successful readdir/access/lookup/fstat calls are observed from our other utilities, which just check the presence of some files and they work ok (df also works). The php process at this state is in bo_wwait invalidating buffer cache [1]. If at this time we try accessing the share with mc then it hangs acquiring the vn_lock held by php process [2] and after this any operations with this NFS share hang (df hangs too). If instead some other process is started that writes to some other file on this share (append) then the first process "unfreezes" too (starting from WRITE rpc, so there is no any retransmits). With my limited knowledge of this complicated kernel subsystem I have the following hypothesis what is going on. On some of the nfs_write() it does successful ACCESS - READ rpcs but by some reason does not call WRITE to flush dirty buffer to the server (aborts somewere or may be in bdwrite() which calls bd_wakeup() and actually bd_wakeup considers that we don't have enough dirty buffers?). But it looks like on this stage the buffer appears to be unlinked from bufqueues [3] so when bufdaemon runs it does not flush the buffer. The next write() call to this file causes the process to get stuck invalidating the dirty buffer. The buffer is accessible by nfsiod via nmp structure [3] and when the next process is writing to another file, nfsiod is started and flushes this dirty buffer. [1]: Gotten stuck php process: (kgdb) bt #0 sched_switch (td=0xc839e000, newtd=Variable "newtd" is not available. ) at /usr/src/sys/kern/sched_ule.c:1944 #1 0xc07cabe6 in mi_switch (flags=Variable "flags" is not available. ) at /usr/src/sys/kern/kern_synch.c:440 #2 0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available. ) at /usr/src/sys/kern/subr_sleepqueue.c:497 #3 0xc07f460c in sleepq_catch_signals (wchan=0xc90c9ee8) at /usr/src/sys/kern/subr_sleepqueue.c:417 #4 0xc07f4ebd in sleepq_wait_sig (wchan=0xc90c9ee8) at /usr/src/sys/kern/subr_sleepqueue.c:594 #5 0xc07cb047 in _sleep (ident=0xc90c9ee8, lock=0xc90c9e8c, priority=333, wmesg=0xc0b731ed "bo_wwait", timo=0) at /usr/src/sys/kern/kern_synch.c:224 #6 0xc0827295 in bufobj_wwait (bo=0xc90c9ec4, slpflag=256, timeo=0) at /usr/src/sys/kern/vfs_bio.c:3870 #7 0xc0966307 in nfs_flush (vp=0xc90c9e04, waitfor=1, td=0xc839e000, commit=1) at /usr/src/sys/nfsclient/nfs_vnops.c:2989 #8 0xc09667ce in nfs_fsync (ap=0xed3c38ec) at /usr/src/sys/nfsclient/nfs_vnops.c:2725 #9 0xc0aee5d2 in VOP_FSYNC_APV (vop=0xc0c2b920, a=0xed3c38ec) at vnode_if.c:1007 #10 0xc0827864 in bufsync (bo=0xc90c9ec4, waitfor=1, td=0xc839e000) at vnode_if.h:538 #11 0xc083f354 in bufobj_invalbuf (bo=0xc90c9ec4, flags=1, td=0xc839e000, slpflag=256, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:1066 #12 0xc083f6e2 in vinvalbuf (vp=0xc90c9e04, flags=1, td=0xc839e000, slpflag=256, slptimeo=0) at /usr/src/sys/kern/vfs_subr.c:1142 #13 0xc094f216 in nfs_vinvalbuf (vp=0xc90c9e04, flags=Variable "flags" is not available. ) at /usr/src/sys/nfsclient/nfs_bio.c:1326 #14 0xc0951825 in nfs_write (ap=0xed3c3bc4) at /usr/src/sys/nfsclient/nfs_bio.c:918 #15 0xc0aef956 in VOP_WRITE_APV (vop=0xc0c2b920, a=0xed3c3bc4) at vnode_if.c:691 #16 0xc0850097 in vn_write (fp=0xc9969b48, uio=0xed3c3c60, active_cred=0xcb901600, flags=0, td=0xc839e000) at vnode_if.h:373 #17 0xc07f9d17 in dofilewrite (td=0xc839e000, fd=6, fp=0xc9969b48, auio=0xed3c3c60, offset=-1, flags=0) at file.h:256 #18 0xc07f9ff8 in kern_writev (td=0xc839e000, fd=6, auio=0xed3c3c60) at /usr/src/sys/kern/sys_generic.c:401 #19 0xc07fa06f in write (td=0xc839e000, uap=0xed3c3cfc) at /usr/src/sys/kern/sys_generic.c:317 #20 0xc0ad9c75 in syscall (frame=0xed3c3d38) at /usr/src/sys/i386/i386/trap.c:1090 #21 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255 #22 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) [2] mc process gotten stuck acquiring _vn_lock held by above php thread: at /usr/src/sys/kern/sched_ule.c:1944 * 178 Thread 100340 (PID=40443: mc) sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not availabl . (kgdb) thr 178 [Switching to thread 178 (Thread 100340)]#0 sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not available. ) at /usr/src/sys/kern/sched_ule.c:1944 1944 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xc9810af0, newtd=Variable "newtd" is not available. ) at /usr/src/sys/kern/sched_ule.c:1944 #1 0xc07cabe6 in mi_switch (flags=Variable "flags" is not available. ) at /usr/src/sys/kern/kern_synch.c:440 #2 0xc07f42fb in sleepq_switch (wchan=Variable "wchan" is not available. ) at /usr/src/sys/kern/subr_sleepqueue.c:497 #3 0xc07f4946 in sleepq_wait (wchan=0xc90c9e5c) at /usr/src/sys/kern/subr_sleepqueue.c:580 #4 0xc07cb056 in _sleep (ident=0xc90c9e5c, lock=0xc0c77d18, priority=80, wmesg=0xc0b80b92 "nfs", timo=0) at /usr/src/sys/kern/kern_synch.c:226 #5 0xc07adf5a in acquire (lkpp=0xed56b7f0, extflags=Variable "extflags" is not available. ) at /usr/src/sys/kern/kern_lock.c:151 #6 0xc07ae84c in _lockmgr (lkp=0xc90c9e5c, flags=8194, interlkp=0xc90c9e8c, td=0xc9810af0, file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at /usr/src/sys/kern/kern_lock.c:384 #7 0xc0832470 in vop_stdlock (ap=0xed56b840) at /usr/src/sys/kern/vfs_default.c:305 #8 0xc0aef4f6 in VOP_LOCK1_APV (vop=0xc0c1d5c0, a=0xed56b840) at vnode_if.c:1618 #9 0xc084ed86 in _vn_lock (vp=0xc90c9e04, flags=8194, td=0xc9810af0, file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at vnode_if.h:851 #10 0xc0841d84 in vget (vp=0xc90c9e04, flags=8194, td=0xc9810af0) at /usr/src/sys/kern/vfs_subr.c:2061 #11 0xc08355b3 in vfs_hash_get (mp=0xc6b472cc, hash=3326873010, flags=Variable "flags" is not available. ) at /usr/src/sys/kern/vfs_hash.c:81 #12 0xc09534d4 in nfs_nget (mntp=0xc6b472cc, fhp=0xc97be078, fhsize=20, npp=0xed56b9f0, flags=2) at /usr/src/sys/nfsclient/nfs_node.c:120 #13 0xc0964a05 in nfs_lookup (ap=0xed56ba84) at /usr/src/sys/nfsclient/nfs_vnops.c:947 #14 0xc0aefbe6 in VOP_LOOKUP_APV (vop=0xc0c2b920, a=0xed56ba84) at vnode_if.c:99 #15 0xc0836841 in lookup (ndp=0xed56bb48) at vnode_if.h:57 #16 0xc083756f in namei (ndp=0xed56bb48) at /usr/src/sys/kern/vfs_lookup.c:219 #17 0xc0844fef in kern_lstat (td=0xc9810af0, path=0x48611280
, pathseg=UIO_USERSPACE, sbp=0xed56bc18) at /usr/src/sys/kern/vfs_syscalls.c:2169 #18 0xc08451af in lstat (td=0xc9810af0, uap=0xed56bcfc) at /usr/src/sys/kern/vfs_syscalls.c:2152 #19 0xc0ad9c75 in syscall (frame=0xed56bd38) at /usr/src/sys/i386/i386/trap.c:1090 #20 0xc0ac01b0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255 #21 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) fr 6 #6 0xc07ae84c in _lockmgr (lkp=0xc90c9e5c, flags=8194, interlkp=0xc90c9e8c, td=0xc9810af0, file=0xc0b74aeb "/usr/src/sys/kern/vfs_subr.c", line=2061) at /usr/src/sys/kern/kern_lock.c:384 384 error = acquire(&lkp, extflags, (LK_HAVE_EXCL | LK_WANT_EXCL), &contested, &waitstart); (kgdb) p *lkp $2 = {lk_object = {lo_name = 0xc0b80b92 "nfs", lo_type = 0xc0b80b92 "nfs", lo_flags = 70844416, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xc0c77d18, lk_flags = 33816640, lk_sharecount = 0, lk_waitcount = 1, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 51, lk_lockholder = 0xc839e000, lk_newlock = 0x0} [3] struct nfsmount of the "problem" share: (kgdb) p *nmp $4 = {nm_mtx = {lock_object = {lo_name = 0xc0b808ee "NFSmount lock", lo_type = 0xc0b808ee "NFSmount lock", lo_flags = 16973824, lo_witness_data = {lod_list = { stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, nm_flag = 35399, nm_state = 1310720, nm_mountp = 0xc6b472cc, nm_numgrps = 16, nm_fh = "\001\000\000\000\000\223\000\000\001@\003\n", '\0' , nm_fhsize = 12, nm_rpcclnt = {rc_flag = 0, rc_wsize = 0, rc_rsize = 0, rc_name = 0x0, rc_so = 0x0, rc_sotype = 0, rc_soproto = 0, rc_soflags = 0, rc_timeo = 0, rc_retry = 0, rc_srtt = {0, 0, 0, 0}, rc_sdrtt = {0, 0, 0, 0}, rc_sent = 0, rc_cwnd = 0, rc_timeouts = 0, rc_deadthresh = 0, rc_authtype = 0, rc_auth = 0x0, rc_prog = 0x0, rc_proctlen = 0, rc_proct = 0x0}, nm_so = 0xc6e81d00, nm_sotype = 1, nm_soproto = 0, nm_soflags = 44, nm_nam = 0xc6948640, nm_timeo = 6000, nm_retry = 2, nm_srtt = {15, 15, 31, 52}, nm_sdrtt = {3, 3, 15, 15}, nm_sent = 0, nm_cwnd = 4096, nm_timeouts = 0, nm_deadthresh = 9, nm_rsize = 32768, nm_wsize = 32768, nm_readdirsize = 4096, nm_readahead = 1, nm_wcommitsize = 1177026, nm_acdirmin = 30, nm_acdirmax = 60, nm_acregmin = 3, nm_acregmax = 60, nm_verf = "Jë¾W\000\004oí", nm_bufq = {tqh_first = 0xda82dc70, tqh_last = 0xda8058e0}, nm_bufqlen = 2, nm_bufqwant = 0, nm_bufqiods = 1, nm_maxfilesize = 1099511627775, nm_rpcops = 0xc0c2b5bc, nm_tprintf_initial_delay = 12, nm_tprintf_delay = 30, nm_nfstcpstate = { rpcresid = 0, flags = 1, sock_send_inprog = 0}, nm_hostname = "172.30.10.92\000/var/www/app31", '\0' , nm_clientid = 0, nm_fsid = { val = {0, 0}}, nm_lease_time = 0, nm_last_renewal = 0} buffers on it: (kgdb) p *nmp->nm_bufq.tqh_first $7 = {b_bufobj = 0xc7324960, b_bcount = 31565, b_caller1 = 0x0, b_data = 0xde581000 " valid_lines:", ' ' , "1341\n invalid_lines:", ' ' , "1556\n total_lines:", ' ' , "2897\n\n Error summary:\n Inactive pr"..., b_error = 0, b_iocmd = 2 '\002', b_ioflags = 0 '\0', b_iooffset = 196608, b_resid = 0, b_iodone = 0, b_blkno = 384, b_offset = 196608, b_bobufs = {tqe_next = 0x0, tqe_prev = 0xc7324964}, b_left = 0x0, b_right = 0x0, b_vflags = 0, b_freelist = { tqe_next = 0xda805894, tqe_prev = 0xc725d3c0}, b_qindex = 0, b_flags = 536870948, b_xflags = 2 '\002', b_lock = {lk_object = {lo_name = 0xc0b73635 "bufwait", lo_type = 0xc0b73635 "bufwait", lo_flags = 70844416, lo_witness_data = {lod_list = { stqe_next = 0x0}, lod_witness = 0x0}}, lk_interlock = 0xc0c77b50, lk_flags = 262144, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 80, lk_timo = 0, lk_lockholder = 0xfffffffe, lk_newlock = 0x0}, b_bufsize = 31744, b_runningbufspace = 0, b_kvabase = 0xde581000 " valid_lines:", ' ' , "1341\n invalid_lines:", ' ' , "1556\n total_lines:", ' ' , "2897\n\n Error summary:\n Inactive pr"..., b_kvasize = 32768, b_lblkno = 6, b_vp = 0xc73248a0, b_dirtyoff = 31512, b_dirtyend = 31565, b_rcred = 0x0, b_wcred = 0xcebec400, b_saveaddr = 0xde581000, b_pager = { pg_reqpage = 0}, b_cluster = {cluster_head = {tqh_first = 0xda917ec8, tqh_last = 0xda888e94}, cluster_entry = {tqe_next = 0xda917ec8, tqe_prev = 0xda888e94}}, b_pages = {0xc3726e90, 0xc448dca8, 0xc2a55b98, 0xc3bf1a28, 0xc3467ff0, 0xc3299600, 0xc28db130, 0xc2301398, 0x0 }, b_npages = 8, b_dep = {lh_first = 0x0}, b_fsprivate1 = 0x0, b_fsprivate2 = 0x0, b_fsprivate3 = 0x0, b_pin_count = 0} These are entires from our log file. Note that b_qindex is 0. But bufqueues[0] is empty: (kgdb) p bufqueues[0] $8 = {tqh_first = 0x0, tqh_last = 0xc0c83e20} Also does not it look strange that lk_lockholder of b_lock points to innvalid location (0xfffffffe)? -- Mikolaj Golub