Date: Wed, 25 Apr 2001 11:03:11 +0100
From: Oliver Cook <ollie@uk.clara.net>
To: freebsd-questions@FreeBSD.ORG
Subject: Processes stuck in D (disk wait)
Message-ID: <20010425110311.A37512@mutare.noc.clara.net>
We run a number of webservers under various versions of FreeBSD 3.x and
4.x (STABLE, RELEASE and CURRENT), all of which suffer from the same
problem whilst running Apache. The content they are serving comes off a
NetApp filer using NFS.

After the box has been up for about a month or so, processes begin to get
stuck in D, disk wait. It is not possible to attach to the stuck
processes, but the following gdb backtrace is interesting:

(kgdb) proc 58738
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:859
#1  0xc01467e9 in tsleep (ident=0xe00a3aca, priority=18,
    wmesg=0xc024a79b "nfsvinval", timo=0) at ../../kern/kern_synch.c:468
#2  0xc01ad14f in nfs_vinvalbuf (vp=0xe0097b80, flags=1, cred=0xc63b1800,
    p=0xe1952920, intrflg=1) at ../../nfs/nfs_bio.c:1170
#3  0xc01d02a6 in nfs_open (ap=0xe195be10) at ../../nfs/nfs_vnops.c:506
#4  0xc01736af in vn_open (ndp=0xe195bedc, fmode=1, cmode=420)
    at vnode_if.h:189
#5  0xc016f6a1 in open (p=0xe1952920, uap=0xe195bf80)
    at ../../kern/vfs_syscalls.c:994
#6  0xc02238e6 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
    tf_edi = 4, tf_esi = 672559256, tf_ebp = -1077937648,
    tf_isp = -510279724, tf_ebx = 672502180, tf_edx = 672559256,
    tf_ecx = 15, tf_eax = 5, tf_trapno = 7, tf_err = 2,
    tf_eip = 672418516, tf_cs = 31, tf_eflags = 659,
    tf_esp = -1077937692, tf_ss = 47}) at ../../i386/i386/trap.c:1073
#7  0xc0218be6 in Xint0x80_syscall ()
#8  0x8062fe0 in ?? ()
#9  0x806ccdd in ?? ()
#10 0x806618c in ?? ()
#11 0x80797f4 in ?? ()
#12 0x807985e in ?? ()
#13 0x8071027 in ?? ()
#14 0x80712ac in ?? ()
#15 0x807162c in ?? ()
#16 0x8071b41 in ?? ()
#17 0x8072144 in ?? ()
#18 0x804a159 in ?? ()

All the processes stuck in D are stuck doing mi_switch. Does this
behaviour ring any bells with anyone?

Experience has shown us that the only way to get rid of these stuck
processes is to reboot the box, which is something we are usually loath
to do in a production environment.
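To see how many processes are affected and what they are sleeping on, without attaching to each one, something like the following might help. It is only a sketch: it assumes the process list was captured with `ps -axo pid,state,wchan,command`, and the sample text is invented for illustration (only PID 58738 and the "nfsvinval" wait message come from the backtrace above; the other rows and the command names are hypothetical).

```python
def find_disk_wait(ps_output):
    """Return (pid, wchan, command) for every process whose state starts with 'D'."""
    stuck = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 3)  # pid, state, wchan, rest of line is the command
        if len(fields) < 4:
            continue  # ignore short or malformed rows
        pid, state, wchan, command = fields
        if state.startswith("D"):
            stuck.append((int(pid), wchan, command))
    return stuck


# Hypothetical sample; real input would be the captured ps output.
SAMPLE = """\
  PID STAT WCHAN     COMMAND
58738 D    nfsvinval httpd
58740 S    select    httpd
  312 Is   wait      /bin/sh
"""

if __name__ == "__main__":
    for pid, wchan, command in find_disk_wait(SAMPLE):
        print(pid, wchan, command)
```

Grouping the result by wait channel would show whether every stuck process is sleeping on the same message or on several different ones.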
The NetApps are mounted with the following line in /etc/fstab:

000.0.00.000:/vol/vol0/web /web nfs rw 3 3

We have tried changing the read and write block sizes on the NFS mount,
but this has had limited effect, and in one case actually made the
situation worse!

The network connection to the NetApp filer is healthy:

[/]# netstat -p udp
udp:
        38197798 datagrams received
        0 with incomplete header
        0 with bad data length field
        0 with bad checksum
        245 dropped due to no socket
        3124 broadcast/multicast datagrams dropped due to no socket
        44 dropped due to full socket buffers
        0 not for hashed pcb
        38194385 delivered
        38255984 datagrams output

We are at a loss as to what to look at next for a possible cause of the
problem. Has anyone seen this kind of behaviour before?

Yours.

Ollie

--
Oliver Cook    Systems Administrator, ClaraNET
ollie@uk.clara.net    020 7903 3000 ext. 291

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
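As a rough check on the UDP counters quoted above, the drop counts can be expressed as fractions of the datagrams received. This is a sketch using only the numbers from the netstat output; the variable and function names are made up for illustration, and the counters describe this host's UDP layer only, so they do not by themselves rule out a problem elsewhere on the path to the filer.

```python
# Counters copied from the "netstat -p udp" output in the message.
RECEIVED = 38197798
DELIVERED = 38194385
DROPPED = {
    "no socket": 245,
    "broadcast/multicast, no socket": 3124,
    "full socket buffers": 44,
}


def drop_fractions(received, dropped):
    """Return each drop counter as a fraction of all datagrams received."""
    return {reason: count / received for reason, count in dropped.items()}


if __name__ == "__main__":
    # The drop counters plus "delivered" account for every datagram received.
    print("accounted for:", sum(DROPPED.values()) + DELIVERED == RECEIVED)
    for reason, fraction in drop_fractions(RECEIVED, DROPPED).items():
        print(f"{reason}: {fraction:.2e}")
```

The "full socket buffers" counter, the one most likely to indicate the host falling behind on incoming traffic, comes to roughly one datagram in a million here.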