Date:      Wed, 25 Apr 2001 11:03:11 +0100
From:      Oliver Cook <ollie@uk.clara.net>
To:        freebsd-questions@FreeBSD.ORG
Subject:   Processes stuck in D (disk wait)
Message-ID:  <20010425110311.A37512@mutare.noc.clara.net>

We run a number of webservers under various
versions of FreeBSD 3.x and 4.x (STABLE,
RELEASE and CURRENT), all of which suffer from
the same problem whilst running Apache.

The content they are serving comes off a
NetApp filer using NFS. After the box has
been up for about a month or so, processes
begin to get stuck in D, disk wait.

It is not possible to attach to the stuck
processes, but the following gdb backtrace
is interesting:

(kgdb) proc 58738
(kgdb) bt
#0  mi_switch () at ../../kern/kern_synch.c:859
#1  0xc01467e9 in tsleep (ident=0xe00a3aca, priority=18, wmesg=0xc024a79b "nfsvinval",
    timo=0) at ../../kern/kern_synch.c:468
#2  0xc01ad14f in nfs_vinvalbuf (vp=0xe0097b80, flags=1, cred=0xc63b1800, p=0xe1952920,
    intrflg=1) at ../../nfs/nfs_bio.c:1170
#3  0xc01d02a6 in nfs_open (ap=0xe195be10) at ../../nfs/nfs_vnops.c:506
#4  0xc01736af in vn_open (ndp=0xe195bedc, fmode=1, cmode=420) at vnode_if.h:189
#5  0xc016f6a1 in open (p=0xe1952920, uap=0xe195bf80) at ../../kern/vfs_syscalls.c:994
#6  0xc02238e6 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 4,
      tf_esi = 672559256, tf_ebp = -1077937648, tf_isp = -510279724, tf_ebx = 672502180,
      tf_edx = 672559256, tf_ecx = 15, tf_eax = 5, tf_trapno = 7, tf_err = 2,
      tf_eip = 672418516, tf_cs = 31, tf_eflags = 659, tf_esp = -1077937692, tf_ss = 47})
    at ../../i386/i386/trap.c:1073
#7  0xc0218be6 in Xint0x80_syscall ()
#8  0x8062fe0 in ?? ()
#9  0x806ccdd in ?? ()
#10 0x806618c in ?? ()
#11 0x80797f4 in ?? ()
#12 0x807985e in ?? ()
#13 0x8071027 in ?? ()
#14 0x80712ac in ?? ()
#15 0x807162c in ?? ()
#16 0x8071b41 in ?? ()
#17 0x8072144 in ?? ()
#18 0x804a159 in ?? ()

All the processes stuck in D are stuck
doing mi_switch.
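
To check this across every stuck process at once, rather than one
backtrace at a time, the wait channels can be tallied from ps output.
What follows is a minimal Python sketch, not a definitive tool. It
assumes the output of "ps -axo pid,stat,wchan,command"; the function
names and the sample listing are illustrative only (PID 58738 is the
one from the backtrace above, the other rows are made up), and the
wait-channel name is shown shortened on the assumption that ps
truncates it.

```python
from collections import Counter


def stuck_processes(ps_output):
    """Return (pid, wchan, command) for each process whose state starts with D."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # ignore malformed or short lines
        pid, stat, wchan, command = fields
        if stat.startswith("D"):
            rows.append((int(pid), wchan, command))
    return rows


def count_by_wchan(rows):
    """Count stuck processes per wait channel."""
    return Counter(wchan for _, wchan, _ in rows)


# Hypothetical sample listing, for illustration only.
SAMPLE = """\
  PID STAT WCHAN  COMMAND
58738 D    nfsvin httpd
58741 D    nfsvin httpd
58750 S    select httpd
  312 Is   select inetd
"""

if __name__ == "__main__":
    rows = stuck_processes(SAMPLE)
    for name, n in count_by_wchan(rows).most_common():
        print(name, n)
```

On a live box the same functions could be fed the real ps output
instead of SAMPLE. If every stuck process reports the same wait
channel, that points at a single sleep in the NFS code rather than
several unrelated ones.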

Does this behaviour ring any bells with
anyone? Experience has shown us that
the only way to get rid of these stuck
processes is to reboot the box, which
is something we are usually loath to do
in a production environment.

The NetApps are mounted with the following
line in /etc/fstab:

000.0.00.000:/vol/vol0/web /web  nfs     rw              3       3

We have tried changing the read and write
block sizes on the NFS mount, but this has
had limited effect, and in one case
actually made the situation worse!

The network connection to the NetApp filer
is healthy:

[/]# netstat -p udp
udp:
        38197798 datagrams received
        0 with incomplete header
        0 with bad data length field
        0 with bad checksum
        245 dropped due to no socket
        3124 broadcast/multicast datagrams dropped due to no socket
        44 dropped due to full socket buffers
        0 not for hashed pcb
        38194385 delivered
        38255984 datagrams output 
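
The counters above can be cross-checked with a little arithmetic. The
following Python sketch (the function names are illustrative, not part
of any existing tool) parses the netstat text shown, confirms that
received minus delivered equals the sum of the three drop counters,
and expresses the full-socket-buffer drops as a fraction of all
datagrams received.

```python
import re

# The UDP statistics exactly as printed above.
NETSTAT = """\
udp:
        38197798 datagrams received
        0 with incomplete header
        0 with bad data length field
        0 with bad checksum
        245 dropped due to no socket
        3124 broadcast/multicast datagrams dropped due to no socket
        44 dropped due to full socket buffers
        0 not for hashed pcb
        38194385 delivered
        38255984 datagrams output
"""


def parse_counters(text):
    """Map each counter description to its integer value."""
    counters = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*\S)", line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters


def drop_summary(c):
    """Return (total not delivered, full-buffer drops, full-buffer drop ratio)."""
    received = c["datagrams received"]
    not_delivered = received - c["delivered"]
    full = c["dropped due to full socket buffers"]
    return not_delivered, full, full / received


if __name__ == "__main__":
    c = parse_counters(NETSTAT)
    not_delivered, full, ratio = drop_summary(c)
    print("not delivered:", not_delivered)
    print("full-buffer drops:", full)
    print("full-buffer drop ratio: %.2e" % ratio)
```

Of the 38197798 datagrams received, 3413 were not delivered, and only
44 of those were lost to full socket buffers, roughly one in a
million.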

We are at a loss as to what to look at
next for a possible cause of the
problem.

Has anyone seen this kind of behaviour
before?

Yours.

Ollie
-- 
Oliver Cook    Systems Administrator, ClaraNET
ollie@uk.clara.net      020 7903 3000 ext. 291
