Date: Thu, 24 Jun 2004 23:35:40 -0400 (EDT)
From: Robert Watson <rwatson@freebsd.org>
To: Danny Braniss <danny@cs.huji.ac.il>
Cc: freebsd-hackers@freebsd.org
Subject: Re: waiting on sbwait
Message-ID: <Pine.NEB.3.96L.1040624232902.26400C-100000@fledge.watson.org>
In-Reply-To: <20040623163251.71F4A43D2F@mx1.FreeBSD.org>

On Wed, 23 Jun 2004, Danny Braniss wrote:

> sometimes we get
>     load: 0.04 cmd: dmesg 13453 [nfsrcvlk] 0.00u 0.00s 0% 148k
>
> and looking through the code, there might be some connection between
> sbwait and nfsrcvlk, but i doubt that it's sockets that im running out
> off, neither mbufs, since:
>
> foundation> netstat -m
> 326/1216/26624 mbufs in use (current/peak/max):
>         326 mbufs allocated to data
> 321/428/6656 mbuf clusters in use (current/peak/max)
> 1160 Kbytes allocated to network (5% of mb_map in use)
> 0 requests for memory denied
> 0 requests for memory delayed
> 0 calls to protocol drain routines
>
> also, the process enters sbwait either in sosend or soreceive, make me
> believe that it's some resource, rather than data, that is missing.
>
> the fact that this 'unresponsivness' happens sometimes is making this
> rather challenging, but try to tell this to the users :-)

sbwait() occurs when a thread is blocked on a socket, waiting either for
space in the socket buffer on a send or for data in the socket buffer on a
receive. This can happen because a process is directly performing socket
I/O -- for example, sending or receiving on a TCP or UDP socket -- or
because a process is using a facility that performs socket I/O in its
kernel thread, such as the NFS client. So the sbwait state could be the
result of NFS filling its socket buffers.

If I had to guess, it might well be NFS. However, there are actually ways
to tell :-). The easiest is to compile your kernel with DDB, and when a
process hangs with those symptoms, break into the debugger and do a trace
on its pid. You'll get back a stack trace. If the trace shows a send/recv
system call that ends in the socket code without passing through VFS/NFS,
the process is blocked on network I/O, perhaps because it's sending or
receiving a lot of data and hasn't finished. If you see it pass through
NFS-related functions, then it's waiting for NFS network I/O, which could
reflect a busy NFS server, a busy network segment, packet loss, etc.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Senior Research Scientist, McAfee Research
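
As an illustration of the "waiting for space in the socket" case described in the message (this is not part of the original post): a minimal Python sketch, assuming a local socketpair can stand in for a real TCP or NFS connection. The helper names fill_send_buffer and demo are hypothetical. The socket is put in non-blocking mode, so the kernel returns an error at the point where a blocking sender would instead go to sleep; on FreeBSD that sleep is what shows up as sbwait.

```python
import socket


def fill_send_buffer(sock, chunk=b"x" * 4096):
    """Send until the kernel reports no buffer space; return bytes queued."""
    sock.setblocking(False)
    queued = 0
    try:
        while True:
            queued += sock.send(chunk)
    except BlockingIOError:
        # No space left in the socket buffer. A blocking send would
        # sleep at this point until the peer drains some data.
        pass
    return queued


def demo():
    # Connected pair of local stream sockets (assumption: stands in
    # for a real network connection).
    a, b = socket.socketpair()
    try:
        queued = fill_send_buffer(a)

        # Drain everything on the peer side, which frees buffer space.
        b.setblocking(False)
        drained = 0
        try:
            while True:
                data = b.recv(65536)
                if not data:
                    break
                drained += len(data)
        except BlockingIOError:
            # Nothing left to read.
            pass

        # With space available again, a small send succeeds at once.
        sent_after = a.send(b"ping")
        return queued, drained, sent_after
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    queued, drained, sent_after = demo()
    print("queued before buffer filled:", queued)
    print("drained by peer:", drained)
    print("bytes accepted after drain:", sent_after)
```

The number of bytes queued before the buffer fills depends on the system's socket buffer sizes, so no particular value should be expected. The point is only that the sender stalls until the receiver makes room, which is the same condition that leaves a thread (or the NFS client acting on its behalf) sitting in sbwait.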