From: Danny Braniss
To: Robert Watson
Cc: freebsd-hackers@freebsd.org
Subject: Re: waiting on sbwait
Date: Fri, 25 Jun 2004 09:17:01 +0300
Message-Id: <20040625061757.9D05A43D5A@mx1.FreeBSD.org>

>
> On Wed, 23 Jun 2004, Danny Braniss wrote:
>
> > sometimes we get
> >   load: 0.04 cmd: dmesg 13453 [nfsrcvlk] 0.00u 0.00s 0% 148k
> >
> > and looking through the code, there might be some connection between
> > sbwait and nfsrcvlk, but i doubt that it's sockets that i'm running out
> > of, nor mbufs, since:
> >
> > foundation> netstat -m
> > 326/1216/26624 mbufs in use (current/peak/max):
> >         326 mbufs allocated to data
> > 321/428/6656 mbuf clusters in use (current/peak/max)
> > 1160 Kbytes allocated to network (5% of mb_map in use)
> > 0 requests for memory denied
> > 0 requests for memory delayed
> > 0 calls to protocol drain routines
> >
> > also, the process enters sbwait either in sosend or soreceive, which
> > makes me believe that it's some resource, rather than data, that is
> > missing.
> >
> > the fact that this 'unresponsiveness' happens only sometimes is making
> > this rather challenging, but try to tell this to the users :-)
>
> sbwait() occurs when a thread is blocked in a socket waiting for space in
> the socket to send, or for data in the socket on a receive. This can
> happen either because a process is directly performing socket I/O -- for
> example, sending or receiving on a TCP or UDP socket -- or, it can happen
> when a process is using a facility that performs socket I/O in its kernel
> thread. For example, the NFS client. So the sbwait state could be a
> result of filled buffers of NFS. If I had to guess, it might well be NFS.
> However, there are actually ways to tell :-).
>
> The easiest is to compile your kernel with DDB, and when a process hangs
> with those symptoms, break into the debugger and do a trace on its pid.
> You'll get back a stack trace. If it's using a send/recv system call that
> terminates in the socket code without hitting VFS/NFS, it's blocked on
> network I/O, perhaps because it's sending or receiving a lot of data and
> hasn't finished. If you see it pass through NFS-related functions, then
> it's waiting for NFS network I/O, which could reflect a busy NFS server,
> network segment, packet loss, etc.

it's definitely NFS related, i/you can cause this to happen at will, ie:

	ls /net/host

where host is down. the /net is an amd trigger which will try to mount
via nfs all of host's exports.

thanks,
	danny
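
To illustrate the receive-side wait described in the quoted reply (a thread
blocked waiting for data in the socket), here is a minimal userland sketch in
Python. It is only an analogy, not kernel behaviour: it assumes a local
socketpair can stand in for the NFS client's socket, and a short timeout
stands in for the indefinite sleep in sbwait. The helper name
recv_would_block is hypothetical.

```python
import socket


def recv_would_block(sock, timeout=0.2):
    """Return True if no data arrives on sock within `timeout` seconds.

    MSG_PEEK leaves any pending data in the socket buffer, so the check
    does not consume what a later reader expects to see.
    """
    sock.settimeout(timeout)
    try:
        sock.recv(1, socket.MSG_PEEK)
    except socket.timeout:
        # Nothing to read: without the timeout, the caller would keep
        # sleeping here, much like a process stuck waiting on a dead peer.
        return True
    return False


if __name__ == "__main__":
    reader, writer = socket.socketpair()
    print(recv_would_block(reader))  # peer is silent
    writer.sendall(b"x")
    print(recv_would_block(reader))  # data is now pending
    reader.close()
    writer.close()
```

With a silent peer the receive never completes on its own, which matches the
"ls /net/host" case where host is down and no reply ever comes back.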