Date: Tue, 21 Mar 2006 17:09:16 -0800 (PST)
From: Matthew Dillon
Message-Id: <200603220109.k2M19GVS007470@apollo.backplane.com>
To: Mikhail Teterin
Cc: alc@freebsd.org, stable@freebsd.org
Subject: Re: more weird bugs with mmap-ing via NFS

:The file stops growing, but the network bandwidth remains at 20Mb/s. `Netstat
:-s' on the client, had the following to say (udp and ip only):

    If the network bandwidth is still going full bore then the program is
    doing something.  NFS retries would not account for it.  A simple test
    for that would be to ^Z the program once it gets into this state and
    see if the network bandwidth goes to zero.
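
    For anyone who would rather script the ^Z test than do it from the
    controlling terminal, here is a rough sketch. The function names, the
    pid and the interface name are placeholders and not from the original
    report. kill -STOP suspends the process much as ^Z (SIGTSTP) would, and
    netstat -w 1 -I is the FreeBSD form for per-second interface counters.
    Setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch only: suspend a program and watch whether the traffic stops.
# All names below (run, pause_prog, watch_if, resume_prog) are illustrative.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Suspend the process; similar in effect to typing ^Z at its terminal.
pause_prog() { run kill -STOP "$1"; }

# Per-second packet and byte counts for one interface; interrupt with ^C.
watch_if() { run netstat -w 1 -I "$1"; }

# Let the process continue once the observation is done.
resume_prog() { run kill -CONT "$1"; }
```

    Typical use after sourcing the file would be: pause_prog <pid>, then
    watch_if <interface> for a few seconds, then resume_prog <pid>. If the
    counters drop to near zero while the program is stopped, the program
    itself is driving the traffic.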

    So if we assume that packets aren't being lost, then the question
    becomes: what is the program doing that is causing the network
    bandwidth to go nuts?  And if it isn't the program, then what is the
    OS doing that is causing the network bandwidth to go nuts?

    ktrace on the program would tell us if read() or write() or ftruncate()
    were causing an issue.  'vmstat 1' while the program is running would
    tell us if VM faults are creating an issue.  If neither of those are
    an issue then I would guess that the problem could be related to the
    NFSv3 2-phase commit protocol.  A way to test that would be to mount
    with NFSv2 and see if the problem still occurs.

    Running tcpdump on the network interface while the program is in this
    state might also give us some valuable clues.  50 lines of output from
    something like this after the program has gotten into its weird state
    might give us a clue:

	tcpdump -s 4096 -n -i -l port 2049

					-Matt
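
    The ktrace, vmstat and NFSv2 checks above could be collected into a
    few helper functions along these lines. This is a sketch under
    assumptions: the function names, the pid, the server:/export path and
    the mount point are all placeholders, and the -2 flag to mount_nfs
    (select NFS version 2) should be confirmed against mount_nfs(8) on the
    release in use. Setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch only: helpers for the ktrace / vmstat / NFSv2 checks.
# All function names and arguments are illustrative placeholders.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Attach ktrace to a running process; records go to ./ktrace.out.
trace_start() { run ktrace -p "$1"; }

# Clear the trace points on that process again.
trace_stop() { run ktrace -c -p "$1"; }

# Decode the trace file; look for read, write and ftruncate calls in it.
trace_show() { run kdump -f "${1:-ktrace.out}"; }

# One line of VM statistics per second; watch the fault column.
watch_vm() { run vmstat 1; }

# Mount the export with NFS version 2 instead of version 3.
mount_v2() { run mount_nfs -2 "$1" "$2"; }
```

    For example: trace_start <pid>, wait until the program misbehaves,
    trace_stop <pid>, then trace_show. To retest over NFSv2, unmount the
    filesystem first and remount it with mount_v2 <server:/export> <dir>.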