Date: Fri, 23 Jan 2004 16:41:50 +0000
From: Karl Pielorz
To: Robert Watson
cc: freebsd-questions@freebsd.org
Subject: Re: FreeBSD tunnels / performance et'al (gif/tun etc.)
Message-ID: <16051437.1074876110@raptor>

--On 23 January 2004 10:51 -0500 Robert Watson wrote:

>> I'm just wondering if it was something 'weird' such as the delay over
>> the tunnel being on average 'just the right delay time' to cause
>> problems that you wouldn't get on a LAN or something? :)
>
> I agree that something sounds weird -- I've had no problem tunneling
> hundreds of megabits using similar hardware to what you're using, and what
> sounds like a similar configuration. So it seems like something is going
> on. Do you have any load information available on the systems -- i.e.,
> interrupt rate as measured by vmstat, cpu usage, etc? Are you using natd
> or other address space translation?

Both systems are dedicated boxes, i.e.
they run the tunnel - and nothing else (no NAT, nothing). Load on each was
unremarkable, i.e. no excessive interrupts etc. On the hardware that didn't
work we were getting about 300 or so interrupts a second on each network
card. After the changes, this rose to about 800 a second per card [as the
tunnel performance rose].

We're due to pull the failed machine from the remote end soon - if I get a
chance I'll run it up here - though I don't think it's "flaky
hardware/network card", as when scp/ftp'ing to that host via either its
physical address or its tunnel endpoint address we got good performance...

Looking briefly at the tcpdumps, it looks like there were a lot of
duplicated ACK packets being sent from the remote side (which would suggest
they never made it to the other side) - and that would also be a credible
reason for the sessions stalling so badly...

It'd also explain why, at the time, the 'aggregate' traffic flow on gif0
looked good, but individual machines/IPs were getting really pitiful
throughput...

I'll see if I can dig out the original tcpdumps [most of the debug stuff
usually starts disappearing once the problem is solved, regardless of how
:(]

-Karl
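
For anyone wanting to check a capture for the duplicated-ACK pattern
described above, here is a rough sketch. It assumes the text output of a
modern `tcpdump -n` (lines containing "IP src > dst: Flags [...], ack N, ...
length N"); the function name count_duplicate_acks and the regular
expression are my own illustration, not anything used in the thread, and
older tcpdump versions print a different line format. It is only a
heuristic: it counts repeated payload-free ACKs per direction and does not
apply the stricter duplicate-ACK definition (same window, outstanding data).

```python
import re
from collections import Counter

# Assumed line shape (modern `tcpdump -n` text output), for example:
#   16:41:50.1 IP 10.0.0.2.22 > 10.0.0.1.51234: Flags [.], ack 1001, win 512, length 0
LINE_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\],"
    r".*?\back (?P<ack>\d+).*?\blength (?P<length>\d+)"
)


def count_duplicate_acks(lines):
    """Return a Counter mapping (src, dst) to the number of repeated pure ACKs."""
    last_ack = {}      # most recent pure-ACK number seen per direction
    dups = Counter()   # repeated pure ACKs per direction
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue   # not a TCP line carrying an ACK number
        flow = (m["src"], m["dst"])
        # Only pure ACKs count: no payload and no SYN/FIN/RST flag.
        is_pure_ack = m["length"] == "0" and not set(m["flags"]) & set("SFR")
        if not is_pure_ack:
            last_ack.pop(flow, None)   # data was sent, so restart the run
            continue
        ack = int(m["ack"])
        if last_ack.get(flow) == ack:
            dups[flow] += 1
        last_ack[flow] = ack
    return dups
```

Feeding it the lines of a saved capture (e.g. `tcpdump -n -r file` redirected
to a text file) gives a per-direction count; a direction with a high count
is the side repeatedly acknowledging the same byte, which would fit
individual sessions stalling while the aggregate on gif0 still looks busy.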