From: lukem.freebsd@cse.unsw.edu.au
To: Robert Watson
cc: freebsd-performance@freebsd.org
Date: Mon, 25 Oct 2004 23:55:59 +1000 (EST)
Subject: Re: CPU utilisation cap?

On Mon, 25 Oct 2004, Robert Watson wrote:

> A couple of thoughts, none of which points at any particular red flag,
> but worth thinking about:
>
> - You indicate there are multiple if_em cards in the host -- can you
>   describe the network topology?  Are you using multiple cards, or just
>   one of the nicely equipped ones?  Is there a switch involved, or
>   direct back-to-back wires?

I have 4 Linux boxes (don't blame me!) generating UDP traffic through an
8-port HP ProCurve gigabit switch.  The software I am using is ipbench
(see ipbench.sourceforge.net for details).  At present I am only using one
NIC at a time, though I intend to measure routing performance soon, which
will eliminate the userspace component (and hopefully some possible
scheduler effects) and might shed more light on things.

> - Are the packet sources generating the packets synchronously or
>   asynchronously: i.e., when a packet source sends a UDP packet, does it
>   wait for the response before continuing, or keep on sending?  If
>   synchronously, are you sure that the wires are being kept busy?

It is not a ping-pong benchmark.  The traffic is generated continuously;
each packet is timestamped as it is sent, and the timestamp is compared
with the receipt time to get a round-trip time (I didn't bother including
the latency information in my post to freebsd-net).

> - Make sure your math on PCI bus bandwidth accounts for packets going in
>   both directions if you're actually echoing the packets.  Also make
>   sure to include the size of the Ethernet frame and any other headers.

The values I have quoted (550 Mbit/s etc.) are the throughput of what is
received back at the Linux boxes after echoing, so we can expect double
that on the PCI bus, plus overheads.

> - If you're using SCHED_ULE, be aware that its notion of "nice" is a
>   little different from the traditional UNIX notion, and attempts to
>   provide more proportional CPU allocation.  You might try switching to
>   SCHED_4BSD.  Note that there have been pretty large scheduler changes
>   in 5.3, with a number of the features that were previously specific to
>   SCHED_ULE being made available with SCHED_4BSD, and that a lot of
>   scheduling bugs have been fixed.  If you move to 5.3, make sure you
>   run with 4BSD, and it would be worth trying it with 5.2 to "see what
>   happens".

I very strongly agree that it sounds like a scheduling effect.  The 5.2.1
kernel (which is what I am using) is already built with SCHED_4BSD.  It
will be interesting to see whether the new scheduler makes a difference.

> - It would be worth trying the test without the soaker process but
>   instead a sampling process that polls the kernel's notion of CPU%
>   measurement every second.  That way, if it does turn out that ULE is
>   unnecessarily giving CPU cycles to the soaker, you can still measure
>   w/o "soaking".
>
> - What does your soaker do -- in particular, does it make system calls
>   to determine the time frequently?  If so, the synchronization
>   operations and scheduling cost associated with that may impact your
>   measurements.  If it just spins reading the TSC and outputting once in
>   a while, you should be OK WRT this point.

I will look into this.  I didn't write the code, so I'm not sure exactly
what it does.  From what I understand it uses a calibrated tight loop, so
it shouldn't need to make any syscalls while it is running, but I will
check anyway.  I have been considering implementing it with a cycle-count
register, but have avoided that so far for portability reasons.
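For what it's worth, the scheme I have in mind is roughly the following
(just an illustrative sketch -- not the actual soaker source, and the
chunk size and timings are made up):

/*
 * Sketch of a calibrated tight-loop soaker.  Calibrate chunks/second
 * with the machine otherwise idle, then count how many chunks fit into
 * each measurement window while the benchmark runs; the ratio
 * approximates the CPU left over for the soaker.
 */
#include <stdio.h>
#include <sys/time.h>

#define CHUNK	1000000UL	/* busy-loop iterations between time checks */

/* Pure busywork: no syscalls, just a volatile counter. */
static void
burn(void)
{
	volatile unsigned long i;

	for (i = 0; i < CHUNK; i++)
		continue;
}

/* How many CHUNK-sized slices fit into roughly 'secs' of wall clock?
 * Only one gettimeofday() per chunk, so syscall overhead stays small. */
static double
chunks_per_second(int secs)
{
	struct timeval t0, t1;
	unsigned long n = 0;
	double elapsed;

	gettimeofday(&t0, NULL);
	do {
		burn();
		n++;
		gettimeofday(&t1, NULL);
		elapsed = (t1.tv_sec - t0.tv_sec) +
		    (t1.tv_usec - t0.tv_usec) / 1e6;
	} while (elapsed < secs);
	return (n / elapsed);
}

int
main(void)
{
	double idle_rate, loaded_rate;

	/* Calibrate before the load starts. */
	idle_rate = chunks_per_second(5);
	printf("calibrated %.0f chunks/sec; start the benchmark now\n",
	    idle_rate);

	for (;;) {
		loaded_rate = chunks_per_second(1);
		printf("spare CPU ~ %.1f%%\n",
		    100.0 * loaded_rate / idle_rate);
	}
}

Design-wise, the attraction of the cycle counter is that it would remove
even the per-chunk gettimeofday(), at the cost of portability.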
> - Could you confirm using netstat -s statistics that a lot of your
>   packets aren't getting dropped due to full buffers on either send or
>   receive.  Also, do you have any tests in place to measure packet loss?
>   Can you confirm that all the packets you send from the Linux boxes are
>   really sent, and that given they are sent, that they arrive, and vice
>   versa on the echo?  Adding sequence numbers and measuring the mean
>   sequence number difference might be an easy way to start if you aren't
>   already.

I get numbers for both the packets transmitted and the packets received
(albeit as throughputs).  What I see is little to no packet loss below the
MLFRR (maximum loss-free receive rate); above that, packets obviously
start to get lost.

Thanks for the help!

--
Luke
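P.S. On the sequence-number suggestion: if I do add explicit sequence
numbers, the receiver-side bookkeeping would be roughly this (a
hypothetical sketch, not code that exists in ipbench -- 'loss_stats' and
'account_packet' are made-up names):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/*
 * Hypothetical receiver-side loss accounting.  The sender puts a 32-bit
 * network-order sequence number in the first four bytes of each UDP
 * payload; the receiver calls account_packet() once per packet received
 * and counts the gaps.  (Reordering shows up as loss here, which is
 * good enough for a first pass.)
 */
struct loss_stats {
	uint32_t	expected;	/* next sequence number expected */
	uint64_t	received;	/* packets seen */
	uint64_t	lost;		/* gaps observed */
};

static void
account_packet(struct loss_stats *st, const unsigned char *payload)
{
	uint32_t seq;

	memcpy(&seq, payload, sizeof(seq));
	seq = ntohl(seq);
	if (seq > st->expected)
		st->lost += seq - st->expected;
	st->expected = seq + 1;
	st->received++;
}

The loss rate then falls out as lost / (lost + received), independent of
the throughput numbers.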