From owner-freebsd-net@FreeBSD.ORG Wed Apr 11 13:00:03 2012 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E8A311065670; Wed, 11 Apr 2012 13:00:02 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 981C98FC0C; Wed, 11 Apr 2012 13:00:02 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 5E1F17300A; Wed, 11 Apr 2012 15:19:20 +0200 (CEST) Date: Wed, 11 Apr 2012 15:19:20 +0200 From: Luigi Rizzo To: Andre Oppermann Message-ID: <20120411131920.GA61027@onelab2.iet.unipi.it> References: <20120410225257.GB53350@onelab2.iet.unipi.it> <4F84B6DB.5040904@freebsd.org> <20120410230500.GA22829@pit.databus.com> <20120410233211.GA53829@onelab2.iet.unipi.it> <4F855E5E.5000107@freebsd.org> <20120411110036.GA60031@onelab2.iet.unipi.it> <4F857631.8040309@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4F857631.8040309@freebsd.org> User-Agent: Mutt/1.4.2.3i Cc: Barney Wolff , current@freebsd.org, net@freebsd.org Subject: Re: strange ping response times... X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 Apr 2012 13:00:03 -0000 On Wed, Apr 11, 2012 at 02:16:49PM +0200, Andre Oppermann wrote: > On 11.04.2012 13:00, Luigi Rizzo wrote: > >On Wed, Apr 11, 2012 at 12:35:10PM +0200, Andre Oppermann wrote: > >>On 11.04.2012 01:32, Luigi Rizzo wrote: > >>>On Tue, Apr 10, 2012 at 07:05:00PM -0400, Barney Wolff wrote: > >>>>CPU cache? > >>>>Cx states? > >>>>powerd? > >>> > >>>powerd is disabled, and i am going down to C1 at most > >>> > sysctl -a | grep cx > >>> hw.acpi.cpu.cx_lowest: C1 > >>> dev.cpu.0.cx_supported: C1/1 C2/80 C3/104 > >>> > >>>which shouldn't take so much. Sure, cache matters, but the > >>>fact is, icmp processing on loopback should occur inline. > >>> > >>>unless there is a forced descheduling on a select with timeout> 0 > >>>which would explain the extra few microseconds (and makes me worry > >>>on how expensive is a scheduling decision...) > >> > >>Things going through loopback go through a NETISR and may > >>end up queued to avoid LOR situations. In addition per-cpu > >>queues with hash-distribution for affinity may cause your > >>packet to be processed by a different core. Hence the additional > >>delay. > > > >so you suggest that the (de)scheduling is costing several microseconds ? > > Not directly. I'm just trying to explain what's going on to > get a better idea where it may go wrong. > > There may be a poor ISR/scheduler interaction that prevents that > causes the packet to be processed only on the next tick or something > like that. I don't have a better explanation for this. ok, some final remarks just for archival purposes (still related to the loopback ping) ping takes a timestamp in userspace before trying to transmit the packet, and then the timestamp for the received packet is recorded in the kernel (in the interrupt or netisr thread i believe -- anyways, not in userspace). A thing that seems to make the difference is the choice of idle states: machdep.idle = {acpi, hlt, mwait, spin} hw.acpi.cpu.cx_lowest={C1, C2, C3} results in 10-15% of samples with significantly different values (see table below, all values in microseconds) config normal top 10-15% ----------------------------------------- acpi+c3 13 85 acpi+c2 10 40 acpi+c1 7 10 spin 7 8 hlt,mwait 6..10 (no clear distinction) ping -f 5-6 This suggests that sometimes the responding thread may be started on a sleeping core and so takes longer to wake up. The high values are not periodic but the ratio is close to the 128/1000 which relates profhz and tick, so that might be something to look at. cheers luigi