Date: Sat, 29 Jan 2011 00:52:15 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-performance@freebsd.org, Julian Elischer <julian@freebsd.org>, Stefan Lambrev <stefan.lambrev@moneybookers.com>
Subject: Re: Interrupt performance
Message-ID: <20110128215215.GJ18170@zxy.spb.ru>
In-Reply-To: <20110129070205.Q7034@besplex.bde.org>
References: <20110128143355.GD18170@zxy.spb.ru> <22E77EED-6455-4164-9115-BBD359EC8CA6@moneybookers.com> <20110128161035.GF18170@zxy.spb.ru> <CDBFAB7F-1EBC-4B3A-B2F5-6162DD58A93D@moneybookers.com> <4D42F87C.7020909@freebsd.org> <20110128172516.GG18170@zxy.spb.ru> <20110129070205.Q7034@besplex.bde.org>

On Sat, Jan 29, 2011 at 07:52:11AM +1100, Bruce Evans wrote:

> >> there are of course several possible answers, including:
> >>
> >> 1/ Sometimes BSD and Linux report things differently. Linux may or may not
> >> account for the lowest level interrupt time the same as BSD
> >
> > But I see only 20% idle on FreeBSD and 80% idle on Linux.
>
> The time must be counted somewhere, so when it is not properly accounted
> to packet handling, and nothing much else is running, it is accounted to
> idle.
>
> To see how much CPU is actually available, run something else and see how
> fast it runs. A simple counting loop works well on UP systems.

===
#include <stdio.h>
#include <stdlib.h>     /* atol() */
#include <sys/time.h>   /* gettimeofday() */

volatile int Dummy;     /* volatile so an optimizing compiler keeps the loop */

int main(int argc, char *argv[])
{
    long int count, i, dt;
    struct timeval st, et;

    if (argc < 2) {
        fprintf(stderr, "usage: %s count\n", argv[0]);
        return 1;
    }
    count = atol(argv[1]);
    gettimeofday(&st, NULL);
    for (i = count; i; i--)
        Dummy++;
    gettimeofday(&et, NULL);
    /* elapsed wall-clock time in microseconds */
    dt = (et.tv_sec - st.tv_sec) * 1000000 + et.tv_usec - st.tv_usec;
    printf("Elapsed %ld us\n", dt);
    return 0;
}
===

Is this OK?

./loop 2000000000

FreeBSD
1 process: Elapsed 7554193 us
2 processes: Elapsed 14493692 us
netperf + 1 process: Elapsed 21403644 us

Linux
1 process: Elapsed 7524843 us
2 processes: Elapsed 14995866 us
netperf + 1 process: Elapsed 14107670 us

> >> 2/ the BSD driver for that chip may be badly written, or may
> >> be doing more or different work for some reason
> >> 3/ the FreeBSD interrupt code may be misconfigured for that driver.
> >>
> >> or maybe combinations...
>
> Possibly, but it's a low-end NIC and those normally take a lot of CPU.
> 128 kpps might take 20% of 1 3GHz CPU for even a high-end NIC on FreeBSD.
> Linux has generally lower overheads and should be expected to reduce this
> a bit, to perhaps as low as 15%, depending on how much of the overhead is
> due to the NIC.
>
> >> there are profiling tools that you may decide to run.
> >
> > What tools can I use on amd64?
> >
> > I booted a kernel configured with 'config -p'.
> > Most time is in spinlock_exit and acpi_cpu_c1.
>
> Normal profiling works poorly (I see you found my old mail about high
> resolution profiling). Linux might be misreporting the overhead for

I think the next server will support PMC.
Are reports from PMC still poor?

> exactly the same reasons that normal profiling works poorly:
> - the profiling clock frequency of ~1 KHz was adequate for 5 MHz machines
>   in 1998, but is now too slow. Statistics clocks are even slower (128
>   Hz in FreeBSD, and possibly 100 Hz (?) jiffies in Linux).
> - the statistics clock might be too synchronized with other interrupts.
>   The above spinlock_exit and acpi_cpu_c1 times indicate that the
>   statistics clock almost always fires on exit from another spinlock
>   and/or inside ACPI, for waking up from idle for the latter. Seeing
>   lots of exits from spinlocks may indicate that spinlocks are being
>   used too much.
> But FreeBSD will report interrupt times and system for non-fast-interrupts
> to an accuracy of about 1 microsecond, since it doesn't use the
> statistics clock much for this. OTOH, for fast interrupts it is typical
> behaviour in FreeBSD and Linux to not see them at all from the statistics
> clock interrupt, since they mask all interrupts so they mask the
> statistics clock interrupt in particular. In FreeBSD, lots of time
> apparently spent in spinlock_exit is a typical result of this, or at
> least similar things, since spinlock_enter masks all interrupts (except
> in my version of course). Linux doesn't have fast interrupts in the
> same way that FreeBSD does, but at least in old versions almost all of
> its interrupts masked other interrupts a lot.

Linux kernel 2.6.26.

> >>> On Jan 28, 2011, at 6:10 PM, Slawa Olhovchenkov wrote:
> >>>
> >>>> On Fri, Jan 28, 2011 at 06:03:15PM +0200, Stefan Lambrev wrote:
> >>>>
> >>>>> Do the test with netblast ;)
> >>>>> Most perf tools are written badly and for Linux.
> >>>>> In our internal test netblast running on freebsd outperform everything else.
> >>>> I don't speak about bad performance.
> >>>> I speak about overhead.
> >>>>
> >>>> Linux: overhead 7% for 56K int/s
> >>>> FreeBSD: overhead 59% for 14K int/s
> >>>>
> >>>> For processing 1/4 interrupts FreeBSD need 8x CPU.
>
> You showed context switches in another reply. 56k interrupts on FreeBSD
> would give at least 112k context switches taking several uSec each to do
> nothing except switch. This would give an overhead in the 59% range.
> 14K is not so bad, but still too high unless you have a spare CPU or 32
> to handle it. Part of the lowness of low-end NICs is that they tend to
> generate too many interrupts and don't have much or any way to control
> this. Linux will certainly be able to handle 56K int/S better than
> FreeBSD since it doesn't have heavyweight interrupt threads AFAIK.
> FreeBSD also has "fast" interrupts, which are much like normal interrupts
> used to be in FreeBSD. I don't know if your NIC driver uses these. I

re0: [FILTER]
I think this ([FILTER]) is the answer, but I don't understand the answer :).

> guess not, since if it did then it should move the "interrupt" processing
> to a task queue, where it would show up under another label and be reduced
> insignificantly.
>
> >>>>> P.S. - /usr/src/tools/tools/netrate/netblast - we have tested little more expensive card - em/igb and bce.
>
> netblast should be able to saturate a low-end NIC, but may take 100% of 1
> CPU to do so (it has to busy-wait, since there is no way to select() on
> the NIC ring being unfull, and timeouts don't work either since their
> granularity is too large). If the NIC activity alone saturates 1 CPU,
> then you might see the 100% CPU being shown for Linux too.
> >>>>>> re0:<RealTek 8169SC/8110SC Single-chip Gigabit Ethernet> port 0x4000-0x40ff mem 0xf0100000-0xf01000ff irq 19 at device 4.0 on pci11
> >>>>>> re0: Chip rev. 0x18000000
> >>>>>> re0: MAC rev. 0x00000000
> >>>>>> miibus0:<MII bus> on re0
> >>>>>> rgephy0:<RTL8169S/8110S/8211B media interface> PHY 1 on miibus0
>
> I don't really know if this is low-end, but guess all RealTeks are :-).

FreeBSD supports interrupt moderation on this chip, and the chip supports TOE :)

> >>>>>> CPU: Intel(R) Celeron(R) CPU 420 @ 1.60GHz (1596.05-MHz K8-class CPU)
>
> This is low end :-).
>
> I mostly use old AthlonXP and Athlon64 2GHz systems for network testing.
> These are a bit faster than the above. A single medium end bge (5701)
> on a PCI33 bus takes 100% CPU at about 512 kpps. A single low end bge
> (5705+) on a PCI33 takes 120% CPU at about 240 kpps on a 2-core system.
> Linux-2.6.10 saturates well below 512 kpps on the same hardware.
>
> Bruce