Date: Tue, 22 Nov 2011 08:43:20 -0500
From: John Baldwin <jhb@freebsd.org>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Cc: Matteo Landi <matteo@matteolandi.net>, Doug Barton <dougb@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: ixgbe and fast interrupts
Message-ID: <201111220843.21207.jhb@freebsd.org>
In-Reply-To: <20111121173614.GA63552@onelab2.iet.unipi.it>
References: <CALJ8J_HPZewO12uanb=kctQYwepMssr63E0DQh9CqV6PGaC=JA@mail.gmail.com>
    <201111211129.29362.jhb@freebsd.org>
    <20111121173614.GA63552@onelab2.iet.unipi.it>

On Monday, November 21, 2011 12:36:15 pm Luigi Rizzo wrote:
> On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> > On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> > > On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > > > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > > > One more thing (i am mentioning it here for archival purposes,
> > > > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> > > >
> > > > No. It was designed to be inexpensive on purpose. :)
> > >
> > > hmmm....
> > > unfortunately I don't have a chance to test it until monday
> > > (probably one could see if the ping times change by modifying
> > > the value of kern.random.sys.harvest.* ).
> > >
> > > But in the code i see the following:
> > >
> > > - the harvest routine is this:
> > >
> > > void
> > > random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
> > >     enum esource origin)
> > > {
> > >         if (reap_func)
> > >                 (*reap_func)(get_cyclecount(), entropy, count, bits, frac,
> > >                     origin);
> > > }
> > >
> > > - the reap_func seems to be bound to
> > >
> > >   dev/random/randomdev_soft.c::random_harvest_internal()
> > >
> > >   which internally uses a spinlock and then moves entries between
> > >   two lists.
> > >
> > > I am concerned that the get_cyclecount() might end up querying an
> > > expensive device (is it using kern.timecounter.hardware ?)
> >
> > On modern x86 it just does rdtsc().
> >
> > > So between the indirect function call, spinlock, list manipulation
> > > and the cyclecounter i wouldn't be surprised it the whole thing
> > > takes a microsecond or so.
> >
> > I suspect it is not quite that expensive.
> >
> > > Anyways, on monday i'll know better. in the meantime, if someone
> > > wants to give it a try... in our tests between two machines and
> > > ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> > > time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> > > a ping -f .
> >
> > Did you time it with harvest.interrupt disabled?
>
> yes, thanks for reminding me to post the results.
>
> Using unmodified ping (which has 1us resolution on the reports),
> there is no measurable difference irrespective
> of the setting of kern.random.sys.harvest.ethernet,
> kern.random.sys.harvest.interrupt and kern.timecounter.hardware.
> Have tried to set hw mitigation to 0 on the NIC (ixgbe on both
> sides) but there is no visible effect either.

I had forgotten that kern.random.sys.harvest.interrupt only matters if
the interrupt handlers pass the INTR_ENTROPY flag to bus_setup_intr().
I suspect your drivers probably aren't doing that anyway.

> However I don't trust my measurements because i cannot explain them.
> Response times have a min of 20us (about 50 out of 5000 samples)
> and a median of 27us, and i really don't understand if the low
> readings are real or the result of some races.

Hmm, 7 us does seem a bit much for a spread.

> Ping does a gettimeofday() for the initial timestamp, and relies
> on in-kernel timestamp for the response.

Hmm, gettimeofday() isn't super cheap.  What I do for measuring RTT is
to use an optimized echo server (not the one in inetd) on the remote
host and reflect packets off of that.  The sender/receiver puts a TSC
timestamp into the packet payload and computes a TSC delta when it
receives the reflected response.  I then run ministat over the TSC
deltas to get RTT in TSC counts and use machdep.tsc_freq of the sending
machine to convert the TSC delta values to microseconds.

-- 
John Baldwin
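
The last step described above, turning TSC deltas into microseconds, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool actually used in this thread: the function names tsc_delta_to_us() and median_tsc_delta() are hypothetical, the frequency argument is assumed to be the machdep.tsc_freq value (in Hz) read on the sending machine, and the median is computed by hand here as a stand-in for running ministat over the deltas. Capturing the timestamps themselves (rdtsc on send and on receipt of the reflected packet) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a TSC delta to microseconds.  tsc_freq_hz is assumed to be
 * the sender's machdep.tsc_freq value, in ticks per second.
 */
static double
tsc_delta_to_us(uint64_t delta, uint64_t tsc_freq_hz)
{
	assert(tsc_freq_hz != 0);
	return ((double)delta * 1e6 / (double)tsc_freq_hz);
}

/* qsort() comparator for uint64_t values, ascending order. */
static int
cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return ((x > y) - (x < y));
}

/*
 * Median of n TSC deltas (hypothetical helper).  Sorts the array in
 * place; for an even n it returns the midpoint of the two middle values.
 */
static uint64_t
median_tsc_delta(uint64_t *deltas, size_t n)
{
	assert(n > 0);
	qsort(deltas, n, sizeof(deltas[0]), cmp_u64);
	if (n % 2 == 1)
		return (deltas[n / 2]);
	return (deltas[n / 2 - 1] + (deltas[n / 2] - deltas[n / 2 - 1]) / 2);
}
```

Taking the median in TSC counts first and converting once at the end keeps the per-packet path down to a subtraction; on a machine whose TSC runs at 2 GHz (an assumed figure), a delta of 54000 counts corresponds to 27 us.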