Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 27 Dec 2012 14:58:56 +0100
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        Garrett Cooper <yanegomi@gmail.com>
Cc:        emulation@freebsd.org, current@freebsd.org, jfvogel@gmail.com
Subject:   Re: [RFC] proposed 'lem' patch to improve behaviour under emulation
Message-ID:  <20121227135856.GA51105@onelab2.iet.unipi.it>
In-Reply-To: <CAGH67wSvdkgrmfYpC5T3tJqtUZjTs-iQUE63Pff9BSoGwLyBAw@mail.gmail.com>
References:  <20121227094649.GA48891@onelab2.iet.unipi.it> <CAGH67wSvdkgrmfYpC5T3tJqtUZjTs-iQUE63Pff9BSoGwLyBAw@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Dec 27, 2012 at 02:26:44AM -0800, Garrett Cooper wrote:
> On Thu, Dec 27, 2012 at 1:46 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> > This patch implements two features for the 'lem' driver that
> > greatly improve the throughput under proper hypervisor.
> > This is joint work with Vincenzo Maffione and Giuseppe Lettieri,
> > I am posting it here for review, will then commit it
> > if there are no objections.
> >
> > The first change is to implement a sysctl to access the 'itr'
> > interrupt moderation register for the devices supported by this
> > driver. It is little more than adding a struct into the device
> > descriptor, and one line to create the dynamic sysctl entry, same
> > as it is done for the other mitigation registers.
> >
> > The second change is more interesting and has huge benefits on througput.
> >
> > Under virtualization, "VM exits" (which happen every time there is
> > an access to a register of the emulated peripheral) are extremely
> > expensive.  In the tx path of the 'lem' driver, there is a write
> > to the TDT register on every packet sent.
> >
> > The patch we propose, if enabled through a sysctl (defaults off,
> > so no change from current behaviour) defers writes to the TDT
> > register when there is a pending transmit interrupt.
> > This means that, together with proper emulation of interrupt
> > mitigation on the hypervisor side, the number of VM exits
> > is dramatically reduced. To give you an idea, on a modern
> > system with qemu-kvm and companion patches, UDP throughput is
> >
> >                                         KVM             QEMU
> > standard KVM, standard driver            20 Kpps         6.3 Kpps
> > modified KVM, standard driver            37 Kpps        28 Kpps
> > modified KVM, modified driver           200 Kpps        34 Kpps
> >
> > As you can see, on kvm this change gives a 5x speedup to the tx path,
> > which combines nicely with the 2x speedup that comes from supporting
> > interrupt mitigation alone in the hypervisor. Without kvm (or kqemu ?)
> > the benefits are much lower, as the guest becomes too slow.
> >
> > Patch follows. It would be good if people with real hardware
> > using the 'lem' driver could test it to make sure it does no
> > harm on their devices (in any case the sysctl variable
> > dev.em.0.mit_enable must be set to explicitly enable it
> > at runtime).
> >
> > (for those curious to test it under kvm, i am also attaching a
> > patch that you need to apply to qemu in order to exploit the
> > effect of interrupt mitigation; it is a followup of a similar
> > patch i posted in july to the qemu mailing list, and i will
> > post it the update there as well, shortly. Unfortunately
> > we do not have kvm on freebsd..)
> 
>     A few comments.
> Thanks!
> -Garrett
> 
> Index: if_lem.c
> ===================================================================
> --- if_lem.c	(revision 244673)
> +++ if_lem.c	(working copy)
> @@ -32,6 +32,8 @@
>  ******************************************************************************/
>  /*$FreeBSD$*/
> 
> +#define MITIGATION
> +
> 
> gcooper> Could you please make MITIGATION into a proper compile time
> flag via sys/conf/options with a more descript name?

this is actually going away, with the code compiled in by default

...
> @@ -281,6 +283,9 @@
>  #define EM_TICKS_TO_USECS(ticks)	((1024 * (ticks) + 500) / 1000)
>  #define EM_USECS_TO_TICKS(usecs)	((1000 * (usecs) + 512) / 1024)
> 
> +#define MAX_INTS_PER_SEC	8000
> +#define DEFAULT_ITR	     1000000000/(MAX_INTS_PER_SEC * 256)
> +
> 
> gcooper> Add parentheses around DEFAULT_ITR (I know the code was just
> shuffled around, but thought I could ask :)..)?

will do

thanks
luigi



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121227135856.GA51105>