Date: Wed, 23 Feb 2011 12:34:39 -0500
From: Ryan Stone <rysto32@gmail.com>
To: freebsd-net <freebsd-net@freebsd.org>
Subject: New device_polling algorithm
Message-ID: <AANLkTikssob6OLbqrgG43ahO6V_gTc0pDp8RX3TMRNgL@mail.gmail.com>

I've put together a patch against HEAD that replaces the device_polling algorithm (it should apply cleanly to stable/8 as well -- nothing has changed with polling in some time). The patch can be found here:

http://people.freebsd.org/~rstone/kern_poll.diff

The new algorithm makes use of the feedback that the pollers already provide (but that the current algorithm ignores). Each poller returns a value indicating how much "work" the polling handler performed in this call (typically this is the number of packets handled).

The new algorithm tries to spend (100 - user_frac)% of CPU time handling packets in the netisr thread. This includes time in the pollers as well as time spent on other netisr tasks. It uses the feedback from the pollers in two ways in order to achieve this:

- The feedback is used to decide whether it's worthwhile to do another iteration of polling in this tick. If no poller handles more than count/2 packets in the current iteration, the algorithm concludes that there isn't enough outstanding work to continue polling, and another iteration won't be scheduled until the next tick. Note that this means that polling iterations can be rescheduled again and again in a tick if there are a lot of packets waiting, which is a new feature.

- The feedback is used to estimate how much time it will take to do another iteration of polling. The algorithm dynamically adjusts the count parameter passed to each driver to try to ensure that it only uses as much CPU time as it has been allotted with user_frac. This is necessary to prevent the poller from rescheduling itself too often and starving other threads, especially on single-core machines.

If you're on a multicore machine it might be a good idea to decrease the sysctl kern.polling.user_frac. This sysctl restricts how much CPU time the poller is allowed to use on a single CPU. Smaller values mean less time for other tasks and more time for the poller. The poller won't necessarily use (100 - user_frac)% of a CPU.
That's the maximum amount of time it's allowed to use, but if the pollers are lightly loaded the poller will use significantly less time. The default is 50, which is reasonable for a uniprocessor system. On a multicore machine you might find this overly restrictive: on a dual-core machine you could set it all the way down to 0 and still get the same 50-50 split of CPU time between the poller and everything else.

I've put SDT probes in various strategic places. I have a simple dtrace script that logs the data from the probes here:

http://people.freebsd.org/~rstone/device_polling.d

The script is just a replacement for some KTRs that we had at the same places in our internal branch, so it currently doesn't do anything fancy. I've found the KTRs invaluable for debugging polling problems in the past, though, so I think that it's worth sharing.

You might notice that the SDT probes log a "poller index", but it's currently always 0. I would like to extend the poller further to take advantage of multiple netisrs, so I've made sure that the probes are ready for this, but I'll talk about my multi-polling ideas later on in another thread.

Any comments and testing would be welcome. We mostly run our code only on machines with lem/em/ixgbe devices, so testing against other drivers would be especially welcome.

Ryan
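
[Illustration] The two uses of poller feedback described in this message can be sketched in a few lines of C. This is a minimal sketch and is not taken from kern_poll.diff: every function and parameter name below is hypothetical, the time unit (microseconds per tick) is an assumption, and the proportional scaling of count is only one plausible way to do the adjustment the message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers illustrating the ideas in this message.
 * None of these names come from the actual patch.
 */

/* Time (in microseconds) of one tick that polling may consume. */
static uint32_t
poll_budget_us(uint32_t tick_us, int user_frac)
{
	assert(user_frac >= 0 && user_frac <= 100);
	return ((uint32_t)((uint64_t)tick_us * (uint32_t)(100 - user_frac) / 100));
}

/*
 * First use of the feedback: another iteration in this tick is
 * worthwhile only if at least one poller handled more than count/2
 * packets in the iteration that just finished.
 */
static bool
poll_should_continue(const int *work, int npollers, int count)
{
	for (int i = 0; i < npollers; i++) {
		if (work[i] > count / 2)
			return (true);
	}
	return (false);
}

/*
 * Second use of the feedback (assumed scheme): estimate the cost per
 * packet from the last iteration and pick a per-poller count that
 * should fit in the remaining budget.  Returns 0 when nothing fits,
 * and leaves count unchanged when there is no usable feedback.
 */
static int
poll_next_count(int count, uint32_t last_iter_us, int last_work,
    uint32_t remaining_us, int npollers, int max_count)
{
	uint64_t affordable, per_poller;

	if (last_work <= 0 || last_iter_us == 0 || npollers <= 0)
		return (count);

	/* Packets we can afford at the observed rate. */
	affordable = (uint64_t)remaining_us * (uint64_t)last_work /
	    last_iter_us;
	per_poller = affordable / (uint64_t)npollers;

	if (per_poller > (uint64_t)max_count)
		return (max_count);
	return ((int)per_poller);
}
```

With a 1000 us tick and the default user_frac of 50, poll_budget_us() gives polling 500 us per tick. A caller would loop within the tick, re-running the pollers while poll_should_continue() is true and poll_next_count() returns a nonzero count for the remaining budget.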