Date:      Fri, 14 Dec 2012 15:21:55 +0100
From:      Oliver Pinter <oliver.pntr@gmail.com>
To:        Davide Italiano <davide@freebsd.org>
Cc:        freebsd-current <freebsd-current@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: [RFC/RFT] calloutng
Message-ID:  <CAPjTQNGL_7LnffWB5bbEgW0b6ekOrVzH6QQ6e2=fCFW4%2BmF6FA@mail.gmail.com>
In-Reply-To: <CACYV=-EQ=G3JZOQ-9ExGT9spbEGtH5bJOrrgN2oeE2Qh3_rKag@mail.gmail.com>
References:  <CACYV=-F7_imU-JsPfeOZEyEPGKO2PVm1w1W3VdsH3jGiDvnmBg@mail.gmail.com> <CA%2BhQ2%2BgyhRHkB9Y%2BeGADvbjvJxSNSjYC%2BTQX8-0mf9LUD1V2HA@mail.gmail.com> <CACYV=-G9sG1Oo%2Bgz3kXmdeK85P7%2BZZg1CnAPLvwCuAbNftmv6A@mail.gmail.com> <CACYV=-EQ=G3JZOQ-9ExGT9spbEGtH5bJOrrgN2oeE2Qh3_rKag@mail.gmail.com>

Hi!

-       return tticks;
+       getbinuptime(&pbt);
+       bt.sec = data / 1000;
+       bt.frac = (data % 1000) * (uint64_t)1844674407309000LL;
+       bintime_add(&bt, &pbt);
+       return bt;
 }

What is this 1844674407309000LL constant?
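
For context on the question above: in FreeBSD, the frac field of struct bintime is a 64-bit binary fraction of a second, so one millisecond corresponds to roughly 2^64 / 1000 = 18446744073709551 units. The constant in the patch is about ten times smaller than that, so it may simply be missing a digit; that is a guess, not something confirmed in this thread. A minimal userland sketch of such a conversion follows (struct bintime_sketch, FRAC_PER_MS and ms_to_bintime are illustrative names, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Userland stand-in for the kernel's struct bintime (assumption: same layout idea). */
struct bintime_sketch {
	int64_t  sec;	/* whole seconds */
	uint64_t frac;	/* fraction of a second, in units of 2^-64 s */
};

/* floor(2^64 / 1000): number of 2^-64 s units in one millisecond. */
#define FRAC_PER_MS ((uint64_t)18446744073709551ULL)

/* Convert a millisecond count into seconds plus a binary fraction. */
static struct bintime_sketch
ms_to_bintime(uint64_t ms)
{
	struct bintime_sketch bt;

	bt.sec = (int64_t)(ms / 1000);
	/* (ms % 1000) is at most 999, so the product stays below 2^64. */
	bt.frac = (ms % 1000) * FRAC_PER_MS;
	assert(bt.sec >= 0);
	return bt;
}
```

For example, ms_to_bintime(1500) yields sec = 1 and frac = 500 * FRAC_PER_MS, which is just under half of 2^64.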


@@ -275,7 +288,7 @@
        do {
                th = timehands;
                gen = th->th_generation;
-               bintime2timeval(&th->th_offset, tvp);
+               Bintime2timeval(&th->th_offset, tvp);
        } while (gen == 0 || gen != th->th_generation);
 }

Is the capital B there possibly a typo?

On 12/14/12, Davide Italiano <davide@freebsd.org> wrote:
> On Fri, Dec 14, 2012 at 1:57 PM, Davide Italiano <davide@freebsd.org>
> wrote:
>> On Fri, Dec 14, 2012 at 7:41 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>>>
>>> On Fri, Dec 14, 2012 at 12:12 AM, Davide Italiano <davide@freebsd.org>
>>> wrote:
>>>>
>>>> Hi.
>>>> This patch takes callout(9) and redesigns the KPI and the
>>>> implementation. The main objective of this work is making the
>>>> subsystem tickless.  In the last several years, this possibility has
>>>> been discussed widely (http://markmail.org/message/q3xmr2ttlzpqkmae),
>>>> but until now no one really implemented it.
>>>> If you want a complete history of what has been done in the last
>>>> months you can check the calloutng project repository
>>>> http://svnweb.freebsd.org/base/projects/calloutng/
>>>> For lazy people, here's a summary:
>>>
>>>
>>> thanks for the work and the detailed summary.
>>> Perhaps it would be useful if you could provide a few high level
>>> details on the use and performance of the new scheme, such as:
>>>
>>> - is the old callout KPI still available? (I am asking because it would
>>>   help maintaining third party kernel modules that are expected to
>>>   work on different FreeBSD releases)
>>>
>>
>> Obviously the old KPI is still available. callout(9) is a very popular
>> interface and I don't think removing the old interface is a good idea,
>> because it could make some vendors unhappy when their code no longer
>> builds on FreeBSD.
>>
>>> - do you have numbers on what is the fastest rate at which callouts
>>>   can be fired (e.g. say you have a callout which increments a
>>>   counter and schedules the next callout in (struct bintime){0,1} ) ?
>>>
>
> Right now, all the services rely on the old interface. This means they
> cannot be fired before 1 tick has elapsed, i.e. 1 millisecond with
> hz = 1000, as on most machines.
> Now that nanosleep() relies on the new interface, we measured 4-5
> microseconds latency for the processing before the callout is actually
> fired. I can't say if we can still lower this value, but I cannot
> imagine, for now, a consumer that actually requests a shorter timeout.
>
>>>
>>> - is there a possibility that if callout requests are too close to each
>>>   other (e.g. the above test) the thread dispatching callouts will
>>>   run forever? If so, is there a way to make such a thread yield
>>>   after a while?
>>>
>
> Most of the processing is still done in a SWI thread, "at a later
> time", so I don't think this is a problem.
>
>>> - since you mentioned nanosleep(), poll() and select() have been
>>>   ported to the new callout, is there a way to guarantee that users
>>>   of these functions with a very short timeout are actually
>>>   descheduled as opposed to "interval too short, don't bother" ?
>>>
>>> - do you have numbers on how many calls per second we can
>>>   have for a process that does
>>>       for (;;) { nanosleep(min_value_that_causes_descheduling); }
>>>
>
> I don't follow you here.
>
>>> I also have some comments on the diff:
>>> - can you provide a diff -p ?
>>>
>>> - for several functions the only change is the name of an argument
>>>   from "busy" to "us". Can you elaborate on the reason for the change
>>>   (and whether "us" means microseconds or the pronoun)?
>>>
>>
>> Please see r242905 by mav@.
>>
>>> Finally, a more substantial comment:
>>> - a lot of functions which formerly had only a "timo" argument
>>>   now have "timo, bt, precision, flags". Take seltdwait() as an example.
>>>
>>
>> seltdwait() is not part of the public KPI. It has been modified to
>> avoid code duplication. Having seltdwait() and seltdwait_bt(), i.e.
>> two separate functions, even though we could share most of the code, is
>> not a clever approach, IMHO.
>> As I said before, seltdwait() is not exposed so we can modify its
>> arguments without breaking anything.
>>
>>>   It seems that you have been undecided between two approaches:
>>>   for some of these functions you have preserved the original function
>>>   that deals with ticks and introduced a new one that deals with bintime,
>>>   whereas in other cases you have modified the original function to add
>>>   "bt, precision, flags".
>>>
>>
>> I'm not. All the functions which are part of the public KPI (e.g.
>> condvar(9), sleepq(9), *) are still available.  *_flags variants have
>> been introduced so that consumers can take advantage of the new
>> 'precision tolerance mechanism' implemented. Also, *_bt variants have
>> been introduced. I don't see any "undecision" between the two
>> approaches.
>> Please note that now the callout backend deals with bintime, so every
>> time callout_reset_on() is called, the 'tick' argument passed is
>> silently converted to bintime.
>>
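
The tick-to-bintime conversion mentioned above could look roughly like the following userland sketch. It is not the code from the patch: struct bt_sketch and ticks_to_bt are hypothetical names, and hz is passed explicitly here instead of being read from the kernel. Since the new backend takes absolute times, the result would presumably still be added to the current uptime before use.

```c
#include <assert.h>
#include <stdint.h>

/* Userland stand-in for the kernel's struct bintime. */
struct bt_sketch {
	int64_t  sec;	/* whole seconds */
	uint64_t frac;	/* fraction of a second, in units of 2^-64 s */
};

/* Convert a relative timeout in ticks into seconds plus a binary fraction. */
static struct bt_sketch
ticks_to_bt(int ticks, int hz)
{
	struct bt_sketch bt;

	assert(ticks >= 0 && hz > 0);
	bt.sec = ticks / hz;
	/* (ticks % hz) < hz, so the product cannot exceed UINT64_MAX. */
	bt.frac = (uint64_t)(ticks % hz) * (UINT64_MAX / (uint64_t)hz);
	return bt;
}
```

With hz = 1000, ticks_to_bt(1500, 1000) gives one second plus a fraction of 500 * (UINT64_MAX / 1000), i.e. about half a second.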
>>>   I would suggest a more uniform approach, namely:
>>>   - preserve all the existing functions (T) that take a timeout in ticks;
>>>   - add a new set of corresponding functions (BT) that take
>>>     bt, precision, flags _instead_ of the ticks
>>>   - the functions (T) immediately convert from ticks to
>>>     bintime(s), using macros or inline functions
>>>   - optionally, convert kernel components to the new (BT) functions
>>>     where this makes sense (e.g. we can exploit the finer-granularity
>>>     of the new calls, etc.)
>>>
>>
>
> This is the strategy we followed.
>
>>
>>
>>> cheers
>>> luigi
>>>
>>>> 1) callout(9) is no longer constrained to the resolution a periodic
>>>> "hz" clock can give. To achieve that, the eventtimers(4) subsystem
>>>> is used as the backend.
>>>> 2) Contrary to what was discussed in the past, we kept the callwheel
>>>> as the underlying data structure for keeping track of the outstanding
>>>> timeouts. This choice has a couple of advantages; in particular, we can
>>>> still benefit from the O(1) average complexity of the wheel for
>>>> all the operations. Also, we thought the code duplication that would
>>>> arise from the use of a two-staged backend for callout (e.g. a wheel
>>>> for coarse resolution events and another data structure, such as a
>>>> heap, for high resolution events) is unacceptable. In fact, since
>>>> callouts gained the ability to migrate from one CPU to another, having
>>>> a double backend would mean doubling the code for the migration path.
>>>> 3) A way to dispatch callouts directly from hardware interrupt context
>>>> has been implemented, using a special callout flag. This has limited
>>>> applicability, but avoids dispatching a SWI thread for handling
>>>> specific callouts, which saves waking up another CPU for processing
>>>> and a (relatively useless) context switch.
>>>> 4) Since the new callout mechanism deals with bintime and no longer
>>>> with ticks, time is specified as absolute rather than relative. To
>>>> get the current time, binuptime() or getbinuptime() is used, and a
>>>> sysctl is introduced to choose which function to use, based
>>>> on a precision threshold.
>>>> 5) A mechanism for specifying precision tolerance has been
>>>> implemented. The callout processing mechanism has been adapted and the
>>>> callout data structure augmented so that the codepath can take
>>>> advantage and aggregate events which overlap in time.
>>>>
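
The aggregation idea in point 5 can be illustrated with a small userland sketch. It assumes each event may fire anywhere in the window [when, when + prec] and counts how many wakeups a greedy batching needs for events sorted by start time. This is not the calloutng code; struct ev and count_wakeups are hypothetical names, and the time unit is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event: allowed to fire anywhere in [when, when + prec]. */
struct ev {
	uint64_t when;	/* earliest acceptable firing time */
	uint64_t prec;	/* precision tolerance (assumed not to overflow when added) */
};

/*
 * Count the wakeups needed for events sorted by 'when'.
 * Greedy batching: 'deadline' is the earliest "latest allowed" time in the
 * current batch; an event joins the batch if it starts no later than that.
 */
static size_t
count_wakeups(const struct ev *evs, size_t n)
{
	size_t wakeups = 0;
	uint64_t deadline = 0;

	assert(evs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t last = evs[i].when + evs[i].prec;

		if (wakeups == 0 || evs[i].when > deadline) {
			wakeups++;		/* window does not overlap: new batch */
			deadline = last;
		} else if (last < deadline) {
			deadline = last;	/* tighten the shared deadline */
		}
	}
	return wakeups;
}
```

Firing each batch at its deadline satisfies every event in it, so events whose windows overlap share a single wakeup, while events with zero tolerance at distinct times each need their own.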
>>>>
>>>> The new proposed KPI for callout is the following:
>>>> callout_reset_bt_on(..., struct bintime time, struct bintime pr, ...,
>>>> int flags)
>>>> where the ‘time’ argument represents the time at which the callout
>>>> should fire, ‘pr’ represents the precision tolerance expressed as an
>>>> absolute value, and ‘flags’ can be used to specify new features, i.e.
>>>> for now, the possibility to run the callout from fast interrupt
>>>> context.
>>>> The old KPI has been extended by introducing the callout_reset_flags()
>>>> function, which is the same as callout_reset*(), but takes an
>>>> additional argument ‘int flags’ that can be used in the same fashion
>>>> as the ‘flags’ argument for the new KPI. Using ‘flags’, consumers
>>>> can also specify relative precision tolerance in terms of a
>>>> power-of-two portion of the timeout passed as ticks.
>>>> Using this strategy, the new precision mechanism can be used for the
>>>> existing services without major modifications.
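
As a rough illustration of a "power-of-two portion" of a timeout: if a consumer asks for a tolerance of 1/2^n of the timeout, the tolerance is just the timeout shifted right by n. How n is actually encoded in ‘flags’ is not shown in this mail, so the sketch below takes it as a plain parameter; relative_precision and shift are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Tolerance equal to timeout / 2^shift, in the same unit as the timeout
 * (ticks here). 'shift' stands in for whatever the flags would encode.
 */
static uint64_t
relative_precision(uint64_t timeout_ticks, unsigned shift)
{
	assert(shift < 64);	/* shifting a 64-bit value by >= 64 is undefined */
	return timeout_ticks >> shift;
}
```

For instance, a 1000-tick timeout with shift 3 tolerates 125 ticks of slack, while a shift of 0 tolerates the full timeout.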
>>>>
>>>> Some consumers have been ported to the new KPI, in particular
>>>> nanosleep(), poll(), select(), because they take immediate advantage
>>>> of the arbitrary precision offered by the new infrastructure.
>>>> For some statistics about the outcome of the conversion to the new
>>>> service, please refer to the end of this e-mail:
>>>> http://lists.freebsd.org/pipermail/freebsd-arch/2012-July/012756.html
>>>> We didn't measure any significant performance regressions with
>>>> hwpmc(4), using some benchmark programs:
>>>> http://people.freebsd.org/~davide/poll_test/poll_test.c
>>>> http://people.freebsd.org/~mav/testsleep.c
>>>> http://people.freebsd.org/~mav/testidle.c
>>>>
>>>> We tested the code on amd64, MIPS and arm. Any kind of testing or
>>>> comment would be really appreciated. The full diff of the work against
>>>> HEAD can be found at: http://people.freebsd.org/~davide/calloutng.diff
>>>> If no one has objections, we plan to merge the repository to HEAD in a
>>>> week or so.
>>>>
>>>> Thanks,
>>>>
>>>> Davide
>>>> _______________________________________________
>>>> freebsd-current@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>>>> To unsubscribe, send any mail to
>>>> "freebsd-current-unsubscribe@freebsd.org"
>>>
>>>
>>>
>>>
>>> --
>>> -----------------------------------------+-------------------------------
>>>  Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
>>>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>>>  TEL      +39-050-2211611               . via Diotisalvi 2
>>>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
>>> -----------------------------------------+-------------------------------
>>>


