Date: Sun, 02 Dec 2007 14:55:43 +0100
From: Andre Oppermann <andre@freebsd.org>
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc: Attilio Rao <attilio@freebsd.org>, arch@freebsd.org, Robert Watson <rwatson@freebsd.org>
Subject: Re: New "timeout" api, to replace callout
Message-ID: <4752B95F.20308@freebsd.org>
In-Reply-To: <18719.1196601915@critter.freebsd.dk>
References: <18719.1196601915@critter.freebsd.dk>
Poul-Henning Kamp wrote:
> In message <4752AABE.6090006@freebsd.org>, Andre Oppermann writes:
>
>>> It is my intent, that the implementation behind the new API will
>>> only ever grab the specified lock when it calls the timeout function.
>> This is the same for the current one and pretty much a given.
>>
>>> When you do a timeout_disable() or timeout_cleanup() you will be
>>> sleeping on a mutex internal to the implementation, if the timeout
>>> is currently executing.
>> This is the problematic part.  We can't sleep in TCP when cleaning up
>> the timer.
>
> The trouble arises because the current callout implementation will
> try to sleep on the timeout's lock, and once it does that, you cannot
> cancel it any more.

It hurts us big time in the TCP code.

> I'm going to exchange that problem for one that is less severe.
>
> My plan is to use non-blocking grabs of the timeout's lock to get
> around that race.
>
> When a timeout's timer expires, the thread that services the timeouts
> will try to get the lock in a non-blocking fashion, and if it fails,
> be put on a queue, to be retried after any other expired timeouts
> have had their chance.

In TCP we've got two types of races:

 o The timer expires on an active session, but the source of the timer
   was just handled (because a segment just arrived).  To simplify
   detection of timer races, a generation count passed together with
   the timer may be of value.  That way I (or the timer code) can
   easily detect if this invocation of the callback has become
   obsolete.

 o On shutdown we have to get rid of all timers for sure, because once
   we release the lock it is immediately destroyed and the memory is
   freed and cleared.  The timer must never even try to look at the
   lock again.  This is our major problem child in the TCP and socket
   lifecycle code.

There is another fine line.  When doing a timer cleanup, do I get to
know if there is a timeout pending and waiting in the CPU queue?  In
other words, can timeout_cleanup() tell us with certainty that a
timeout is no longer active and/or pending?  This would help us
halfway.

Other than that, is a flag planned saying "try only once" to obtain the
lock?  This may help the first race.  Though the current TCP code is
not structured to work that way, it could move in that direction.

> That leaves only the question of "how hard do we try to get the lock
> with non-blocking means".
>
> The answer to that will depend on how big a problem it is in practice.
>
> Adding timeout_cleanup() as an explicit end of life indicator for
> the timeout structure and its lock, makes it possible to use blocking
> methods, at high expense, in those rare cases where non-blocking
> means keep failing.
>
> But let's hope we will not need that.

-- 
Andre
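
To make the generation-count idea from the first race concrete, here is
a minimal userland sketch in C.  It is not the callout(9) interface and
not the proposed timeout API: struct conn_timer and the functions
timer_arm(), timer_invalidate() and timer_callback() are hypothetical
names made up for illustration, and the sketch assumes the caller
already holds whatever lock protects the connection.

```c
#include <stdbool.h>

/*
 * Hypothetical per-connection timer state.  Illustration only; this is
 * neither the existing callout structure nor the proposed timeout one.
 */
struct conn_timer {
	unsigned int gen;	/* bumped on every arm and every invalidate */
	int fired;		/* number of callbacks that did real work */
};

/* Arm the timer and return the generation the callback must present. */
static unsigned int
timer_arm(struct conn_timer *t)
{
	return (++t->gen);
}

/*
 * The event the timer was guarding against has been handled (for
 * example a segment just arrived), so any callback already in flight
 * is now obsolete.
 */
static void
timer_invalidate(struct conn_timer *t)
{
	t->gen++;
}

/*
 * Timer callback, assumed to run with the connection lock held.
 * Returns true if it did work, false if the invocation was stale.
 */
static bool
timer_callback(struct conn_timer *t, unsigned int armed_gen)
{
	if (armed_gen != t->gen)
		return (false);	/* stale: re-armed or invalidated since */
	t->fired++;
	return (true);
}
```

A callback that presents the generation returned by timer_arm() does
its work only if nothing re-armed or invalidated the timer in between.
This only addresses the first race; it does nothing for the shutdown
case, where the callback must not touch the structure at all.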