From owner-freebsd-current@FreeBSD.ORG  Fri Nov 11 09:09:54 2005
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 70D8F16A41F;
	Fri, 11 Nov 2005 09:09:54 +0000 (GMT)
	(envelope-from snezhko@indorsoft.ru)
Received: from indor.net.tomline.ru (indor.net.tomline.ru [213.183.100.90])
	by mx1.FreeBSD.org (Postfix) with ESMTP id D398B43D45;
	Fri, 11 Nov 2005 09:09:51 +0000 (GMT)
	(envelope-from snezhko@indorsoft.ru)
Received: from SNEZHKO by indorsoft.ru (MDaemon.PRO.v7.2.2.R)
	with ESMTP id md50000029333.msg; Fri, 11 Nov 2005 15:09:42 +0600
X-AntiVirus: Checked by Dr.Web [version: 4.32b, engine: 4.32b,
	virus records: 127198, updated: 10.11.2005]
To: John Baldwin <jhb@freebsd.org>
References: <uy83x3hon.fsf@indorsoft.ru> <uslu4ydqt.fsf@indorsoft.ru>
	<x74q6kh5oe.wl%suz@alaxala.net> <200511101140.15374.jhb@freebsd.org>
From: Victor Snezhko <snezhko@indorsoft.ru>
Date: Fri, 11 Nov 2005 15:09:36 +0600
In-Reply-To: <200511101140.15374.jhb@freebsd.org> (John Baldwin's message of
	"Thu, 10 Nov 2005 11:40:13 -0500")
Message-ID: <uoe4rim9b.fsf@indorsoft.ru>
User-Agent: Gnus/5.110002 (No Gnus v0.2) Emacs/21.3 (windows-nt)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Spam-Processed: indor.net.tomline.ru, Fri, 11 Nov 2005 15:09:42 +0600
	(not processed: spam filter disabled)
X-Return-Path: snezhko@indorsoft.ru
X-VVS-Spam: false
Cc: max@love2party.net, freebsd-current@freebsd.org, snezhko@indorsoft.ru,
	bug-followup@freebsd.org
Subject: Re: kern/88725: /usr/sbin/ppp panic with 2005.10.21 netinet6 changes
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Nov 2005 09:09:54 -0000

John Baldwin <jhb@freebsd.org> writes:

>>> Mark Tinguely has found the offending timer.
>>> The following patch fixes the problem for me:
>>
>> Thanks.  sounds right for me.
>> So please commit it if when you've finished the test with fresh -current.
>
> As a general rule you should be using callout_drain() before freeing a callout 
> to handle the race condition where the callout is running on another CPU (so 
> callout_stop can't stop it) while you are freeing it.  Note that you can not 
> use callout_drain() if you are holding any locks, though.  In those cases you 
> will need to defer the callout_drain() and free() until you have dropped the 
> locks.  Here's one example fix:
>
> Index: nd6.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/netinet6/nd6.c,v
> retrieving revision 1.62
> diff -u -r1.62 nd6.c
> --- nd6.c       22 Oct 2005 05:07:16 -0000      1.62
> +++ nd6.c       3 Nov 2005 19:56:42 -0000
> @@ -398,7 +398,7 @@
>         if (tick < 0) {
>                 ln->ln_expire = 0;
>                 ln->ln_ntick = 0;
> -               callout_stop(&ln->ln_timer_ch);
> +               callout_drain(&ln->ln_timer_ch);
>         } else {
>                 ln->ln_expire = time_second + tick / hz;
>                 if (tick > INT_MAX) {

The code that was committed (and introduced armed timer that was
freed) is full of callout_stops and contains not a single
callout_drain.

So I think in order to be consistent we shouldn't fix two problems at
once. The right way would be to commit the fix with callout stop at
first (and close the PR) and then investigate whether we can replace
stops with drains without introducing a deadlock (for each timer
separately). In this case we will at least have a working system to
cvsdown to it if there will be issues with callout_drain.

I have tested the patch by Mark Tinguely (that fixes mld6.c) on the
fresh -current (cvsupped ~2 days ago), it works there too
(unsurprisingly). So it may be committed, I suppose (without the debug
printf, of course). 

-- 
WBR, Victor V. Snezhko
EMail: snezhko@indorsoft.ru