From: Andrew Boyer
Date: Fri, 30 Mar 2012 18:04:24 -0400
To: freebsd-net@freebsd.org
Subject: LACP kernel panics: /* unlocking is safe here */

While investigating a LACP issue, I turned on LACP_DEBUG on a debug
kernel. In this configuration it's easy to panic the kernel - just run
'ifconfig lagg0 laggproto lacp' on a lagg that's already in LACP mode
and receiving LACP messages.

The problem is that lagg_lacp_detach() drops the lagg wlock (with the
comment in the title), which allows incoming LACP messages to get
through lagg_input() while the structure is being destroyed in
lacp_detach().

There's a very simple fix, but I don't know if it's the best way to fix
it. Resetting the protocol before calling sc_detach causes any further
incoming packets to be dropped until the lagg gets reconfigured.

Thoughts?

Is it safe to just hold on to the lagg wlock across the callout_drain()
calls in lacp_detach()? That's what OpenBSD does.

-Andrew

Index: sys/net/if_lagg.c
===================================================================
--- sys/net/if_lagg.c	(revision 233707)
+++ sys/net/if_lagg.c	(working copy)
@@ -952,9 +952,10 @@
 		}
 		if (sc->sc_proto != LAGG_PROTO_NONE) {
 			LAGG_WLOCK(sc);
+			/* Reset protocol */
+			sc->sc_proto = LAGG_PROTO_NONE;
 			error = sc->sc_detach(sc);
-			/* Reset protocol and pointers */
-			sc->sc_proto = LAGG_PROTO_NONE;
+			/* Reset pointers */
 			sc->sc_detach = NULL;
 			sc->sc_start = NULL;
 			sc->sc_input = NULL;

--------------------------------------------------
Andrew Boyer	aboyer@averesystems.com