From owner-freebsd-stable@FreeBSD.ORG Wed Jun 8 21:18:18 2005
Return-Path:
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id BEE3416A41F
	for ; Wed, 8 Jun 2005 21:18:18 +0000 (GMT)
	(envelope-from mgrooms@seton.org)
Received: from zixvpm01.seton.org (zixvpm01.seton.org [207.193.126.161])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0AF5D43D1D
	for ; Wed, 8 Jun 2005 21:18:17 +0000 (GMT)
	(envelope-from mgrooms@seton.org)
Received: from zixvpm01.seton.org (ZixVPM [127.0.0.1])
	by Outbound.seton.org (Proprietary) with ESMTP id 02C6E3600CE
	for ; Wed, 8 Jun 2005 16:18:17 -0500 (CDT)
Received: from mx1-out.seton.org (unknown [10.21.254.249])
	by zixvpm01.seton.org (Proprietary) with ESMTP id B161033015A;
	Wed, 8 Jun 2005 15:27:57 -0500 (CDT)
Received: from localhost (unknown [127.0.0.1])
	by mx1-out.seton.org (Postfix) with ESMTP id 9DD028014E29;
	Wed, 8 Jun 2005 15:27:57 -0500 (CDT)
Received: from mx1-out.seton.org ([10.21.254.249])
	by localhost (mx1 [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id 21015-09; Wed, 8 Jun 2005 15:27:57 -0500 (CDT)
Received: from ausexfe01.seton.org (ausexfe01.seton.org [10.20.10.211])
	by mx1-out.seton.org (Postfix) with ESMTP id 837678014E25;
	Wed, 8 Jun 2005 15:27:57 -0500 (CDT)
Received: from [10.20.160.190] ([10.20.160.190])
	by ausexfe01.seton.org with Microsoft SMTPSVC(6.0.3790.211);
	Wed, 8 Jun 2005 15:26:35 -0500
Message-ID: <42A755A1.6030104@seton.org>
Date: Wed, 08 Jun 2005 15:31:29 -0500
From: Matthew Grooms
Organization: Seton Healthcare Network
User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Max Laier
References: <28FCC7CB4CF6EA43AF83BCA2096E97D013E555@AUSEX2VS1.seton.org>
	<42A62F52.10705@seton.org>
	<200506081617.05938.max@love2party.net>
In-Reply-To: <200506081617.05938.max@love2party.net>
Content-Type: text/plain; charset=ISO-8859-1;
	format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 08 Jun 2005 20:26:35.0843 (UTC) FILETIME=[5DF17130:01C56C68]
X-Virus-Scanned: by amavisd-new at seton.org
Cc: pf@freebsd.org, glebius@freebsd.org, freebsd-stable@freebsd.org,
	Palle Girgensohn, Kris Kennaway
Subject: Re: 5.4-RELEASE lockups on amd64 SMP
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 08 Jun 2005 21:18:18 -0000

BTW: have you tested pfsync between two SMP systems with decent traffic
flow? It usually took about 3 days to hit the LOR, but the panic shows up
in about 10-20 minutes.

Matthew Grooms
Network Engineer
Seton Healthcare Network
http://www.seton.net/
mgrooms@seton.org
(512) 324 9913

Max Laier wrote:
> Matthew,
>
> can you try the attached diff? It is available for 5 and CURRENT. I recall
> that this problem was seen before; strange that I didn't see the problem.
> Sounds familiar to you? Please try the patch and let me know if that
> helps. Thanks a lot.
>
> On Wednesday 08 June 2005 01:35, Matthew Grooms wrote:
>
>>Once again, here are the backtraces for the panic and LOR ...
>>
>>Tracing pid 110 tid 100089 td 0xffffff012f3f0c80
>>kdb_enter() at kdb_enter+0x2f
>>panic() at panic+0x249
>>uma_dbg_free() at uma_dbg_free+0x188
>>uma_zfree_arg() at uma_zfree_arg+0x1b0
>>pf_purge_expired_states() at pf_purge_expired_states+0x41
>>pfsync_input() at pfsync_input+0xb35
>>ip_input() at ip_input+0x10f
>>netisr_processqueue() at netisr_processqueue+0x17
>>swi_net() at swi_net+0xa8
>>ithread_loop() at ithread_loop+0xd9
>>fork_exit() at fork_exit+0xc3
>>fork_trampoline() at fork_trampoline+0xe
>>--- trap 0, rip = 0, rsp = 0xffffffffb44f9d00, rbp = 0 ---
>>db> continue
>>boot() called on cpu#0
>>Uptime: 13h42m43s
>>Dumping 4864 MB
>> 16 32 ...
>>
>>lock order reversal
>
> ...
>
>>alltraps_with_regs_pushed() at alltraps_with_regs_pushed+0x5
>>pf_state_tree_lan_ext_RB_REMOVE() at pf_state_tree_lan_ext_RB_REMOVE+0x10c
>
>
> This LOR is a consequence of the fault, so it can be disregarded.
>
>
>
> ------------------------------------------------------------------------
>
> Index: if_pfsync.c
> ===================================================================
> RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/if_pfsync.c,v
> retrieving revision 1.15
> diff -u -r1.15 if_pfsync.c
> --- if_pfsync.c 3 May 2005 16:43:32 -0000 1.15
> +++ if_pfsync.c 8 Jun 2005 14:04:44 -0000
> @@ -132,6 +132,7 @@
>
>  static void pfsync_clone_destroy(struct ifnet *);
>  static int pfsync_clone_create(struct if_clone *, int);
> +static void pfsync_senddef(void *);
>  #else
>  void pfsyncattach(int);
>  #endif
> @@ -174,6 +175,8 @@
>  	callout_stop(&sc->sc_bulk_tmo);
>  	callout_stop(&sc->sc_bulkfail_tmo);
>
> +	callout_stop(&sc->sc_send_tmo);
> +
>  #if NBPFILTER > 0
>  	bpfdetach(ifp);
>  #endif
> @@ -220,6 +223,7 @@
>  	callout_init(&sc->sc_tmo, 0);
>  	callout_init(&sc->sc_bulk_tmo, 0);
>  	callout_init(&sc->sc_bulkfail_tmo, 0);
> +	callout_init(&sc->sc_send_tmo, 0);
>  	if_attach(ifp);
>
>  	LIST_INSERT_HEAD(&pfsync_list, sc, sc_next);
> @@ -1033,6 +1037,7 @@
>  		if (pfsyncr.pfsyncr_maxupdates > 255)
>  			return (EINVAL);
>  #ifdef __FreeBSD__
> +		callout_drain(&sc->sc_send_tmo);
>  		PF_LOCK();
>  #endif
>  		sc->sc_maxupdates = pfsyncr.pfsyncr_maxupdates;
> @@ -1789,15 +1794,14 @@
>  #endif
>
>  		pfsyncstats.pfsyncs_opackets++;
> -
>  #ifdef __FreeBSD__
> -		PF_UNLOCK();
> -#endif
> +		if (IF_HANDOFF(&sc->sc_ifq, m, NULL))
> +			pfsyncstats.pfsyncs_oerrors++;
> +		else
> +			callout_reset(&sc->sc_send_tmo, 1, pfsync_senddef, sc);
> +#else
>  		if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))
>  			pfsyncstats.pfsyncs_oerrors++;
> -
> -#ifdef __FreeBSD__
> -		PF_LOCK();
>  #endif
>  	} else
>  		m_freem(m);
> @@ -1807,6 +1811,22 @@
>
>
>  #ifdef __FreeBSD__
> +static void
> +pfsync_senddef(void *arg)
> +{
> +	struct pfsync_softc *sc = (struct pfsync_softc *)arg;
> +	struct mbuf *m;
> +
> +	for(;;) {
> +		IF_DEQUEUE(&sc->sc_ifq, m);
> +		if (m == NULL)
> +			break;
> +		if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))
> +			pfsyncstats.pfsyncs_oerrors++;
> +	}
> +}
> +
> +
>  static int
>  pfsync_modevent(module_t mod, int type, void *data)
>  {
> Index: if_pfsync.h
> ===================================================================
> RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/if_pfsync.h,v
> retrieving revision 1.5
> diff -u -r1.5 if_pfsync.h
> --- if_pfsync.h 3 May 2005 16:43:32 -0000 1.5
> +++ if_pfsync.h 8 Jun 2005 14:06:03 -0000
> @@ -164,6 +164,10 @@
>  	struct in_addr sc_sendaddr;
>  	struct mbuf *sc_mbuf; /* current cumulative mbuf */
>  	struct mbuf *sc_mbuf_net; /* current cumulative mbuf */
> +#ifdef __FreeBSD__
> +	struct ifqueue sc_ifq;
> +	struct callout sc_send_tmo;
> +#endif
>  	union sc_statep sc_statep;
>  	union sc_statep sc_statep_net;
>  	u_int32_t sc_ureq_received;
>
>
> ------------------------------------------------------------------------
>
> Index: if_pfsync.c
> ===================================================================
> RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/if_pfsync.c,v
> retrieving revision 1.11.2.2
> diff -u -r1.11.2.2 if_pfsync.c
> --- if_pfsync.c 19 May 2005 10:59:22 -0000 1.11.2.2
> +++ if_pfsync.c 8 Jun 2005 14:07:17 -0000
> @@ -130,6 +130,7 @@
>
>  static void pfsync_clone_destroy(struct ifnet *);
>  static int pfsync_clone_create(struct if_clone *, int);
> +static void pfsync_senddef(void *);
>  #else
>  void pfsyncattach(int);
>  #endif
> @@ -170,6 +171,8 @@
>  	callout_stop(&sc->sc_bulk_tmo);
>  	callout_stop(&sc->sc_bulkfail_tmo);
>
> +	callout_stop(&sc->sc_send_tmo);
> +
>  #if NBPFILTER > 0
>  	bpfdetach(ifp);
>  #endif
> @@ -216,6 +219,7 @@
>  	callout_init(&sc->sc_tmo, 0);
>  	callout_init(&sc->sc_bulk_tmo, 0);
>  	callout_init(&sc->sc_bulkfail_tmo, 0);
> +	callout_init(&sc->sc_send_tmo, 0);
>  	if_attach(&sc->sc_if);
>
>  	LIST_INSERT_HEAD(&pfsync_list, sc, sc_next);
> @@ -913,6 +917,7 @@
>  		if (pfsyncr.pfsyncr_maxupdates > 255)
>  			return (EINVAL);
>  #ifdef __FreeBSD__
> +		callout_drain(&sc->sc_send_tmo);
>  		PF_LOCK();
>  #endif
>  		sc->sc_maxupdates = pfsyncr.pfsyncr_maxupdates;
> @@ -1634,15 +1639,14 @@
>  #endif
>
>  		pfsyncstats.pfsyncs_opackets++;
> -
>  #ifdef __FreeBSD__
> -		PF_UNLOCK();
> -#endif
> +		if (IF_HANDOFF(&sc->sc_ifq, m, NULL))
> +			pfsyncstats.pfsyncs_oerrors++;
> +		else
> +			callout_reset(&sc->sc_send_tmo, 1, pfsync_senddef, sc);
> +#else
>  		if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))
>  			pfsyncstats.pfsyncs_oerrors++;
> -
> -#ifdef __FreeBSD__
> -		PF_LOCK();
>  #endif
>  	} else
>  		m_freem(m);
> @@ -1652,6 +1656,22 @@
>
>
>  #ifdef __FreeBSD__
> +static void
> +pfsync_senddef(void *arg)
> +{
> +	struct pfsync_softc *sc = (struct pfsync_softc *)arg;
> +	struct mbuf *m;
> +
> +	for(;;) {
> +		IF_DEQUEUE(&sc->sc_ifq, m);
> +		if (m == NULL)
> +			break;
> +		if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))
> +			pfsyncstats.pfsyncs_oerrors++;
> +	}
> +}
> +
> +
>  static int
>  pfsync_modevent(module_t mod, int type, void *data)
>  {
> Index: if_pfsync.h
> ===================================================================
> RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/if_pfsync.h,v
> retrieving revision 1.4
> diff -u -r1.4 if_pfsync.h
> --- if_pfsync.h 16 Jun 2004 23:24:00 -0000 1.4
> +++ if_pfsync.h 8 Jun 2005 14:07:48 -0000
> @@ -158,8 +158,12 @@
>  	struct timeout sc_bulkfail_tmo;
>  #endif
>  	struct in_addr sc_sendaddr;
> -	struct mbuf *sc_mbuf; /* current cummulative mbuf */
> -	struct mbuf *sc_mbuf_net; /* current cummulative mbuf */
> +	struct mbuf *sc_mbuf; /* current cumulative mbuf */
> +	struct mbuf *sc_mbuf_net; /* current cumulative mbuf */
> +#ifdef __FreeBSD__
> +	struct ifqueue sc_ifq;
> +	struct callout sc_send_tmo;
> +#endif
>  	union sc_statep sc_statep;
>  	union sc_statep sc_statep_net;
>  	u_int32_t sc_ureq_received;
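
For readers following the thread: the quoted patch stops calling ip_output()
in the middle of pfsync's send path, where the old code had to drop and retake
the pf lock around it. Instead the mbuf is only queued, and a callout drains
the queue one tick later. The userland C sketch below illustrates that pattern
and nothing more. Every name in it is hypothetical and not taken from the
patch: big_lock stands in for PF_LOCK, struct pktq for struct ifqueue,
send_locked() for the queueing side and send_deferred() for pfsync_senddef().
It assumes the handoff returns non-zero on success, which is how IF_HANDOFF is
generally understood to behave, so a zero return is what gets counted as an
error; the check in the quoted diff has the opposite polarity and may be worth
a second look.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf: one queued packet. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Hypothetical stand-in for struct ifqueue: bounded FIFO with its own lock. */
struct pktq {
	pthread_mutex_t mtx;
	struct pkt *head, *tail;
	int len, maxlen;
};

/* Plays the role of PF_LOCK; held by the caller of send_locked(). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int sent, oerrors;

static void
pktq_init(struct pktq *q, int maxlen)
{
	assert(maxlen > 0);
	pthread_mutex_init(&q->mtx, NULL);
	q->head = q->tail = NULL;
	q->len = 0;
	q->maxlen = maxlen;
}

/* Returns 1 if queued; returns 0 and frees the packet if the queue is full. */
static int
pktq_handoff(struct pktq *q, struct pkt *p)
{
	int ok = 0;

	pthread_mutex_lock(&q->mtx);
	if (q->len < q->maxlen) {
		p->next = NULL;
		if (q->tail != NULL)
			q->tail->next = p;
		else
			q->head = p;
		q->tail = p;
		q->len++;
		ok = 1;
	}
	pthread_mutex_unlock(&q->mtx);
	if (!ok)
		free(p);
	return (ok);
}

/* Removes and returns the oldest packet, or NULL if the queue is empty. */
static struct pkt *
pktq_dequeue(struct pktq *q)
{
	struct pkt *p;

	pthread_mutex_lock(&q->mtx);
	p = q->head;
	if (p != NULL) {
		q->head = p->next;
		if (q->head == NULL)
			q->tail = NULL;
		q->len--;
	}
	pthread_mutex_unlock(&q->mtx);
	return (p);
}

/* Producer side: the caller holds big_lock, so this only queues. */
static void
send_locked(struct pktq *q, int id)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p == NULL) {
		oerrors++;
		return;
	}
	p->id = id;
	if (!pktq_handoff(q, p))
		oerrors++;
}

/* Deferred side: runs later without big_lock and drains the queue. */
static void
send_deferred(struct pktq *q)
{
	struct pkt *p;

	while ((p = pktq_dequeue(q)) != NULL) {
		sent++;		/* real code would call ip_output() here */
		free(p);
	}
}
```

A caller would take big_lock, call send_locked() once per packet, release the
lock, and later call send_deferred() from a context that holds no pf lock.
With a queue limit of 2, a third packet queued before the drain is dropped and
counted in oerrors.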