From owner-freebsd-pf@FreeBSD.ORG Fri May 25 11:26:54 2012 Return-Path: Delivered-To: freebsd-pf@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E2B94106564A; Fri, 25 May 2012 11:26:54 +0000 (UTC) (envelope-from Joerg.Pulz@frm2.tum.de) Received: from mailhost.frm2.tum.de (mailhost.frm2.tum.de [129.187.179.12]) by mx1.freebsd.org (Postfix) with ESMTP id 48A8D8FC0A; Fri, 25 May 2012 11:26:54 +0000 (UTC) Received: from mailhost.frm2.tum.de (localhost [127.0.0.1]) by mailhost.frm2.tum.de (8.14.4/8.14.4) with ESMTP id q4PBPc8X062293; Fri, 25 May 2012 13:25:38 +0200 (CEST) (envelope-from Joerg.Pulz@frm2.tum.de) X-Virus-Scanned: at mailhost.frm2.tum.de Received: from hades.admin.frm2 (hades.admin.frm2 [172.25.1.10]) (authenticated bits=0) by mailhost.frm2.tum.de (8.14.4/8.14.4) with ESMTP id q4PBPZfe062289 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Fri, 25 May 2012 13:25:35 +0200 (CEST) (envelope-from Joerg.Pulz@frm2.tum.de) Date: Fri, 25 May 2012 13:25:32 +0200 (CEST) From: Joerg Pulz To: Daniel Hartmeier In-Reply-To: <20120525091627.GA27514@insomnia.benzedrine.cx> Message-ID: References: <201205250730.q4P7UGu0006036@freefall.freebsd.org> <20120525091627.GA27514@insomnia.benzedrine.cx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (mailhost.frm2.tum.de [129.187.179.12]); Fri, 25 May 2012 13:25:35 +0200 (CEST) Cc: bug-followup@freebsd.org, freebsd-pf@freebsd.org Subject: Re: kern/168190: [pf] panic when using pf and route-to (maybe: bad fragment handling?) X-BeenThere: freebsd-pf@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Technical discussion and general questions about packet filter \(pf\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 25 May 2012 11:26:55 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 25 May 2012, Daniel Hartmeier wrote: > On Fri, May 25, 2012 at 07:30:16AM +0000, Joerg Pulz wrote: > >> the system is still running without panic, but i found the following log >> entries from last night: >> >> May 24 23:28:57 charon kernel: pf_route: m0->m_len < sizeof(struct ip) >> May 24 23:28:57 charon kernel: pf_route: m0->m_len < sizeof(struct ip) >> >> Do you think that this may be related to the panics? >> I've found this error message two times in contrib/pf/net/pf.c. >> I can't say which of them or both have printed the message. > > Yes, this could possibly explain it. > > All pfil consumers assume that the IP header is one continuous memory > region in the mbuf, without verifying this or correcting it (with > m_pullup() or such) if wrong. > > pf: pf_check_in() in sys/contrib/pf/net/pf_ioctl.c > h = mtod(*m, struct ip *); > access h->ip_len > > ipfw: ipfw_check_hook() in sys/netinet/ipfw/ip_fw_pfil.c > SET_NET_IPLEN(mtod(*m0, struct ip *)); > > ipfilter: fr_check_wrapper() in sys/contrib/ipfilter/netinet/ip_fil_freebsd.c > struct ip *ip = mtod(*mp, struct ip *); > access ip->ip_hl > > Hence, the caller of pfil_run_hooks() must ensure this before the call. > ip_input() does, but there are operations that might violate the > condition again. > > If the IP header is not continuous in the first mbuf, doing > > struct ip *ip = mtod(m, struct ip *); > ip->ip_len = ntohs(ip->ip_len); > > will read (and write) unrelated memory. > > If, later, something does call m_pullup(), ip_len will get 'replaced' > again. This could lead to some byte order swaps getting undone. > > I'm not sure what the proper action is here, i.e. should we be surprised > that an mbuf with such a small m_len is found, and track down how it > was produced, or should the pfil code simply expect this? > > I'd probably add a sanity check to pfil_run_hooks(), like > > --- sys/net/pfil.c 23 Sep 2011 00:51:37 -0000 1.19.2.1 > +++ sys/net/pfil.c 25 May 2012 09:10:27 -0000 > @@ -46,6 +46,8 @@ > > #include > #include > +#include > +#include > > static struct mtx pfil_global_lock; > > @@ -74,15 +76,21 @@ > struct mbuf *m = *mp; > int rv = 0; > > + if (m->m_pkthdr.len < sizeof(struct ip) || > + m->m_len < sizeof(struct ip)) > + panic("pfil_run_hooks: m->m_pkthdr.len %d, m->m_len %d", > + (int)m->m_pkthdr.len, (int)m->m_len); > PFIL_RLOCK(ph, &rmpt); > KASSERT(ph->ph_nhooks >= 0, ("Pfil hook count dropped < 0")); > for (pfh = pfil_hook_get(dir, ph); pfh != NULL; > pfh = TAILQ_NEXT(pfh, pfil_link)) { > if (pfh->pfil_func != NULL) { > + ASSERT_HOST_BYTE_ORDER(m); > rv = (*pfh->pfil_func)(pfh->pfil_arg, &m, ifp, dir, > inp); > if (rv != 0 || m == NULL) > break; > + ASSERT_HOST_BYTE_ORDER(m); > } > } > PFIL_RUNLOCK(ph, &rmpt); > > Then when it will panic (instead of just the pf_route() message), the > stack trace could help. > > This might require several iterations, adding such checks all around the > place. The cause might be some mbuf operation done in ipsec, for some > edge case, explaining why it occurs so rarely... > > If I could easily reproduce it locally, I'd probably do it, but it's > your machine that crashes all the time, so it's your call :) Daniel, thanks for your explaining. As i said, i will do everything to track this down to the real bottom to get it fixed forever. Your proposed changes are in and the system is running with the rebuilt kernel. Now it's time to wait and see what happens. Kind regards Joerg - -- The beginning is the most important part of the work. -Plato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iD8DBQFPv2wvSPOsGF+KA+MRAp1DAKCp2KZdPKBdsR5PUbK3ixqXqFdi9ACfbfvd vez+bq9X5lMjbdysCXy+stU= =tvtF -----END PGP SIGNATURE-----