Date: Fri, 22 Aug 2008 14:39:27 -0700 From: gnn@FreeBSD.org To: "Bruce M. Simpson" <bms@FreeBSD.org> Cc: Luigi Rizzo <rizzo@iet.unipi.it>, net@FreeBSD.org Subject: Re: Small patch to multicast code... Message-ID: <m2fxowhgq8.wl%gnn@neville-neil.com> In-Reply-To: <48AF08B7.4090804@FreeBSD.org> References: <m27iaa6v43.wl%gnn@neville-neil.com> <20080821203519.GA51534@onelab2.iet.unipi.it> <m23aky6ncl.wl%gnn@neville-neil.com> <48AE23FF.9070009@FreeBSD.org> <m2tzdd6j36.wl%gnn@neville-neil.com> <48AF08B7.4090804@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
At Fri, 22 Aug 2008 19:43:03 +0100, Bruce M. Simpson wrote: > > We end up calling if_simloop() from a few "interesting" places, in > particular the kernel PIM packet handler. > > In this particular case we're going to take a full mbuf chain copy every > time we send a packet which needs to be looped back to userland. Right, I know the penalty. > It's been a while since I've done any in-depth FreeBSD work other > than hacking on the IGMPv3 snap, and my time is largely tied up with > other work these days, sadly. > > It doesn't seem right to my mind that we need to make a full copy of > an mbuf chain with m_dup() to workaround this kind of problem. > > Whilst it may suffice for a band-aid workaround, we may see mbuf > pool fragmentation as packet rates go up. > > However we are now in a "new world order" where mbuf chains may be > very tied to the device where they've originated or to where they're > going. It isn't clear to me where this kind of intrusion is > happening. > > In the case of ip_mloopback(), somehow we are stomping on a > read-only copy of an mbuf chain. The use of m_copy() with m_pullup() > there is fine according to the documented uses of mbuf(9), although > as Luigi pointed out, most likely we need to look at the upper-layer > protocol too, e.g. where UDP checksums are also being offloaded. > > Some of the code in the IGMPv3 branch actually reworks how loopback > happens i.e. the preference is not to loop back wherever possible > because of the locking implications. Check the bms_netdev branch > history for more info. Well, what I suspect is the problem are these bits: udp_output(): /* * Set up checksum and output datagram. */ if (udp_cksum) { if (inp->inp_flags & INP_ONESBCAST) faddr.s_addr = INADDR_BROADCAST; ui->ui_sum = in_pseudo(ui->ui_src.s_addr, faddr.s_addr, htons((u_short)len + sizeof(struct udphdr) + IPPROTO_UDP)); m->m_pkthdr.csum_flags = CSUM_UDP; m->m_pkthdr.csum_data = offsetof(struct udphdr, uh_sum); } else ip_mloopback(): copym = m_copy(m, 0, M_COPYALL); if (copym != NULL && (copym->m_flags & M_EXT || copym->m_len < hlen)) copym = m_pullup(copym, hlen); if (copym != NULL) { /* If needed, compute the checksum and mark it as valid. */ if (copym->m_pkthdr.csum_flags & CSUM_DELAY_DATA) { in_delayed_cksum(copym); copym->m_pkthdr.csum_flags &= ~CSUM_DELAY_DATA; copym->m_pkthdr.csum_flags |= CSUM_DATA_VALID | CSUM_PSEUDO_HDR; copym->m_pkthdr.csum_data = 0xffff; } and: in_delayed_cksum(struct mbuf *m) { struct ip *ip; u_short csum, offset; ip = mtod(m, struct ip *); offset = ip->ip_hl << 2 ; csum = in_cksum_skip(m, ip->ip_len, offset); if (m->m_pkthdr.csum_flags & CSUM_UDP && csum == 0) csum = 0xffff; offset += m->m_pkthdr.csum_data; /* checksum offset */ Somehow the data that the device needs to do the proper checksum offload is getting trashed here. Now, since it's clear we need a writable packet structure so that we don't trash the original, I'm wondering if the m_pullup() will be sufficient. Best, George
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?m2fxowhgq8.wl%gnn>