From owner-freebsd-net@freebsd.org Thu Oct 12 20:43:09 2017 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 90280E34416 for ; Thu, 12 Oct 2017 20:43:09 +0000 (UTC) (envelope-from kfodil-lemelin@xiplink.com) Received: from smtp121.iad3a.emailsrvr.com (smtp121.iad3a.emailsrvr.com [173.203.187.121]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6BE17667C7 for ; Thu, 12 Oct 2017 20:43:08 +0000 (UTC) (envelope-from kfodil-lemelin@xiplink.com) Received: from smtp8.relay.iad3a.emailsrvr.com (localhost [127.0.0.1]) by smtp8.relay.iad3a.emailsrvr.com (SMTP Server) with ESMTP id ABF325841; Thu, 12 Oct 2017 16:42:54 -0400 (EDT) X-Auth-ID: kfodil-lemelin@xiplink.com Received: by smtp8.relay.iad3a.emailsrvr.com (Authenticated sender: kfodil-lemelin-AT-xiplink.com) with ESMTPSA id 88D64587F; Thu, 12 Oct 2017 16:42:54 -0400 (EDT) X-Sender-Id: kfodil-lemelin@xiplink.com Received: from [10.10.1.188] ([UNAVAILABLE]. [192.252.130.194]) (using TLSv1.2 with cipher DHE-RSA-AES128-SHA) by 0.0.0.0:465 (trex/5.7.12); Thu, 12 Oct 2017 16:42:54 -0400 Subject: Re: m_move_pkthdr leaves m_nextpkt 'dangling' To: "Andrey V. Elsukov" , Adrian Chadd References: <59567148.1020902@xiplink.com> <31535133-f95a-5db6-a04c-acc0175fa287@yandex.ru> Cc: FreeBSD Net From: Karim Fodil-Lemelin Message-ID: <59DFD3CC.2000401@xiplink.com> Date: Thu, 12 Oct 2017 16:42:52 -0400 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: <31535133-f95a-5db6-a04c-acc0175fa287@yandex.ru> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 Oct 2017 20:43:09 -0000 On 2017-07-07 10:46 AM, Andrey V. Elsukov wrote: > On 05.07.2017 19:23, Adrian Chadd wrote: >>> As many of you know, when dealing with IP fragments the kernel will build a >>> list of packets (fragments) chained together through the m_nextpkt pointer. >>> This is all good until someone tries to do a M_PREPEND on one of the packet >>> in the chain and the M_PREPEND has to create an extra mbuf to prepend at the >>> beginning of the chain. >>> >>> When doing so m_move_pkthdr is called to copy the current PKTHDR fields >>> (tags and flags) to the mbuf that was prepended. The function also does: >>> >>> to->m_pkthdr = from->m_pkthdr; >>> >>> This, for the case I am interested in, essentially leaves the 'from' mbuf >>> with a dangling pointer m_nextpkt pointing to the next fragment. While this >>> is mostly harmless because only mbufs of pkthdr types are supposed to have >>> m_nextpkt it triggers some panics when running with INVARIANTS in NetGraph >>> (see ng_base.c :: CHECK_DATA_MBUF(m)): >>> >>> ... >>> if (n->m_nextpkt != NULL) \ >>> panic("%s: m_nextpkt", __func__); \ >>> } >>> ... >>> >>> So I would like to propose the following patch: >>> >>> @@ -442,10 +442,11 @@ m_move_pkthdr(struct mbuf *to, struct mbuf *from) >>> if ((to->m_flags & M_EXT) == 0) >>> to->m_data = to->m_pktdat; >>> to->m_pkthdr = from->m_pkthdr; /* especially tags */ >>> SLIST_INIT(&from->m_pkthdr.tags); /* purge tags from src */ >>> from->m_flags &= ~M_PKTHDR; >>> + from->m_nextpkt = NULL; >>> } >>> >>> It will reset the m_nextpkt so we don't have two mbufs pointing to the same >>> next packet. This is fairly harmless and solves a problem for us here at >>> XipLink. >> This seems like a no-brainer. :-) Any objections? > I think the change is reasonable. > But from other side m_demote_pkthdr() may also need this change. > Maybe we can wait when Gleb will be back and review this? Also he is the > author of the mentioned assertion in netgraph code. > Hi, Any updates on this one? There is another interesting patch I would like to share. This is regarding the m_tag_free function pointer in the m_tag structure. As it turns out, we use this field (m_tag_free) to track some mbuf tag at work and, in order to properly do reference counting on it, we had to modify m_tag_copy() the following way in order to keep the m_tag_free function pointer to point to the same function the original tag was pointing to (the code is a lot easier to understand than the text ...). @@ -437,6 +437,7 @@ m_tag_copy(struct m_tag *t, int how) } else #endif bcopy(t + 1, p + 1, t->m_tag_len); /* Copy the data */ + p->m_tag_free = t->m_tag_free; /* copy the 'free' function pointer */ return p; } This is because m_tag_copy uses m_tag_alloc() that resets the m_tag_free pointer to m_tag_free_default. It would be nice if this could make its way into the mbuf tag base code. Best regards, Karim.