Date: Wed, 11 Sep 2013 19:18:26 +0400
From: Gleb Smirnoff
To: Yuri
Cc: net@FreeBSD.org
Subject: Re: Packet loss when 'control' messages are present with large data (sendmsg(2))
Message-ID: <20130911151826.GP4574@FreeBSD.org>
In-Reply-To: <522300E3.6050303@rawbw.com>

  Yuri,

On Sun, Sep 01, 2013 at 01:54:59AM -0700, Yuri wrote:
Y> I found the case when sendmsg(2) silently loses packets for AF_LOCAL
Y> domain when large packets with control part in
them are sent.
Y>
Y> Here is how:
Y> There is the watermark limit on sockbuf determined by
Y> net.local.stream.sendspace, default is 8192 bytes (field sockbuf.sb_hiwat).
Y> When sendmsg(2) sends large enough data (8K+ that hits this 8192 limit)
Y> with a control message, sosend_generic will be cutting the message data
Y> into separate mbufs based on 'sbspace' (derived from the above-mentioned
Y> sb_hiwat limit) with adjustment for control message size as it sees it.
Y> This way it tries to make sure this sb_hiwat limit is enforced.
Y>
Y> However, down on the uipc level the control message is further modified in
Y> two ways: unp_internalize converts it into some 'internal' form, and the
Y> unp_addsockcred function adds another control message when LOCAL_CREDS
Y> are requested by the client. Both functions only increase the control
Y> message size beyond its original size (seen by sosend_generic). As a
Y> result, the first final mbuf sent (concatenation of control and data)
Y> will always be larger than the 'sbspace' limit that sosend_generic was
Y> cutting data for.
Y>
Y> There is also the function sbappendcontrol_locked. It checks the
Y> 'sbspace' limit again, and discards the packet when the sbspace limit is
Y> exceeded. Its result code is essentially ignored in uipc_send. I
Y> believe sbappendcontrol_locked shouldn't be checking space at all,
Y> since packets are expected to be properly sized to begin with. But this
Y> won't be the right fix, since sizes would be exceeding the sbspace limit
Y> anyway.
Y>
Y> sosend_default is one level up over the uipc level, and it doesn't know
Y> what uipc will do with the control message. Therefore it can't know what
Y> real adjustment for the control message is needed (to properly cut data).
Y> It wrongly takes the original control size, and this makes the first
Y> packet too large, so it is discarded by sbappendcontrol_locked.
Y>
Y> To solve the problem, I propose to add another function into struct
Y> pr_usrreqs:
Y> int (*pru_finalizecontrol)(struct socket *so, int flags, struct mbuf **pcontrol);
Y>
Y> This function will be called from sosend_default and sosend_dgram.
Y> uipc_finalizecontrol will do the same that unp_internalize and
Y> unp_addsockcred do on the uipc level, and it will allow sosend_default to
Y> see the final version of the control message, and properly split data
Y> into pieces when data is large enough to hit the limit.
Y>
Y> I felt I had better discuss such an addition to struct pr_usrreqs, because
Y> it might seem like overkill to add this function just to solve this one
Y> local issue. But it seems there is no other solution (other than just
Y> ignoring the occasionally larger mbuf size).
Y>
Y> I can easily make a patch fixing this issue with this new function.

Thanks for the investigation! Can you please send at least a program that
is a test case for the above problem? A patch that fixes it would also be
appreciated.

-- 
Totus tuus, Glebius.
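
A test case of the kind requested above might look roughly like the following sketch. It is not a program posted in this thread. The helper names send_with_fd, drain and roundtrip are made up for illustration, and the choice of SCM_RIGHTS as the control message is an assumption. It sends a large payload with a control message over an AF_UNIX stream socketpair and counts how many bytes arrive. It exercises only the unp_internalize path; the LOCAL_CREDS path mentioned above is not covered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Properly aligned buffer for one SCM_RIGHTS message carrying one fd. */
union ctlbuf {
	struct cmsghdr hdr;
	char space[CMSG_SPACE(sizeof(int))];
};

/* Send len bytes; only the first sendmsg(2) carries fd as SCM_RIGHTS. */
static int
send_with_fd(int sock, char *buf, size_t len, int fd)
{
	union ctlbuf ctl;
	struct iovec iov;
	struct msghdr msg;
	struct cmsghdr *cm;
	size_t off = 0;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.space;
	msg.msg_controllen = sizeof(ctl.space);
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));
	while (off < len) {
		iov.iov_base = buf + off;
		iov.iov_len = len - off;
		if ((n = sendmsg(sock, &msg, 0)) <= 0)
			return (-1);
		off += (size_t)n;
		msg.msg_control = NULL;		/* no control on partial resends */
		msg.msg_controllen = 0;
	}
	return (0);
}

/* Read until EOF, closing any received descriptors; return bytes read. */
static long
drain(int sock)
{
	union ctlbuf ctl;
	char buf[4096];
	struct iovec iov;
	struct msghdr msg;
	struct cmsghdr *cm;
	long total = 0;
	ssize_t n;
	int fd;

	for (;;) {
		memset(&msg, 0, sizeof(msg));
		iov.iov_base = buf;
		iov.iov_len = sizeof(buf);
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		msg.msg_control = ctl.space;
		msg.msg_controllen = sizeof(ctl.space);
		if ((n = recvmsg(sock, &msg, 0)) < 0)
			return (-1);
		if (n == 0)
			return (total);
		total += n;
		for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm))
			if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS) {
				memcpy(&fd, CMSG_DATA(cm), sizeof(int));
				close(fd);
			}
	}
}

/* Child sends len bytes plus one fd; parent returns the bytes it received. */
static long
roundtrip(size_t len)
{
	int sv[2], pfd[2], status;
	char *buf;
	long got;
	pid_t pid;

	assert(len > 0);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pfd) < 0)
		return (-1);
	if ((pid = fork()) < 0)
		return (-1);
	if (pid == 0) {
		close(sv[0]);
		buf = calloc(1, len);
		_exit(buf != NULL && send_with_fd(sv[1], buf, len, pfd[0]) == 0 ? 0 : 1);
	}
	close(sv[1]);
	close(pfd[0]);
	close(pfd[1]);
	got = drain(sv[0]);
	close(sv[0]);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return (-1);
	return (got);
}
```

When nothing is lost, roundtrip(len) should return len. A smaller return value for a payload above net.local.stream.sendspace (for example roundtrip(64 * 1024)) would be consistent with the silent loss described in the quoted report, although this sketch has not been run against an affected kernel.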