From: Yuri
Date: Sun, 01 Sep 2013 01:54:59 -0700
To: net@FreeBSD.org
Subject: Packet loss when 'control' messages are present with large data (sendmsg(2))

I found a case where sendmsg(2) silently loses packets in the AF_LOCAL
domain when large packets with a control part are sent.

Here is how:
There is a watermark limit on the sockbuf (field sockbuf.sb_hiwat),
determined by net.local.stream.sendspace; the default is 8192 bytes.
When sendmsg(2) sends enough data to hit this limit (8K+) together with
a control message, sosend_generic cuts the message data into separate
mbufs based on 'sbspace' (derived from the above-mentioned sb_hiwat
limit), adjusted for the control message size as it sees it.
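As a rough illustration of this splitting step, here is a small
user-space model in C. It is not the kernel code: the function name
chunk_len and its parameters are made up for this sketch, and the only
assumption is that the sender subtracts the control length it sees from
the free buffer space before cutting the data.

```c
#include <stddef.h>

/*
 * Hypothetical user-space model of the sender-side split; not kernel code.
 *   hiwat: send buffer watermark (plays the role of sb_hiwat)
 *   used:  bytes already queued in the send buffer
 *   clen:  control length as the sender sees it
 *   resid: data bytes still to be sent
 * Returns the number of data bytes placed in the next chunk.
 */
size_t
chunk_len(size_t hiwat, size_t used, size_t clen, size_t resid)
{
	size_t space = hiwat > used ? hiwat - used : 0;	/* free space */

	if (clen >= space)
		return (0);	/* no room for data next to the control part */
	space -= clen;		/* reserve room for the control part */
	return (resid < space ? resid : space);
}
```

With an empty 8192-byte buffer, 24 bytes of control (an example number)
and 16384 bytes of data, chunk_len(8192, 0, 24, 16384) gives 8168, so
control plus data exactly fills the buffer as long as the control part
does not grow afterwards.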
Splitting the data this way is meant to ensure that the sb_hiwat limit
is enforced.

However, down at the uipc level the control message is further modified
in two ways: unp_internalize converts it into an 'internal' form, and
unp_addsockcred adds another control message when LOCAL_CREDS is
requested by the client. Both functions only increase the control
message size beyond the original size seen by sosend_generic. As a
result, the first final mbuf sent (the concatenation of control and
data) will always be larger than the 'sbspace' limit that
sosend_generic was cutting the data for.

There is also the function sbappendcontrol_locked. It checks the
'sbspace' limit again and discards the packet when the limit is
exceeded. Its result code is essentially ignored in uipc_send.

I believe sbappendcontrol_locked shouldn't be checking space at all,
since packets are expected to be properly sized to begin with. But
removing the check alone wouldn't be the right fix, since sizes would
still exceed the sbspace limit.

sosend_generic is one level above the uipc level, and it doesn't know
what uipc will do with the control message. Therefore it can't know
what adjustment for the control message is really needed (to cut the
data properly). It wrongly uses the original control size, which makes
the first packet too large, so it is discarded by
sbappendcontrol_locked.

To solve the problem, I propose adding another function to struct
pr_usrreqs:

    int (*pru_finalizecontrol)(struct socket *so, int flags, struct mbuf **pcontrol);

This function will be called from sosend_generic and sosend_dgram.
uipc_finalizecontrol will do the same thing that unp_internalize and
unp_addsockcred do at the uipc level. It will allow sosend_generic to
see the final version of the control message and to split the data
properly into pieces when the data is large enough to hit the limit.

I felt I'd better discuss this addition to struct pr_usrreqs first,
because it might seem like overkill to add a function just to solve
this one local issue. But there seems to be no other solution (other
than just ignoring the occasionally larger mbuf size). I can easily
make a patch fixing this issue with the new function.

Yuri
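
For illustration, the failure and the effect of the proposed hook can
be modeled in user space with a few lines of C. This is only a sketch:
first_chunk, append_fits and first_chunk_dropped are hypothetical names,
not kernel functions, and the sizes used with them are example numbers.
It assumes the append step rejects anything larger than the free space,
as described above for sbappendcontrol_locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Data bytes the sender puts in the first chunk, given the control size it sees. */
size_t
first_chunk(size_t space, size_t clen_seen)
{
	return (clen_seen < space ? space - clen_seen : 0);
}

/* Append-side check: do the final control part and the data fit in the free space? */
bool
append_fits(size_t space, size_t clen_final, size_t datalen)
{
	return (clen_final + datalen <= space);
}

/*
 * True when the first chunk would be discarded.
 *   clen_seen:  control size used by the sender when cutting the data
 *   clen_final: control size after the uipc layer has modified it
 */
bool
first_chunk_dropped(size_t space, size_t clen_seen, size_t clen_final)
{
	/* The uipc layer only ever grows the control part. */
	assert(clen_final >= clen_seen);

	return (!append_fits(space, clen_final, first_chunk(space, clen_seen)));
}
```

With 8192 bytes of free space, a control part the sender sees as 24
bytes and a final control part of 48 bytes,
first_chunk_dropped(8192, 24, 48) is true: 8168 bytes of data plus 48
bytes of control exceed the limit. If the sender sees the final size
first, which is what pru_finalizecontrol is meant to allow,
first_chunk_dropped(8192, 48, 48) is false.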