From owner-freebsd-net@FreeBSD.ORG Thu Jan 23 18:02:13 2014 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 79AFD4A7 for ; Thu, 23 Jan 2014 18:02:13 +0000 (UTC) Received: from mail-we0-x232.google.com (mail-we0-x232.google.com [IPv6:2a00:1450:400c:c03::232]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 15D1518F6 for ; Thu, 23 Jan 2014 18:02:12 +0000 (UTC) Received: by mail-we0-f178.google.com with SMTP id t60so1593516wes.9 for ; Thu, 23 Jan 2014 10:02:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=iBDqdSdhrPRmwhu9ukg94xiHN9oC+sJAqYPZlwDMxjk=; b=AtR/YkI2URsRK8pGlMgH4kTgopplCIv//xatlLdt3xmGk3NEJ7gQNN8GFx/5Ema6Ba 0KSQIuKVZkHKa65IEB6HN2v7D3aRzpiaU/9exBBmE4ZH2vk+0YfBQOJmBOfkoUZklaXw S+uTSBEmSSz/7A/r/LY/0DeGQDWFhp/rSNY23T6iBDnEtG/VLMwv3/vYjTXN+5Hk04gD OFqYl369J94AuJUwnZRYOSysmXtj49LCYZm42MdSwFX5y5CEf8fNOxcnXRJcOqU7LJt/ ewPefkpls2NTgMSdK4TNOM+rqbruYYq+cj/cejiBH9hEkFOYdAuqXzobvB5DDzlwoNJA gdVA== MIME-Version: 1.0 X-Received: by 10.194.219.132 with SMTP id po4mr7615856wjc.7.1390500131538; Thu, 23 Jan 2014 10:02:11 -0800 (PST) Sender: asomers@gmail.com Received: by 10.194.22.35 with HTTP; Thu, 23 Jan 2014 10:02:11 -0800 (PST) Date: Thu, 23 Jan 2014 11:02:11 -0700 X-Google-Sender-Auth: bNyZH3Yn41pAFb3ZKyZUZfalKIg Message-ID: Subject: kern/185813: SOCK_SEQPACKET AF_UNIX sockets with asymmetrical buffers drop packets From: Alan Somers To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 23 Jan 2014 18:02:13 -0000 There is a buffer space calculation bug in the send path for SOCK_SEQPACKET AF_UNIX sockets. The result is that, if the sending and receiving buffer sizes are different, the kernel will drop messages and leak mbufs. A more detailed description is available in the PR. The labyrinthine nature of the networking code makes it difficult to directly fix the space calculation. It's especially hard due to the optimization that AF_UNIX sockets have only a single socket buffer. As implemented, they store data in the receiving sockbuf, but use the transmitting sockbuf for space calculations. That's even true of SOCK_STREAM sockets. They only work due to an accident; they don't end up doing the same space calculation that trips up SOCK_SEQPACKET sockets. Instead, I propose modifying the kernel to force an AF_UNIX socket pair's buffers to always have the same size. That is, if you call setsockopt(s, SOL_SOCKET, SO_SNDBUF, whatever, whatever), the kernel will adjust both s's send buffer and the connected socket's receive buffer. This solution also solves another annoying problem: currently there is no way for a program to effectively change the size of its receiving buffers. If you call setsockopt(s, SOL_SOCKET, SO_RCVBUF, whatever, whatever) on an AF_UNIX socket, it will have no effect on how packets are actually handled. The attached patch implements my suggestion for setsockopt. It's obviously not perfect; it doesn't handle the case where you call setsockopt() before connect() and it introduces an unfortunate #include, but it's a working proof of concept. With this patch, the recently added ATF test case sys/kern/unix_seqpacket_test:pipe_simulator_128k_8k passes. Does this look like the correct approach? Index: uipc_socket.c =================================================================== --- uipc_socket.c (revision 261055) +++ uipc_socket.c (working copy) @@ -133,6 +133,8 @@ #include #include #include +#include +#include #include #include #include @@ -2382,6 +2384,8 @@ int sosetopt(struct socket *so, struct sockopt *sopt) { + struct socket* so2; + struct unpcb *unpcb, *unpcb2; int error, optval; struct linger l; struct timeval tv; @@ -2503,6 +2507,32 @@ } (sopt->sopt_name == SO_SNDBUF ? &so->so_snd : &so->so_rcv)->sb_flags &= ~SB_AUTOSIZE; + if (so->so_proto->pr_domain->dom_family != + PF_LOCAL || + so->so_type != SOCK_SEQPACKET) + break; + /* + * For unix domain seqpacket sockets, we set the + * bufsize on both ends of the socket. PR + * kern/185813 + */ + unpcb = (struct unpcb*)(so->so_pcb); + if (NULL == unpcb) + break; /* Shouldn't ever happen */ + unpcb2 = unpcb->unp_conn; + if (NULL == unpcb2) + break; /* For unconnected sockets */ + so2 = unpcb2->unp_socket; + if (NULL == so2) + break; /* Shouldn't ever happen? */ + if (sbreserve(sopt->sopt_name == SO_SNDBUF ? + &so2->so_rcv : &so2->so_snd, (u_long)optval, + so, curthread) == 0) { + error = ENOBUFS; + goto bad; + } + (sopt->sopt_name == SO_SNDBUF ? &so2->so_rcv : + &so2->so_snd)->sb_flags &= ~SB_AUTOSIZE; break; /* -Alan