From: Alan Somers
To: FreeBSD Net, rwatson@freebsd.org
Date: Mon, 17 Feb 2014 15:10:28 -0700
Subject: kern/185812: send(2) on a UNIX domain SEQPACKET socket returns EMSGSIZE instead of EAGAIN

SOCK_SEQPACKET Unix domain sockets don't work on FreeBSD. kern/185812 is one of the reasons. When send(2) ought to block on a seqpacket socket, it returns EMSGSIZE instead. If the socket is nonblocking, send(2) returns EMSGSIZE instead of EAGAIN. The problem can be demonstrated on FreeBSD 10 or head by the ATF testcase sys/kern/unix_seqpacket_test:eagain_8k_8k.

The problem dates to an old hack. It's at least as old as 4.4BSD Lite. When you write to a Unix domain socket, the data goes directly to the receiving socket's sockbuf, bypassing the sending socket's sockbuf. However, sosend_generic doesn't know anything about Unix domain sockets, and it doesn't know anything about the receiving socket. Without some form of backpressure, sosend_generic would never block. So, uipc_send updates the _sending_ sockbuf's sb_hiwat to account for whatever it wrote to the _receiving_ sockbuf. (For those not in the know, sb_hiwat is the maximum allowed amount of data in the buffer.) The next time that sosend_generic gets called, it sees that the sending sockbuf is empty, but it has a lower maximum size than before. If the maximum size is 0, sosend_generic will block.

This hack worked fine for SOCK_STREAM sockets, but it breaks SOCK_SEQPACKET sockets, since the latter consist of messages that must be sent atomically. When sb_hiwat is too low to fit an entire message, sosend_generic will return EMSGSIZE instead of blocking or returning EAGAIN.

Fortunately, we have a template for how to fix this bug. DragonFlyBSD fixed it back in 2008. Instead of applying backpressure through sb_hiwat, it uses a new sockbuf flag called SSB_STOP. When the receiving sockbuf runs out of space, uipc_send sets SSB_STOP on the sending sockbuf. Then, sosend_generic will block (or return EAGAIN) on the next attempt to write. This solution is very clean and simple.
The SSB_STOP approach might also be slightly faster than the legacy method, because it eliminates the need to call chgsbsize() on every send() and recv(). I am aware of one drawback: since ssb_space() will only ever return 0 or ssb_hiwat, sosend_generic will allow the sockbuf to exceed its nominal maximum size by at most one packet of size less than ssb_hiwat. I don't think that's a serious problem. In fact, I'm not even positive that FreeBSD guarantees a socket will always stay within its nominal size limit.

Does this solution sound acceptable in FreeBSD? Is there any reason that I shouldn't port it? Note that DragonFly long ago refactored struct sockbuf into two separate structures: struct sockbuf and struct signalsockbuf. I won't make that change as part of the port.

In case you're wondering, NetBSD 6.0 suffers from the same bug, OpenBSD 5.4 doesn't appear to support SOCK_SEQPACKET Unix domain sockets, and Linux 3.2.0 does not suffer from it.

The relevant commit in DragonFlyBSD:
https://github.com/DragonFlyBSD/DragonFlyBSD/commit/3a6117bbe0ed6a87605c1e43e12a1438d8844380

-Alan