Date: Thu, 26 Jan 2017 17:41:17 -0800
From: Mark Johnston
To: Gleb Smirnoff
Cc: jch@FreeBSD.org, hiren@FreeBSD.org, Jason Eggleston, rrs@FreeBSD.org,
    jtl@FreeBSD.org, net@FreeBSD.org
Subject: Re: listening sockets as non sockets
Message-ID: <20170127014117.GA90480@raichu>
In-Reply-To: <20170127005251.GM2611@FreeBSD.org>

On Thu, Jan 26, 2017 at 04:52:51PM -0800, Gleb Smirnoff wrote:
> Hi guys,
>
> as some of you have already heard, I'm trying to separate listening sockets
> into a new file descriptor type. If we look at the current struct socket,
> we see that some of its fields belong to normal data-flow sockets and
> others belong to listening sockets. They are never used simultaneously.
> Now, if we look at the socket API, we see that once a socket has been
> transformed into a listening socket, only three regular syscalls may be
> called on it: listen(2), accept(2) and close(2); in addition, only a
> subset of ioctl() and setsockopt() parameters is accepted. A listening
> socket cannot be closed from the protocol side, only from the user side.
> So a listening socket is different enough from a data-flow socket that
> separating the two looks like the architecturally right thing to do.
>
> The benefits are:
>
> 1) Nicer code (I hope).
> 2) A smaller 'struct socket'.
> 3) With two different locks for socket and solisten, we can try to get
>    rid of the global ACCEPT_LOCK.
>
> The patch is in a very pre-alpha state. It has been run only in my bhyve VM.
>
> It passes the regression tests from tools/regression/sockets and tests/sys,
> including the race tests and the accept filter ones.

I haven't yet looked much at the diff, so sorry in advance if this
question is inappropriate.

One problem I've fought a couple of times (with InfiniBand SDP and unix
sockets) is a race between accept(2) and a concurrent close of the
listening socket. Right now, this problem has to be handled in the
domain-specific code (see r303855 for instance), and it's generally
awkward to do so. Does your work address this intrinsic race in any way?

FWIW, I have a basic test case for unix sockets here, though I believe
it's been incorporated into stress2:
https://people.freebsd.org/~markj/unix_socket_detach.c

>
> For TCP it passes basic functionality testing, but there are likely still
> races remaining after the ACCEPT_LOCK removal.
>
> For SCTP the patch is not finished yet. The tricky thing with SCTP is that
> it can un-listen a listening socket back into a normal socket by doing
> listen(fd, 0) on it.
> My patch has an API for that, and I started working on SCTP, but I
> temporarily put this problem aside. It looks solvable, but I don't yet
> know how to test it. Better to first see the results with TCP.
>
> I've put the current snapshot on Phab, so that you can view it there. The
> snapshot patch is also attached to this email.
>
> https://reviews.freebsd.org/D9356
>
> At this point I'd like to start doing some testing (and polishing in
> parallel), and here I am seeking your help. Is anybody who runs FreeBSD
> at very high connection rates and observes contention on the global
> accept mutex willing to collaborate with me on this?
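Gleb's observation that a listening socket only accepts listen(2), accept(2) and close(2) (plus a subset of ioctl() and setsockopt() options) can be checked from userland. The following is a minimal sketch, assuming a POSIX system with an IPv4 loopback interface; the helper names make_listener() and recv_errno_on_listener() are hypothetical. On FreeBSD and Linux a recv(2) on a listening TCP socket is expected to fail with ENOTCONN, but other systems may report a different error.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket bound to an ephemeral
 * loopback port and put it into the listening state.
 * Returns the descriptor, or -1 on error.
 */
static int
make_listener(void)
{
	struct sockaddr_in sin;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* let the kernel pick a free port */
	if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) != 0 ||
	    listen(fd, 16) != 0) {
		close(fd);
		return (-1);
	}
	return (fd);
}

/*
 * Hypothetical helper: attempt a data-path call on a listening socket.
 * Returns the errno it failed with, or 0 if it unexpectedly succeeded.
 * ENOTCONN is the expected result on FreeBSD and Linux.
 */
static int
recv_errno_on_listener(int fd)
{
	char buf[1];

	/* MSG_DONTWAIT so that an unexpected code path cannot block. */
	if (recv(fd, buf, sizeof(buf), MSG_DONTWAIT) >= 0)
		return (0);
	return (errno);
}
```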
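Mark's question concerns a thread sleeping in accept(2) while another thread closes the listening socket. The following is a hypothetical userland model, written with pthreads, of one way to make that race benign; it is not FreeBSD kernel code and not the approach taken in D9356 or r303855, and every name in it (struct model_listener, model_accept(), model_close() and so on) is invented for illustration. Each listener carries its own mutex and condition variable, in the spirit of replacing the global ACCEPT_LOCK with per-listener locking, and the accept side re-checks a closing flag under that lock after every wakeup.

```c
/* Hypothetical model only: not kernel code, not the design in D9356. */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QLIMIT 8

struct model_listener {
	pthread_mutex_t	lock;	/* per-listener lock, no global lock */
	pthread_cond_t	cv;	/* signalled on new connection or close */
	int		queue[MODEL_QLIMIT]; /* ids of completed connections */
	int		qlen;
	bool		closing; /* set once by model_close() */
};

static void
model_init(struct model_listener *l)
{
	pthread_mutex_init(&l->lock, NULL);
	pthread_cond_init(&l->cv, NULL);
	l->qlen = 0;
	l->closing = false;
}

/* Protocol side: queue a completed connection; refused after close. */
static int
model_enqueue(struct model_listener *l, int conn)
{
	int error = 0;

	pthread_mutex_lock(&l->lock);
	if (l->closing)
		error = ECONNABORTED;
	else if (l->qlen == MODEL_QLIMIT)
		error = ENOBUFS;
	else {
		l->queue[l->qlen++] = conn;
		pthread_cond_signal(&l->cv);
	}
	pthread_mutex_unlock(&l->lock);
	return (error);
}

/*
 * accept(2) side: sleep until a connection is queued or the listener is
 * closed.  The closing flag is re-checked under the lock after every
 * wakeup, so a concurrent close cannot slip in between the check and
 * the dequeue.
 */
static int
model_accept(struct model_listener *l, int *conn)
{
	int error = 0;

	pthread_mutex_lock(&l->lock);
	while (l->qlen == 0 && !l->closing)
		pthread_cond_wait(&l->cv, &l->lock);
	if (l->closing) {
		error = ECONNABORTED;
	} else {
		*conn = l->queue[0];	/* FIFO: take the oldest entry */
		for (int i = 1; i < l->qlen; i++)
			l->queue[i - 1] = l->queue[i];
		l->qlen--;
	}
	pthread_mutex_unlock(&l->lock);
	return (error);
}

/* close(2) side: mark closing, drop pending entries, wake all sleepers. */
static void
model_close(struct model_listener *l)
{
	pthread_mutex_lock(&l->lock);
	l->closing = true;
	l->qlen = 0;	/* a real stack would abort these connections */
	pthread_cond_broadcast(&l->cv);
	pthread_mutex_unlock(&l->lock);
}

/* Thread body for exercising a close that races a blocked accept. */
struct accept_result {
	struct model_listener	*l;
	int			 error;
	int			 conn;
};

static void *
accept_thread(void *arg)
{
	struct accept_result *r = arg;

	r->error = model_accept(r->l, &r->conn);
	return (NULL);
}
```

ECONNABORTED is an arbitrary choice for the model. The model also leaves object lifetime to the caller, which must join every thread blocked in model_accept() before the listener goes away; a kernel implementation has to provide that guarantee itself.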