Date: Thu, 26 Jan 2017 17:41:17 -0800
From: Mark Johnston <markj@FreeBSD.org>
To: Gleb Smirnoff <glebius@FreeBSD.org>
Cc: jch@FreeBSD.org, hiren@FreeBSD.org, Jason Eggleston <jeggleston@llnw.com>, rrs@FreeBSD.org, jtl@FreeBSD.org, net@FreeBSD.org
Subject: Re: listening sockets as non sockets
Message-ID: <20170127014117.GA90480@raichu>
In-Reply-To: <20170127005251.GM2611@FreeBSD.org>
References: <20170127005251.GM2611@FreeBSD.org>

On Thu, Jan 26, 2017 at 04:52:51PM -0800, Gleb Smirnoff wrote:
> Hi guys,
>
> as some of you already heard, I'm trying to separate listening sockets
> into a new file descriptor type. If we look into current struct socket,
> we see that some functional fields belong to normal data flow sockets,
> and other belong to listening socket. They are never used simultaneously.
> Now, if we look at socket API, we see that once a socket underwent transformation
> to a listening socket, only 3 regular syscalls now may be called: listen(2),
> accept(2) and close(2) and a subset of ioctl() and setsockopt() parameters is
> accepted. A listening socket cannot be closed from the protocol side, only from
> user side. So, listening socket is so different from a dataflow socket, that
> separating them looks architecturally right thing to do.
>
> The benefits are:
>
> 1) Nicer code (I hope).
> 2) Smaller 'struct socket'.
> 3) Having two different locks for socket and solisten, we can try to get rid
>    of ACCEPT_LOCK global lock.
>
> The patch is in a very pre-alpha state. It has been run only in my bhyve VM.
>
> It passes regression tests from tools/regression/sockets and tests/sys,
> including the race tests, and including accept filter ones.

I haven't yet looked much at the diff, so sorry in advance if this
question is inappropriate.

One problem I've fought a couple of times (with Infiniband SDP and unix
sockets) is a race between accept(2) and a concurrent close of the
listening socket. Right now, this problem has to be handled in the
domain-specific code (see r303855 for instance), and it's generally
awkward to do so. Does your work address this intrinsic race in any way?

FWIW, I have a basic test case for unix sockets here, though I believe
it's been incorporated into stress2:

https://people.freebsd.org/~markj/unix_socket_detach.c

>
> For TCP it passes basic functionality testing, but likely there are still races
> remaining after ACCEPT_LOCK removal.
>
> For SCTP the patch is unfinished yet. The tricky thing with SCTP is that it
> can un-listen a listening socket back to normal socket, doing listen(fd, 0)
> on it. My patch has API for that I started working on SCTP, but temporarily
> put this problem aside. It looks solvable, but I don't know yet how to test
> it. Better first see results with TCP.
>
> I've put current snapshot to Phab, so that you can view it there. The snap
> patch is also attached to this email.
>
> https://reviews.freebsd.org/D9356
>
> At this moment I'd like to start doing some testing (and doing polishing
> in parallel), and here I seek for your help. Those, who run FreeBSD at
> very high connection rates and observe contention on the accept global
> mutex, anybody willing to collaborate with me on this?
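
To make the scenario raised in the reply a bit more concrete (a listening
socket being closed while connections are still queued on it), here is a
minimal userland sketch in C. It is not the unix_socket_detach.c test case
linked above, and it does not reproduce the in-kernel race itself; it only
shows the deterministic half of the situation: a connection that has been
queued but never accepted is torn down when the listener is closed. The
function name listener_close_drops_pending() and the /tmp socket path are
made up for this illustration, and whether the client sees EOF or an error
such as ECONNRESET is assumed to be platform-dependent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): queue one connection on a
 * unix-domain listening socket, close the listener without ever calling
 * accept(2), and report what the client side observes.
 *
 * Returns 0 if the client saw the teardown (EOF or an error from read(2)),
 * 1 if read(2) unexpectedly returned data, and -1 on setup failure.
 */
int
listener_close_drops_pending(void)
{
    struct sockaddr_un sun;
    char buf[1];
    ssize_t n;
    int lfd, cfd;

    /* Illustrative socket path; the name is an assumption of this sketch. */
    memset(&sun, 0, sizeof(sun));
    sun.sun_family = AF_UNIX;
    snprintf(sun.sun_path, sizeof(sun.sun_path),
        "/tmp/listen_close_demo.%ld.sock", (long)getpid());
    assert(strlen(sun.sun_path) < sizeof(sun.sun_path));
    (void)unlink(sun.sun_path);

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return (-1);
    if (bind(lfd, (struct sockaddr *)&sun, sizeof(sun)) != 0 ||
        listen(lfd, 5) != 0) {
        close(lfd);
        (void)unlink(sun.sun_path);
        return (-1);
    }

    /* For unix stream sockets, connect(2) completes once the connection
     * is queued on the listener; no accept(2) is needed for that. */
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (cfd < 0 ||
        connect(cfd, (struct sockaddr *)&sun, sizeof(sun)) != 0) {
        if (cfd >= 0)
            close(cfd);
        close(lfd);
        (void)unlink(sun.sun_path);
        return (-1);
    }

    /* Close the listener with the connection still sitting in its queue. */
    close(lfd);
    (void)unlink(sun.sun_path);

    /* The never-accepted peer is gone, so read(2) should not block:
     * it returns 0 (EOF) or -1 with an errno that varies by platform. */
    n = read(cfd, buf, sizeof(buf));
    close(cfd);
    return (n > 0 ? 1 : 0);
}
```

A caller would simply check that listener_close_drops_pending() returns 0.
The race discussed in the reply arises when this teardown runs concurrently
with another thread in accept(2) on the same listening socket (or with a
connection being queued), which would need threads and many iterations to
provoke and is deliberately left out of this sketch.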