From owner-freebsd-net Thu Mar 8 14:19:12 2001
Delivered-To: freebsd-net@freebsd.org
Received: from prism.flugsvamp.com (cb58709-a.mdsn1.wi.home.com [24.17.241.9])
	by hub.freebsd.org (Postfix) with ESMTP id 788A637B718
	for ; Thu, 8 Mar 2001 14:19:06 -0800 (PST)
	(envelope-from jlemon@flugsvamp.com)
Received: (from jlemon@localhost)
	by prism.flugsvamp.com (8.11.0/8.11.0) id f28MGB291175;
	Thu, 8 Mar 2001 16:16:11 -0600 (CST)
	(envelope-from jlemon)
Date: Thu, 8 Mar 2001 16:16:11 -0600
From: Jonathan Lemon
To: Wietse Venema
Cc: Jonathan Lemon , itojun@iijlab.net, Arjan.deVet@adv.iae.nl, net@freebsd.org, postfix-users@postfix.org
Subject: Re: [itojun@iijlab.net: accept(2) behavior with tcp RST right after handshake]
Message-ID: <20010308161611.B78851@prism.flugsvamp.com>
References: <20010308095759.S41963@prism.flugsvamp.com> <20010308180048.CC09DBC06D@spike.porcupine.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0pre2i
In-Reply-To: <20010308180048.CC09DBC06D@spike.porcupine.org>
Sender: owner-freebsd-net@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Thu, Mar 08, 2001 at 01:00:48PM -0500, Wietse Venema wrote:
> Jonathan Lemon:
> > On Thu, Mar 08, 2001 at 10:38:17AM -0500, Wietse Venema wrote:
> > > If the result of connect() write() close() depends on whether
> > > accept() happens after or before close(), then the behavior is
> > > broken. The client has received a successful return from write()
> > > and close(). The system is not supposed to lose the data, period.
> >
> > What you seem to be missing here is that the behavior described
> > above is ONLY specific to UNIX-DOMAIN sockets. The description
> > above is generally (but not always) true for the TCP/IP protocol.
>
> The problem is observed with UNIX-domain sockets.
>
> > Data CAN be lost if the TCP connection is RST. It has nothing to
> > do with the ordering of accept() with respect to close().
>
> Please educate me: how would RST come into this discussion at all?
> The client does connect() write() close(), there is no forced
> connection termination involved at all.

Under normal circumstances, a connect(), write(), close() sequence
should work.  However, the code that was added was to handle the
abnormal cases from the server's point of view.  As you noted, this
happened to break for unix-domain sockets under 4.2-stable, because
of the following kernel semantics bug:

   + with unix-domain sockets, the connection is marked as
     DISCONNECTED as soon as the final close() is performed.

   + with TCP/IP sockets, a connection is marked "DISCONNECTING"
     on the final client close, but is NOT actually closed (marked
     as DISCONNECTED) until the server is notified that the client's
     TCP/IP endpoint is gone.

What we are trying to fix here is the case where the server, for some
reason, sees the client forcibly tear down the endpoint before it can
get around to accepting the connection.  From the server's point of
view:

   + TCP/IP handshake from client, allocate protocol control blocks
   + receive data from client
   + client resets connection, pcb is destroyed

Exactly why the client resets the connection isn't my concern at
the moment.  Some stacks may place a timeout on the FIN_WAIT state,
and forcibly reset the connection when the timer expires.
Alternatively, the client may crash, and then send a RST in response
to an ACK transmitted by the server.  Or the other end may have set
SO_LINGER with a zero timeout, which will cause close() to send a RST.

The unix-domain bug arose because we were treating sockets in the
DISCONNECTED state identically across all protocols, which turns out
to be incorrect.

As for any data that already exists in the socket buffer on the
server when the connection is aborted, I believe that the correct
thing to do is discard it.  This is the historical precedent, and is
supported by the current standards.

Below is a patch that will fix the behavior for unix-domain sockets.
--
Jonathan

Index: kern/uipc_socket.c
===================================================================
RCS file: /ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.68.2.13
diff -u -r1.68.2.13 uipc_socket.c
--- kern/uipc_socket.c	2001/02/26 04:23:16	1.68.2.13
+++ kern/uipc_socket.c	2001/03/08 02:34:00
@@ -360,10 +360,7 @@
 	if ((so->so_state & SS_NOFDREF) == 0)
 		panic("soaccept: !NOFDREF");
 	so->so_state &= ~SS_NOFDREF;
-	if ((so->so_state & SS_ISDISCONNECTED) == 0)
-		error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
-	else
-		error = ECONNABORTED;
+	error = (*so->so_proto->pr_usrreqs->pru_accept)(so, nam);
 	splx(s);
 	return (error);
 }
Index: netinet/tcp_usrreq.c
===================================================================
RCS file: /ncvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.51
diff -u -r1.51 tcp_usrreq.c
--- netinet/tcp_usrreq.c	2000/01/09 19:17:28	1.51
+++ netinet/tcp_usrreq.c	2001/03/08 16:21:28
@@ -417,6 +417,10 @@
 	struct inpcb *inp = sotoinpcb(so);
 	struct tcpcb *tp;
 
+	if (so->so_state & SS_ISDISCONNECTED) {
+		error = ECONNABORTED;
+		goto out;
+	}
 	COMMON_START();
 	in_setpeeraddr(so, nam);
 	COMMON_END(PRU_ACCEPT);
@@ -431,6 +435,10 @@
 	struct inpcb *inp = sotoinpcb(so);
 	struct tcpcb *tp;
 
+	if (so->so_state & SS_ISDISCONNECTED) {
+		error = ECONNABORTED;
+		goto out;
+	}
 	COMMON_START();
 	in6_mapped_peeraddr(so, nam);
 	COMMON_END(PRU_ACCEPT);
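
For anyone following along from userland, here is a rough sketch of how the two sides of this might look in an application.  It is not part of the patch.  The helper names (accept_error_is_transient, accept_skipping_aborted, set_abortive_close) are made up for illustration, and treating EINTR the same way as ECONNABORTED is my own assumption about what a typical server would want.  The first two helpers show a server skipping over a connection that was reset before accept() got to it; the last one shows the SO_LINGER setting (enabled, zero timeout) that makes close() on a TCP socket send a RST.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: true if an accept() failure concerns only the
 * one pending connection (or an interrupted call), so the listening
 * socket itself is still usable and the server can simply retry.
 */
static bool
accept_error_is_transient(int err)
{
	return (err == ECONNABORTED || err == EINTR);
}

/*
 * Hypothetical helper: accept one connection, silently skipping any
 * that were aborted by the peer before we got around to accept().
 * Returns the new descriptor, or -1 with errno set on a real error.
 */
static int
accept_skipping_aborted(int listen_fd)
{
	for (;;) {
		int fd = accept(listen_fd, NULL, NULL);

		if (fd >= 0)
			return (fd);
		if (!accept_error_is_transient(errno))
			return (-1);
		/* Aborted or interrupted: try the next queued connection. */
	}
}

/*
 * Hypothetical helper: request an abortive close.  With SO_LINGER
 * enabled and a zero linger time, close() on a TCP socket discards
 * unsent data and resets the connection instead of sending a FIN.
 */
static int
set_abortive_close(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	return (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)));
}
```

A server would call accept_skipping_aborted() in place of a bare accept() in its main loop; a test client that wants to provoke the reset case would call set_abortive_close() on its socket just before close().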