From owner-freebsd-net Thu Feb 22 18:49:59 2001 Delivered-To: freebsd-net@freebsd.org Received: from freesbee.wheel.dk (freesbee.wheel.dk [193.162.159.97]) by hub.freebsd.org (Postfix) with ESMTP id 88FB337B491 for ; Thu, 22 Feb 2001 18:49:53 -0800 (PST) (envelope-from jesper@skriver.dk) Received: by freesbee.wheel.dk (Postfix, from userid 1001) id CA9163E69; Fri, 23 Feb 2001 03:49:52 +0100 (CET) Date: Fri, 23 Feb 2001 03:49:52 +0100 From: Jesper Skriver To: Jonathan Lemon Cc: net@freebsd.org Subject: Re: ICMP unreachables, take II. Message-ID: <20010223034952.A6694@skriver.dk> References: <20010222185412.E5714@prism.flugsvamp.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010222185412.E5714@prism.flugsvamp.com>; from jlemon@flugsvamp.com on Thu, Feb 22, 2001 at 06:54:12PM -0600 Sender: owner-freebsd-net@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Feb 22, 2001 at 06:54:12PM -0600, Jonathan Lemon wrote: I was just about to send a MFC of the current code out for review, will ditch that ... > I recently had a bug report regarding kqueue, where the kevent() call > for a TCP socket would return because so_error was set, but the > connection was still valid. > > The cause of this was because a non-blocking connect() call was made, > and then the socket was monitored for writability. However, an ICMP > error was returned, which eventually (after several retransmits) caused > so_error to be set in tcp_notify() without changing the connection state. > > However, it doesn't really seem reasonable to leave the connection > pending in a SYN_SENT state at this point. From the user's perspective, > the select/kevent call returns, indicating writability, but the next > operation (probably write) would fail, returning the contents of so_error. > > I would propose that instead of this behavior, the connection should > be dropped instead. Makes sense. > Also, RFC 1122 indicates that the application layer SHOULD report > soft errors, and it seems that a half-hearted attempt is made in > tcp_notify(), by calling so{rw}wakeup. > > However, this also seems to be incorrect, since select will not return > and any tests performed on the socket state will show no change, so it > seems that the wakeup calls should be removed. > > Also, (still reading?) while I'm in this bit of code, we currently > react to all ICMP unreachable errors during setup of a connection; > this is incorrect, only port/protocol and administrative icmp subtypes > should be valid, so fix this as well. In this case, return ENETRESET > as the error code to the user layer. Agree, it should be a different from ECONNREFUSED I still think we should react to the following as a minimum - type 3 code 0 net unreachable - type 3 code 1 host unreachable - type 3 code 9 net administrative prohibited - type 3 code 10 host administrative prohibited in addition to - type 3 code 2 protocol unreachable - type 3 code 3 port unreachable The first too, so connections won't wait for timeout if the routers tell you that the net/host is unreachable. > Finally, ICMP errors should never be allowed to kill an existing TCP > connection; if an administrative filter is installed across some existing > flows, then those flows should be allowed to time out per the TCP protocol. What I submitted was what we agreed upon earlier, but the above is fine by me. > Patch attached, please review. See comments inline, I havn't build a kernel with it to verify functionality. > Index: ip_icmp.c > =================================================================== > RCS file: /ncvs/src/sys/netinet/ip_icmp.c,v > retrieving revision 1.52 > diff -u -r1.52 ip_icmp.c > --- ip_icmp.c 2001/02/18 09:34:51 1.52 > +++ ip_icmp.c 2001/02/22 23:48:25 > @@ -315,67 +315,35 @@ > case ICMP_UNREACH: > switch (code) { > case ICMP_UNREACH_NET: > - code = PRC_UNREACH_HOST; > - break; > - > case ICMP_UNREACH_HOST: > - code = PRC_UNREACH_HOST; > - break; These 2 I don't agree upon, see above. > Index: tcp_subr.c > =================================================================== > RCS file: /ncvs/src/sys/netinet/tcp_subr.c,v > retrieving revision 1.91 > diff -u -r1.91 tcp_subr.c > --- tcp_subr.c 2001/02/22 21:23:45 1.91 > +++ tcp_subr.c 2001/02/22 23:43:33 > @@ -134,32 +134,9 @@ > SYSCTL_INT(_net_inet_tcp, OID_AUTO, pcbcount, CTLFLAG_RD, > &tcbinfo.ipi_count, 0, "Number of active PCBs"); > > -/* > - * Treat ICMP unreachables like a TCP RST as required by rfc1122 section 3.2.2.1 > - * > - * Administatively prohibited kill's sessions regardless of > - * their current state, other unreachable by default only kill > - * sessions if they are in SYN-SENT state, this ensure temporary > - * routing problems doesn't kill existing TCP sessions. > - * This can be overridden by icmp_like_rst_syn_sent_only. > - */ > - > -static int icmp_unreach_like_rst = 1; > -SYSCTL_INT(_net_inet_tcp, OID_AUTO, icmp_unreach_like_rst, CTLFLAG_RW, > - &icmp_unreach_like_rst, 0, > - "Treat ICMP unreachable messages like TCP RST, rfc1122 section 3.2.2.1"); > - > -/* > - * Control if ICMP unreachable messages other that administratively prohibited > - * ones will kill sessions not in SYN-SENT state. > - * > - * Has no effect unless icmp_unreach_like_rst is enabled. > - */ > - > -static int icmp_like_rst_syn_sent_only = 1; > -SYSCTL_INT(_net_inet_tcp, OID_AUTO, icmp_like_rst_syn_sent_only, CTLFLAG_RW, > - &icmp_like_rst_syn_sent_only, 0, > - "When icmp_unreach_like_rst is enabled, only act on sessions in SYN-SENT state"); Perhaps you could keep some of the comment ... /* * Treat some ICMP unreachables like a TCP RST as required by * rfc1122 section 3.2.2.1 */ > if (cmd == PRC_QUENCH) > notify = tcp_quench; > - else if ((icmp_unreach_like_rst == 1) && ((cmd == PRC_UNREACH_HOST) || > - (cmd == PRC_UNREACH_ADMIN_PROHIB)) && (ip) && > - ((IP_VHL_HL(ip->ip_vhl) << 2) == sizeof(struct ip))) { Sure we'll not try to read off the end of the recieved packet, when we remove the check for the header length. I put it there as a extra check against "attackers" sending us malformed ICMP messages with only part of the attached IP header, or even without it. /Jesper -- Jesper Skriver, jesper(at)skriver(dot)dk - CCIE #5456 Work: Network manager @ AS3292 (Tele Danmark DataNetworks) Private: FreeBSD committer @ AS2109 (A much smaller network ;-) One Unix to rule them all, One Resolver to find them, One IP to bring them all and in the zone to bind them. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-net" in the body of the message