From owner-freebsd-net@FreeBSD.ORG Mon Nov 15 22:57:48 2004 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2E9A516A4CE; Mon, 15 Nov 2004 22:57:48 +0000 (GMT) Received: from mail.vicor-nb.com (bigwoop.vicor-nb.com [208.206.78.2]) by mx1.FreeBSD.org (Postfix) with ESMTP id D1D7F43D46; Mon, 15 Nov 2004 22:57:46 +0000 (GMT) (envelope-from julian@elischer.org) Received: from elischer.org (julian.vicor-nb.com [208.206.78.97]) by mail.vicor-nb.com (Postfix) with ESMTP id 994AC7A424; Mon, 15 Nov 2004 14:57:46 -0800 (PST) Message-ID: <4199346A.3040908@elischer.org> Date: Mon, 15 Nov 2004 14:57:46 -0800 From: Julian Elischer User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030516 X-Accept-Language: en, hu MIME-Version: 1.0 To: Gleb Smirnoff References: <20041115104331.GA93477@cell.sick.ru> In-Reply-To: <20041115104331.GA93477@cell.sick.ru> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit cc: maxim@freebsd.org cc: rwatson@freebsd.org cc: net@freebsd.org Subject: Re: divert(4) socket isn't connection oriented X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Nov 2004 22:57:48 -0000 Gleb Smirnoff wrote: > Hi! > > I've spent several days digging in interaction between divert >protocol and ng_ksocket. I've find some oddities in there. > > Look at div_output(), it tells incoming packet from outgoing >by presence of sockaddr_in structure. Depending on it packet >is passed either to ip_input() or ip_output(). This sockaddr_in was >previuosly supplied by divert interface to application at the end >of divert_packet() with help of sbappendaddr_locked(). > > Look at ng_ksocket_incoming2(), near 'pru_soreceive'. When >ng_ksocket takes data from socket it saves supplied sockaddr in >m_tag attached to packet. After packet travel thru netgraph(4) it >may return to ng_ksocket in ng_ksocket_rcvdata() and this sockaddr >will be used in call to sosend(). > > It is important that ng_ksocket does not save sockaddr if socket is >connected (see ng_ksocket_incoming2(), near 'pru_soreceive'). And this >is correct! If a generic socket is connected, all data must be sent to >connection destination. The problem is that divert(4) socket is >always marked as connected (see end of div_attach()), and thus >ng_ksocket_rcvdata() does not supply sockaddr in call to sosend(). So >div_output() _always_ sends data to ip_output()! This is strange and odd >but working - packets flow in both directions. This setup is working: > >/usr/sbin/ngctl -f- <<-SEQ > mkpeer echo dummy dummy > name .:dummy echo_div > mkpeer echo_div: ksocket echo inet/raw/divert > name echo_div:echo div_sock > rmhook dummy > msg div_sock: bind inet/0.0.0.0:8888 >SEQ >ipfw add 1 divert 8888 all from any to any via fxp0 in >ping -c 1 www.com > It might be worthwhile looking at a direct netgraph/ipfw interface :-) ipfw add 1 netgraph any from me to you.com 666 in ipfw add 1 netgraph any from me to you.com 666 out Creating the netgraph entry would create 2 hooks called '1-in and 1-out' :-) (other syntaxes considerred) > >Since it is working, it was not noticed quickly. Real problems occur when >a multicast packet comes on interface: it is diverted to ng_ksocket, returned >and div_output() sends it to ip_output(). In ip_output() it is ip_mloopback()ed >and if_simloop()ed. A copy of packet enters divert socket, duplicated... a >forever loop and total freeze. > >Removing 'always connected status' from divert sockets fixes the problem >and incoming packets go into ip_input(), not ip_output(). Please review attached >patch. It: > >- removes SS_ISCONNECTED from so->so_state > to me this makes sense as divert is not conection based.. >- removes pru_disconnect method, since it won't be called >- removes pru_abort method, since it must not be called on divert > socket. It was used indirectly via disconnect method. > seems fair. Other possibilities may include converting the tags to something else so that data is not lost but just moved aside (but still avaliable). Making divert a non connected protocol however is ok.. I think it better fits it usage. > >With this patch codepath of packets is correct and incoming packets do >not enter ip_output() anymore. > >An alternative may be adding a kludge in ng_ksocket_incoming2(), so that sockaddr >is saved always if socket is a divert socket. I don't like it. > >Awaiting for your feedback! > > If I did it again I think I'd suggest that divert not be a part of the INET family.. It was easiest tat that time to make it so.. Archie, do you remember why we decided this? > > >------------------------------------------------------------------------ > >Index: ip_divert.c >=================================================================== >RCS file: /home/ncvs/src/sys/netinet/ip_divert.c,v >retrieving revision 1.109 >diff -u -r1.109 ip_divert.c >--- ip_divert.c 12 Nov 2004 22:17:42 -0000 1.109 >+++ ip_divert.c 15 Nov 2004 10:14:37 -0000 >@@ -415,12 +415,7 @@ > inp->inp_ip_p = proto; > inp->inp_vflag |= INP_IPV4; > inp->inp_flags |= INP_HDRINCL; >- /* The socket is always "connected" because >- we always know "where" to send the packet */ > INP_UNLOCK(inp); >- SOCK_LOCK(so); >- so->so_state |= SS_ISCONNECTED; >- SOCK_UNLOCK(so); > return 0; > } > >@@ -442,32 +437,6 @@ > } > > static int >-div_abort(struct socket *so) >-{ >- struct inpcb *inp; >- >- INP_INFO_WLOCK(&divcbinfo); >- inp = sotoinpcb(so); >- if (inp == 0) { >- INP_INFO_WUNLOCK(&divcbinfo); >- return EINVAL; /* ??? possible? panic instead? */ >- } >- INP_LOCK(inp); >- soisdisconnected(so); >- in_pcbdetach(inp); >- INP_INFO_WUNLOCK(&divcbinfo); >- return 0; >-} >- >-static int >-div_disconnect(struct socket *so) >-{ >- if ((so->so_state & SS_ISCONNECTED) == 0) >- return ENOTCONN; >- return div_abort(so); >-} >- >-static int > div_bind(struct socket *so, struct sockaddr *nam, struct thread *td) > { > struct inpcb *inp; >@@ -662,12 +631,10 @@ > #endif > > struct pr_usrreqs div_usrreqs = { >- .pru_abort = div_abort, > .pru_attach = div_attach, > .pru_bind = div_bind, > .pru_control = in_control, > .pru_detach = div_detach, >- .pru_disconnect = div_disconnect, > .pru_peeraddr = div_peeraddr, > .pru_send = div_send, > .pru_shutdown = div_shutdown, > >