Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 9 Dec 2002 10:37:18 +0200 (EET)
From:      Diomidis Spinellis <dds@aueb.gr>
To:        FreeBSD-gnats-submit@FreeBSD.org
Cc:        Isidor Kouvelas <kouvelas@cisco.com>, Orion Hodson <hodson@aciri.org>, Archie Cobbs <archie@dellroad.org>, Stephen Casner <casner@packetdesign.com>
Subject:   kern/46116: EHOSTDOWN when ARP replies get drowned by data following ARP pre-requests
Message-ID:  <200212090837.gB98bIH6049775@istlab.dmst.aueb.gr>

next in thread | raw e-mail | index | archive | help

>Number:         46116
>Category:       kern
>Synopsis:       EHOSTDOWN when ARP replies get drowned by data following ARP pre-requests
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Dec 09 00:40:03 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     Diomidis Spinellis
>Release:        FreeBSD 4.7-RELEASE-p2 i386
>Organization:
Athens University of Economics and Business
>Environment:
System: FreeBSD spiti 4.7-RELEASE-p2 FreeBSD 4.7-RELEASE-p2 #7: Thu Dec 5 02:59:36 EET 2002 dds@spiti:/vol/obj/vol/src/sys/SPITI i386


>Description:
The change introduced by PR-25517
http://www.freebsd.org/cgi/query-pr.cgi?pr=25517
has arp_rtrequest (netinet/if_ether.c) start making ARP queries 
arpt_maxtries * arpt_down seconds before the ARP entry expiry time
in an effort to avoid the ARP entry's expiry.  In such a case the ARP reply
can get drowned by the actual data the host will send following the ARP
request.  The original change has the la_asked counter get incremented
after each pre-request.  As a result, when no reply is received
after all pre-requests are made, the host (unfairly) gets marked as down. 

As an example, a client playing MP3 music from an NFS server will pause 
its playback for 20 seconds (host_down_time) every 20 minutes (max_age).

>How-To-Repeat:
To avoid waiting for 20 minutes and make the packets more observable set:

sysctl net.link.ether.inet.max_age=60
sysctl net.link.ether.inet.host_down_time=5

On a 10Mb/s ethernet have a 150MHz Pentium client running FreeBSD play
192Mb/s MP3 data from an NFS-mounted server file using mpg123.  Monitor
the packet exchange from a third host and install a kernel breakpoint in
in_arpinput to verify that the client does send ARP replies.

The problem will appear as follows:

16:05:02.953682 client.1622700147 > server.nfs: 112 read [|nfs]
16:05:02.955113 arp who-has client (88:2:8:1:3:f6) tell server
// Server drowns the client's reply with data it immediately sends
16:05:02.956720 server.nfs > client.1622700147: reply ok 1472 read (frag 54854:1480@0+)
16:05:02.957972 server > client: (frag 54854:1480@1480+)
// [...] no ARP reply
16:05:08.141564 arp who-has client (88:2:8:1:3:f6) tell server
// [...] no ARP reply
16:05:12.891851 arp who-has client (88:2:8:1:3:f6) tell server
// [...] no ARP reply
16:05:18.006188 arp who-has client (88:2:8:1:3:f6) tell server
// [...] no ARP reply
16:05:23.121545 arp who-has client (88:2:8:1:3:f6) tell server
// [...] no ARP reply
// ARP entry expires and arpt_maxtries is reached
// server logs /kernel: nfsd send error 64

// client stops getting replies from the server:
16:05:26.964593 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:27.014233 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:27.104235 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:27.213220 client.1622700219 > server.nfs: 112 read [|nfs]
16:05:27.274168 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:27.304173 client.1622700219 > server.nfs: 112 read [|nfs]
16:05:27.494200 client.1622700219 > server.nfs: 112 read [|nfs]
16:05:27.604169 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:27.864192 client.1622700219 > server.nfs: 112 read [|nfs]
16:05:28.254128 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:28.594132 client.1622700219 > server.nfs: 112 read [|nfs]
16:05:29.544090 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:30.044069 client.1622700219 > server.nfs: 112 read [|nfs]
16:05:32.113932 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:32.933903 client.1622700219 > server.nfs: 112 read [|nfs]

// After host_down seconds the ARP request-reply succeeds,
// the serves sends the - now mbuf-held - packet, and communication resumes:
16:05:32.935318 arp who-has client (88:2:8:1:3:f6) tell server
16:05:32.935651 arp reply client is-at 0:0:e8:32:81:e8
16:05:32.936735 server > client: (frag 54997:928@7400)
16:05:37.243660 client.1622700218 > server.nfs: 96 access [|nfs]
16:05:37.244381 server.nfs > client.1622700218: reply ok 120 access c 60b870bc
16:05:37.245177 client.1622700220 > server.nfs: 96 access [|nfs]
16:05:37.245883 server.nfs > client.1622700220: reply ok 120 access c e523ff20
16:05:38.703607 client.1622700219 > server.nfs: 112 read [|nfs]
16:05:38.707055 server.nfs > client.1622700219: reply ok 1472 read (frag 55020:1480@0+)
16:05:38.708308 server > client: (frag 55020:1480@1480+)
16:05:38.709566 server > client: (frag 55020:1480@2960+)
16:05:38.710804 server > client: (frag 55020:1480@4440+)

Repeating the above procedure with a breakpoint in in_arpinput will 
prove that the client does indeed send a reply, bu will not
result in a problem, since the network is clear by the time you type
continue in the kernel debugger.

>Fix:

The problem can be fixed in three different ways:
1. Remove the ARP pre-request code
2. Hold the packet in an mbuf when performing a pre-request
3. Do not mark a host that does not reply to ARP pre-requests as down

Solutions 1 and 2 can affect performance, so the third solution was adopted.
An additional counter la_preask now counts queries before the ARP entry expiry
time.  This counter is not used for marking a host down; the original
counter la_asked is now only used to count retries for expired ARP entries
and mark a host down if needed.  The fix also corrects the data type and
comment used for la_asked.

*** if_ether.c	2002/12/08 14:31:40	1.1
--- if_ether.c	2002/12/08 16:11:26
***************
*** 95,101 ****
  	LIST_ENTRY(llinfo_arp) la_le;
  	struct	rtentry *la_rt;
  	struct	mbuf *la_hold;		/* last packet until resolved/timeout */
! 	long	la_asked;		/* last time we QUERIED for this addr */
  #define la_timer la_rt->rt_rmx.rmx_expire /* deletion time in seconds */
  };
  
--- 95,102 ----
  	LIST_ENTRY(llinfo_arp) la_le;
  	struct	rtentry *la_rt;
  	struct	mbuf *la_hold;		/* last packet until resolved/timeout */
! 	int	la_preask;		/* #times we preQUERIED for this addr */
! 	int	la_asked;		/* #times we QUERIED for this addr */
  #define la_timer la_rt->rt_rmx.rmx_expire /* deletion time in seconds */
  };
  
***************
*** 438,450 ****
  		 * arpt_down interval.
  		 */
  		if ((rt->rt_expire != 0) &&
! 		    (time_second + (arp_maxtries - la->la_asked) * arpt_down >
  		     rt->rt_expire)) {
  			arprequest(ifp,
  				   &SIN(rt->rt_ifa->ifa_addr)->sin_addr,
  				   &SIN(dst)->sin_addr,
  				   IF_LLADDR(ifp));
! 			la->la_asked++;
  		} 
  
  		bcopy(LLADDR(sdl), desten, sdl->sdl_alen);
--- 439,451 ----
  		 * arpt_down interval.
  		 */
  		if ((rt->rt_expire != 0) &&
! 		    (time_second + (arp_maxtries - la->la_preask) * arpt_down >
  		     rt->rt_expire)) {
  			arprequest(ifp,
  				   &SIN(rt->rt_ifa->ifa_addr)->sin_addr,
  				   &SIN(dst)->sin_addr,
  				   IF_LLADDR(ifp));
! 			la->la_preask++;
  		} 
  
  		bcopy(LLADDR(sdl), desten, sdl->sdl_alen);
***************
*** 480,486 ****
  			else {
  				rt->rt_flags |= RTF_REJECT;
  				rt->rt_expire += arpt_down;
! 				la->la_asked = 0;
  			}
  
  		}
--- 481,487 ----
  			else {
  				rt->rt_flags |= RTF_REJECT;
  				rt->rt_expire += arpt_down;
! 				la->la_preask = la->la_asked = 0;
  			}
  
  		}
***************
*** 734,740 ****
  		if (rt->rt_expire)
  			rt->rt_expire = time_second + arpt_keep;
  		rt->rt_flags &= ~RTF_REJECT;
! 		la->la_asked = 0;
  		if (la->la_hold) {
  			(*ifp->if_output)(ifp, la->la_hold,
  				rt_key(rt), rt);
--- 735,741 ----
  		if (rt->rt_expire)
  			rt->rt_expire = time_second + arpt_keep;
  		rt->rt_flags &= ~RTF_REJECT;
! 		la->la_preask = la->la_asked = 0;
  		if (la->la_hold) {
  			(*ifp->if_output)(ifp, la->la_hold,
  				rt_key(rt), rt);
***************
*** 858,864 ****
  	if (rt->rt_refcnt > 0 && (sdl = SDL(rt->rt_gateway)) &&
  	    sdl->sdl_family == AF_LINK) {
  		sdl->sdl_alen = 0;
! 		la->la_asked = 0;
  		rt->rt_flags &= ~RTF_REJECT;
  		return;
  	}
--- 859,865 ----
  	if (rt->rt_refcnt > 0 && (sdl = SDL(rt->rt_gateway)) &&
  	    sdl->sdl_family == AF_LINK) {
  		sdl->sdl_alen = 0;
! 		la->la_preask = la->la_asked = 0;
  		rt->rt_flags &= ~RTF_REJECT;
  		return;
  	}



The following packets demonstrate the fixed behavior:

19:05:20.022221 arp who-has client (88:2:8:1:3:f6) tell server
19:05:20.023796 server.nfs > client.1622712534: reply ok 1472 read (frag 10734:1480@0+)
19:05:20.025042 server > client: (frag 10734:1480@1480+)
// [...] no ARP reply
19:05:24.844843 arp who-has client (88:2:8:1:3:f6) tell server
// [...] no ARP reply
19:05:29.959532 arp who-has client (88:2:8:1:3:f6) tell server
// [...] no ARP reply
19:05:35.072951 arp who-has client (88:2:8:1:3:f6) tell server
// [...] no ARP reply
19:05:39.824958 arp who-has client (88:2:8:1:3:f6) tell server
// [...] no ARP reply

// ARP entry has expired; server now holds the packet and WAITS for a reply
19:05:43.988615 arp who-has client (88:2:8:1:3:f6) tell server
19:05:43.988966 arp reply client is-at 0:0:e8:32:81:e8

// Reply-received, communication resumes without interruption:
19:05:43.990042 server > client: (frag 10827:928@7400)
19:05:44.095164 client.1622712606 > server.nfs: 112 read [|nfs]
19:05:44.097995 server.nfs > client.1622712606: reply ok 1472 read (frag 10828:1480@0+)
19:05:44.099237 server > client: (frag 10828:1480@1480+)
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200212090837.gB98bIH6049775>