Date: Mon, 9 Dec 2002 10:37:18 +0200 (EET) From: Diomidis Spinellis <dds@aueb.gr> To: FreeBSD-gnats-submit@FreeBSD.org Cc: Isidor Kouvelas <kouvelas@cisco.com>, Orion Hodson <hodson@aciri.org>, Archie Cobbs <archie@dellroad.org>, Stephen Casner <casner@packetdesign.com> Subject: kern/46116: EHOSTDOWN when ARP replies get drowned by data following ARP pre-requests Message-ID: <200212090837.gB98bIH6049775@istlab.dmst.aueb.gr>
next in thread | raw e-mail | index | archive | help
>Number: 46116 >Category: kern >Synopsis: EHOSTDOWN when ARP replies get drowned by data following ARP pre-requests >Confidential: no >Severity: serious >Priority: medium >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Mon Dec 09 00:40:03 PST 2002 >Closed-Date: >Last-Modified: >Originator: Diomidis Spinellis >Release: FreeBSD 4.7-RELEASE-p2 i386 >Organization: Athens University of Economics and Business >Environment: System: FreeBSD spiti 4.7-RELEASE-p2 FreeBSD 4.7-RELEASE-p2 #7: Thu Dec 5 02:59:36 EET 2002 dds@spiti:/vol/obj/vol/src/sys/SPITI i386 >Description: The change introduced by PR-25517 http://www.freebsd.org/cgi/query-pr.cgi?pr=25517 has arp_rtrequest (netinet/if_ether.c) start making ARP queries arpt_maxtries * arpt_down seconds before the ARP entry expiry time in an effort to avoid the ARP entry's expiry. In such a case the ARP reply can get drowned by the actual data the host will send following the ARP request. The original change has the la_asked counter get incremented after each pre-request. As a result, when no reply is received after all pre-requests are made, the host (unfairly) gets marked as down. As an example, a client playing MP3 music from an NFS server will pause its playback for 20 seconds (host_down_time) every 20 minutes (max_age). >How-To-Repeat: To avoid waiting for 20 minutes and make the packets more observable set: sysctl net.link.ether.inet.max_age=60 sysctl net.link.ether.inet.host_down_time=5 On a 10Mb/s ethernet have a 150MHz Pentium client running FreeBSD play 192Mb/s MP3 data from an NFS-mounted server file using mpg123. Monitor the packet exchange from a third host and install a kernel breakpoint in in_arpinput to verify that the client does send ARP replies. The problem will appear as follows: 16:05:02.953682 client.1622700147 > server.nfs: 112 read [|nfs] 16:05:02.955113 arp who-has client (88:2:8:1:3:f6) tell server // Server drowns the client's reply with data it immediately sends 16:05:02.956720 server.nfs > client.1622700147: reply ok 1472 read (frag 54854:1480@0+) 16:05:02.957972 server > client: (frag 54854:1480@1480+) // [...] no ARP reply 16:05:08.141564 arp who-has client (88:2:8:1:3:f6) tell server // [...] no ARP reply 16:05:12.891851 arp who-has client (88:2:8:1:3:f6) tell server // [...] no ARP reply 16:05:18.006188 arp who-has client (88:2:8:1:3:f6) tell server // [...] no ARP reply 16:05:23.121545 arp who-has client (88:2:8:1:3:f6) tell server // [...] no ARP reply // ARP entry expires and arpt_maxtries is reached // server logs /kernel: nfsd send error 64 // client stops getting replies from the server: 16:05:26.964593 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:27.014233 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:27.104235 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:27.213220 client.1622700219 > server.nfs: 112 read [|nfs] 16:05:27.274168 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:27.304173 client.1622700219 > server.nfs: 112 read [|nfs] 16:05:27.494200 client.1622700219 > server.nfs: 112 read [|nfs] 16:05:27.604169 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:27.864192 client.1622700219 > server.nfs: 112 read [|nfs] 16:05:28.254128 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:28.594132 client.1622700219 > server.nfs: 112 read [|nfs] 16:05:29.544090 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:30.044069 client.1622700219 > server.nfs: 112 read [|nfs] 16:05:32.113932 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:32.933903 client.1622700219 > server.nfs: 112 read [|nfs] // After host_down seconds the ARP request-reply succeeds, // the serves sends the - now mbuf-held - packet, and communication resumes: 16:05:32.935318 arp who-has client (88:2:8:1:3:f6) tell server 16:05:32.935651 arp reply client is-at 0:0:e8:32:81:e8 16:05:32.936735 server > client: (frag 54997:928@7400) 16:05:37.243660 client.1622700218 > server.nfs: 96 access [|nfs] 16:05:37.244381 server.nfs > client.1622700218: reply ok 120 access c 60b870bc 16:05:37.245177 client.1622700220 > server.nfs: 96 access [|nfs] 16:05:37.245883 server.nfs > client.1622700220: reply ok 120 access c e523ff20 16:05:38.703607 client.1622700219 > server.nfs: 112 read [|nfs] 16:05:38.707055 server.nfs > client.1622700219: reply ok 1472 read (frag 55020:1480@0+) 16:05:38.708308 server > client: (frag 55020:1480@1480+) 16:05:38.709566 server > client: (frag 55020:1480@2960+) 16:05:38.710804 server > client: (frag 55020:1480@4440+) Repeating the above procedure with a breakpoint in in_arpinput will prove that the client does indeed send a reply, bu will not result in a problem, since the network is clear by the time you type continue in the kernel debugger. >Fix: The problem can be fixed in three different ways: 1. Remove the ARP pre-request code 2. Hold the packet in an mbuf when performing a pre-request 3. Do not mark a host that does not reply to ARP pre-requests as down Solutions 1 and 2 can affect performance, so the third solution was adopted. An additional counter la_preask now counts queries before the ARP entry expiry time. This counter is not used for marking a host down; the original counter la_asked is now only used to count retries for expired ARP entries and mark a host down if needed. The fix also corrects the data type and comment used for la_asked. *** if_ether.c 2002/12/08 14:31:40 1.1 --- if_ether.c 2002/12/08 16:11:26 *************** *** 95,101 **** LIST_ENTRY(llinfo_arp) la_le; struct rtentry *la_rt; struct mbuf *la_hold; /* last packet until resolved/timeout */ ! long la_asked; /* last time we QUERIED for this addr */ #define la_timer la_rt->rt_rmx.rmx_expire /* deletion time in seconds */ }; --- 95,102 ---- LIST_ENTRY(llinfo_arp) la_le; struct rtentry *la_rt; struct mbuf *la_hold; /* last packet until resolved/timeout */ ! int la_preask; /* #times we preQUERIED for this addr */ ! int la_asked; /* #times we QUERIED for this addr */ #define la_timer la_rt->rt_rmx.rmx_expire /* deletion time in seconds */ }; *************** *** 438,450 **** * arpt_down interval. */ if ((rt->rt_expire != 0) && ! (time_second + (arp_maxtries - la->la_asked) * arpt_down > rt->rt_expire)) { arprequest(ifp, &SIN(rt->rt_ifa->ifa_addr)->sin_addr, &SIN(dst)->sin_addr, IF_LLADDR(ifp)); ! la->la_asked++; } bcopy(LLADDR(sdl), desten, sdl->sdl_alen); --- 439,451 ---- * arpt_down interval. */ if ((rt->rt_expire != 0) && ! (time_second + (arp_maxtries - la->la_preask) * arpt_down > rt->rt_expire)) { arprequest(ifp, &SIN(rt->rt_ifa->ifa_addr)->sin_addr, &SIN(dst)->sin_addr, IF_LLADDR(ifp)); ! la->la_preask++; } bcopy(LLADDR(sdl), desten, sdl->sdl_alen); *************** *** 480,486 **** else { rt->rt_flags |= RTF_REJECT; rt->rt_expire += arpt_down; ! la->la_asked = 0; } } --- 481,487 ---- else { rt->rt_flags |= RTF_REJECT; rt->rt_expire += arpt_down; ! la->la_preask = la->la_asked = 0; } } *************** *** 734,740 **** if (rt->rt_expire) rt->rt_expire = time_second + arpt_keep; rt->rt_flags &= ~RTF_REJECT; ! la->la_asked = 0; if (la->la_hold) { (*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt); --- 735,741 ---- if (rt->rt_expire) rt->rt_expire = time_second + arpt_keep; rt->rt_flags &= ~RTF_REJECT; ! la->la_preask = la->la_asked = 0; if (la->la_hold) { (*ifp->if_output)(ifp, la->la_hold, rt_key(rt), rt); *************** *** 858,864 **** if (rt->rt_refcnt > 0 && (sdl = SDL(rt->rt_gateway)) && sdl->sdl_family == AF_LINK) { sdl->sdl_alen = 0; ! la->la_asked = 0; rt->rt_flags &= ~RTF_REJECT; return; } --- 859,865 ---- if (rt->rt_refcnt > 0 && (sdl = SDL(rt->rt_gateway)) && sdl->sdl_family == AF_LINK) { sdl->sdl_alen = 0; ! la->la_preask = la->la_asked = 0; rt->rt_flags &= ~RTF_REJECT; return; } The following packets demonstrate the fixed behavior: 19:05:20.022221 arp who-has client (88:2:8:1:3:f6) tell server 19:05:20.023796 server.nfs > client.1622712534: reply ok 1472 read (frag 10734:1480@0+) 19:05:20.025042 server > client: (frag 10734:1480@1480+) // [...] no ARP reply 19:05:24.844843 arp who-has client (88:2:8:1:3:f6) tell server // [...] no ARP reply 19:05:29.959532 arp who-has client (88:2:8:1:3:f6) tell server // [...] no ARP reply 19:05:35.072951 arp who-has client (88:2:8:1:3:f6) tell server // [...] no ARP reply 19:05:39.824958 arp who-has client (88:2:8:1:3:f6) tell server // [...] no ARP reply // ARP entry has expired; server now holds the packet and WAITS for a reply 19:05:43.988615 arp who-has client (88:2:8:1:3:f6) tell server 19:05:43.988966 arp reply client is-at 0:0:e8:32:81:e8 // Reply-received, communication resumes without interruption: 19:05:43.990042 server > client: (frag 10827:928@7400) 19:05:44.095164 client.1622712606 > server.nfs: 112 read [|nfs] 19:05:44.097995 server.nfs > client.1622712606: reply ok 1472 read (frag 10828:1480@0+) 19:05:44.099237 server > client: (frag 10828:1480@1480+) >Release-Note: >Audit-Trail: >Unformatted: To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-bugs" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200212090837.gB98bIH6049775>