From owner-freebsd-bugs Mon Jul 23 14: 0:22 2001
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id D9CDD37B408
	for ; Mon, 23 Jul 2001 14:00:00 -0700 (PDT)
	(envelope-from gnats@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.11.4/8.11.4) id f6NL00O44175;
	Mon, 23 Jul 2001 14:00:00 -0700 (PDT)
	(envelope-from gnats)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 15A4837B412
	for ; Mon, 23 Jul 2001 13:50:13 -0700 (PDT)
	(envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost)
	by freefall.freebsd.org (8.11.4/8.11.4) id f6NKoDT42365;
	Mon, 23 Jul 2001 13:50:13 -0700 (PDT)
	(envelope-from nobody)
Message-Id: <200107232050.f6NKoDT42365@freefall.freebsd.org>
Date: Mon, 23 Jul 2001 13:50:13 -0700 (PDT)
From: Voradesh Yenbut
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-1.0
Subject: kern/29170: ARP request fails after "bad gateway value" in if_ether.c
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

>Number:         29170
>Category:       kern
>Synopsis:       ARP request fails after "bad gateway value" in if_ether.c
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 23 14:00:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Voradesh Yenbut
>Release:        4.2, 3.4
>Organization:
CSE, U of Washington
>Environment:
FreeBSD bs8.cs.washington.edu 4.2-RELEASE FreeBSD 4.2-RELEASE #2: Mon Jul 23 12:13:29 PDT 2001 root@orion.cs.washington.edu:/usr/src/sys/compile/BS-GENERIC i386
>Description:
We have several FreeBSD systems running DNS servers.
For some unknown reason, one of the systems, which serves a subnet where most clients run Windows 2000, occasionally failed to do ARP address resolution. The kernel logged messages like the following:

arp_rtrequest: bad gateway value
arplookup 128.95.8.74 failed: could not allocate llinfo
arpresolve: can't allocate llinfo for 128.95.8.74rt
arp_rtrequest: bad gateway value
arplookup 128.95.8.233 failed: could not allocate llinfo
arpresolve: can't allocate llinfo for 128.95.8.233rt
arp_rtrequest: bad gateway value
arplookup 128.95.8.232 failed: could not allocate llinfo
arpresolve: can't allocate llinfo for 128.95.8.232rt
arplookup 128.95.8.233 failed: could not allocate llinfo
arpresolve: can't allocate llinfo for 128.95.8.233rt
arp_rtrequest: bad gateway value
arplookup 128.95.8.230 failed: could not allocate llinfo
arpresolve: can't allocate llinfo for 128.95.8.230rt
arp_rtrequest: bad gateway value
arplookup 128.95.8.160 failed: could not allocate llinfo
arpresolve: can't allocate llinfo for 128.95.8.160rt

ARP requests to the addresses above failed afterward. A system reboot made ARP requests work again, but sooner or later the same problem comes back. When I searched the FreeBSD mailing lists for a solution, I found several reports of similar problems but no good solution.
>How-To-Repeat:
I don't know how to repeat this, but it can be simulated by adding a condition to arp_rtrequest() in /usr/src/sys/netinet/if_ether.c that breaks out of the RTM_RESOLVE case. For example, the following code uses a static variable:

	static int toggle = 1;	/* added */

to simulate one fault with the bad gateway value condition.

	case RTM_RESOLVE:
		if (gate->sa_family != AF_LINK ||
		    toggle ||	/* added */
		    gate->sa_len < sizeof(null_sdl)) {
			log(LOG_DEBUG, "arp_rtrequest: bad gateway value\n");
			if (toggle) toggle = 0;	/* added */
			break;
		}

After a system reboot, the system will log "arp_rtrequest: bad gateway value" for the first host it tries to contact, which is likely to be its default gateway. Even though toggle's value is then 0, subsequent attempts to contact the host generate these messages:

arplookup xx.xx.x.xxx failed: could not allocate llinfo
arpresolve: can't allocate llinfo for xx.xx.xx.xxrt

This leads me to believe that a route is not automatically cleaned up if, for some reason, it hits an error.
>Fix:
I don't completely understand the ARP code, so I may not have the insight to really correct the problem, but the following patch seems to work around it ("bad gateway value" is still logged, but there are no more messages about llinfo, and ARP works for the address that caused the message):

--- if_ether.c	2001/07/23 16:35:07	1.1
+++ if_ether.c	2001/07/23 19:13:24
@@ -199,7 +199,13 @@
 	case RTM_RESOLVE:
 		if (gate->sa_family != AF_LINK ||
 		    gate->sa_len < sizeof(null_sdl)) {
-			log(LOG_DEBUG, "arp_rtrequest: bad gateway value\n");
+			log(LOG_DEBUG, "arp_rtrequest: %s bad gateway value %s\n",
+			    inet_ntoa(SIN(rt_key(rt))->sin_addr),
+			    gate->sa_family != AF_LINK? "family": "");
+			rtrequest(RTM_DELETE,
+			    (struct sockaddr *)rt_key(rt),
+			    rt->rt_gateway,
+			    rt_mask(rt), rt->rt_flags, 0);
 			break;
 		}
 		SDL(gate)->sdl_type = rt->rt_ifp->if_type;
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message
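
The failure mode suspected in this report (a host route that stays behind without llinfo after one failed RTM_RESOLVE) can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: struct toy_route, struct toy_table, toy_init() and toy_arplookup() are hypothetical names invented here, faults_left plays the role of the "toggle" variable, and delete_on_failure stands in for the rtrequest(RTM_DELETE, ...) call added by the patch. It assumes that a lookup reuses an existing entry as-is, which is the behavior the report infers from the log messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_TABLE_SIZE 8

/* Hypothetical stand-in for a cloned host route; NOT the kernel's rtentry. */
struct toy_route {
	bool in_use;		/* entry exists in the table */
	bool has_llinfo;	/* ARP state was attached by a successful resolve */
	unsigned addr;		/* IPv4 address in host byte order */
};

struct toy_table {
	struct toy_route routes[TOY_TABLE_SIZE];
	int faults_left;	/* resolve attempts that will fail, like "toggle" */
	bool delete_on_failure;	/* models the RTM_DELETE added by the patch */
};

void toy_init(struct toy_table *t, int faults, bool delete_on_failure)
{
	assert(t != NULL);
	memset(t, 0, sizeof(*t));
	t->faults_left = faults;
	t->delete_on_failure = delete_on_failure;
}

/* Return the existing entry for addr, or NULL if there is none. */
static struct toy_route *toy_find(struct toy_table *t, unsigned addr)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++)
		if (t->routes[i].in_use && t->routes[i].addr == addr)
			return &t->routes[i];
	return NULL;
}

/* Create a new entry without llinfo, or return NULL if the table is full. */
static struct toy_route *toy_clone(struct toy_table *t, unsigned addr)
{
	for (size_t i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!t->routes[i].in_use) {
			t->routes[i].in_use = true;
			t->routes[i].has_llinfo = false;
			t->routes[i].addr = addr;
			return &t->routes[i];
		}
	}
	return NULL;
}

/* Return true if llinfo is available for addr after this lookup. */
bool toy_arplookup(struct toy_table *t, unsigned addr)
{
	struct toy_route *rt = toy_find(t, addr);

	if (rt != NULL)
		return rt->has_llinfo;	/* an existing entry is reused as-is */

	rt = toy_clone(t, addr);
	if (rt == NULL)
		return false;

	if (t->faults_left > 0) {	/* simulated "bad gateway value" */
		t->faults_left--;
		if (t->delete_on_failure)
			rt->in_use = false;	/* drop the half-built entry */
		return false;
	}
	rt->has_llinfo = true;
	return true;
}
```

With one injected fault and delete_on_failure set to false, every later lookup of the same address keeps failing because the entry without llinfo is found again. With delete_on_failure set to true, the first lookup still fails but the next one creates a fresh entry and succeeds, which matches what the patch is reported to achieve.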