Message-Id: <9503211938.AA04017@NetEdge.COM>
Date: Tue, 21 Mar 1995 14:38:14 -0500
From: Thomas Pusateri
To: Garrett Wollman
Cc: captain@pubnix.net, freebsd-questions@FreeBSD.org, gated-alpha@gated.cornell.edu
Subject: Re: Gated is crashing my system!
In-Reply-To: Your message of "Tue, 21 Mar 1995 10:11:50 EST." <9503211511.AA29099@halloran-eldar.lcs.mit.edu>

In message <9503211511.AA29099@halloran-eldar.lcs.mit.edu> you write:
><om> said:
>
>> I'm having some problems with gated R3_5Alpha9 on a FreeBSD 2.0 system.
>> The system has 5 dialin lines that either allow shell or slip access.
>> After operating correctly for some time (1-2 hours), the system crashes
>> after gated tries to remove a route from the kernel when one of the slip
>> lines shuts down. The commands issued to shutdown the slip line are:
>> ifconfig slx down
>> ifconfig slx delete
>
>Try pulling over the -current version of netinet/ip_output.c. I
>recently fixed some potential race conditions involving multicast
>options which I think may be the cause of this problem. (And if this
>doesn't fix it, I'd like to know that, too.)
I have seen this problem many times over the past several years with Reno and later systems. The problem is this: when a group is added to an interface, the group information hangs off the in_ifaddr as well as the socket's in_pcb, since it is added with a socket option. If the interface address changes while an application that has joined a group is running, the group membership information gets blown away when the in_ifaddr is deleted, but the in_pcb information remains because the socket is still open. Later, when you leave the group, the socket information gets deleted, but there is now nothing on the in_ifaddr, and you panic.

This never happened in 4.3 because when an address changed, the in_ifaddr just got updated. But in 4.3 Reno and later, the in_ifaddr gets deleted by SIOCDIFADDR and a new one is added with SIOCAIFADDR. What it boils down to is that the change-address operation in 4.3 Reno and later is no longer atomic: the "ifconfig" command issues two separate ioctls (one to delete and one to add) when changing an address.

The hack that Jeff Honig mentioned is one I had written that just saved the group information when an in_ifaddr was deleted and added it back when the new in_ifaddr was created. It's a hack because it always assumed that an add would follow a delete; it's tricky to decide whether you are only deleting an in_ifaddr. I don't have it handy, but maybe Jeff can send it to anyone interested, or I'll try to dig it up.

The real issue is how to make an address change atomic. You could add a change ioctl operation (SIOCCIFADDR), or maybe some of you have a better idea.

Thanks,
Tom Pusateri
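A toy user-space model may make the failure sequence above easier to follow. It is only a sketch under stated assumptions: every name in it (model_ifaddr, model_pcb, model_saved, join_group, delete_addr, add_addr, change_addr, leave_group) is hypothetical and merely stands in for in_ifaddr, in_pcb, and the SIOCDIFADDR/SIOCAIFADDR handling; it is not real BSD kernel code. leave_group returns -1 at the point where the real kernel would panic. change_addr models the atomic, update-in-place behaviour of 4.3 (what a single change ioctl such as the proposed SIOCCIFADDR might do), not any existing interface.

```c
#include <stddef.h>

/*
 * Toy user-space model. All names here are hypothetical stand-ins and do
 * not correspond to real BSD kernel structures or functions.
 */
#define MAX_GROUPS 8

/* Stand-in for an in_ifaddr: one address plus the groups hanging off it. */
struct model_ifaddr {
    unsigned addr;
    unsigned groups[MAX_GROUPS];
    int ngroups;
};

/* Stand-in for the membership a socket's in_pcb remembers. */
struct model_pcb {
    unsigned group;
    int joined;
};

/* Memberships saved across a delete for the following add (the "hack"). */
struct model_saved {
    unsigned groups[MAX_GROUPS];
    int ngroups;
};

/* Joining records the group in BOTH places, as the socket option does. */
void join_group(struct model_ifaddr *ia, struct model_pcb *pcb, unsigned g)
{
    if (ia->ngroups < MAX_GROUPS)
        ia->groups[ia->ngroups++] = g;
    pcb->group = g;
    pcb->joined = 1;
}

/* SIOCDIFADDR-like step: the address and its group list go away.
 * If save is non-NULL, the groups are copied out first (the hack). */
void delete_addr(struct model_ifaddr *ia, struct model_saved *save)
{
    if (save != NULL) {
        for (int i = 0; i < ia->ngroups; i++)
            save->groups[i] = ia->groups[i];
        save->ngroups = ia->ngroups;
    }
    ia->addr = 0;
    ia->ngroups = 0;
}

/* SIOCAIFADDR-like step: a fresh address, restoring saved groups if any. */
void add_addr(struct model_ifaddr *ia, unsigned addr, const struct model_saved *save)
{
    ia->addr = addr;
    ia->ngroups = 0;
    if (save != NULL) {
        for (int i = 0; i < save->ngroups; i++)
            ia->groups[i] = save->groups[i];
        ia->ngroups = save->ngroups;
    }
}

/* 4.3-style atomic change: update in place, so memberships survive. */
void change_addr(struct model_ifaddr *ia, unsigned addr)
{
    ia->addr = addr;
}

/* Returns 0 on success, -1 where the real kernel would panic: the socket
 * still believes it is a member, but the address has no record of it. */
int leave_group(struct model_ifaddr *ia, struct model_pcb *pcb)
{
    if (!pcb->joined)
        return 0;
    pcb->joined = 0;
    for (int i = 0; i < ia->ngroups; i++) {
        if (ia->groups[i] == pcb->group) {
            ia->groups[i] = ia->groups[--ia->ngroups];
            return 0;
        }
    }
    return -1;
}
```

In this model, a join followed by delete_addr(&ia, NULL), add_addr(&ia, addr, NULL), and leave_group gives -1, the dangling-membership case. Passing the same model_saved through both delete_addr and add_addr gives 0, but only because an add actually followed the delete, which is exactly the assumption that makes the hack fragile. change_addr avoids the problem without saving anything.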