Date: Fri, 1 Apr 2011 11:00:36 -0500
From: Brandon Gooch <jamesbrandongooch@gmail.com>
To: Steve Polyack <korvus@comcast.net>
Cc: freebsd-net@freebsd.org, Frederique Rijsdijk <frederique@isafeelin.org>
Subject: Re: Network stack unstable after arp flapping
Message-ID: <AANLkTi=TmSRFx5q=PYvU3gpdNug97CwMmPndJc280SH-@mail.gmail.com>
In-Reply-To: <4D95E62A.5000109@comcast.net>
References: <20110401141655.GA5350@deta.isafeelin.org> <4D95E62A.5000109@comcast.net>

On Fri, Apr 1, 2011 at 9:50 AM, Steve Polyack <korvus@comcast.net> wrote:
> On 04/01/11 10:16, Frederique Rijsdijk wrote:
>>
>> Hi,
>>
>> We (hosting provider) are in the process of implementing ipv6 in our
>> network (yay). Yesterday one of the final steps in configuring and updating
>> our core routers was taken, which did not go entirely as planned. As a
>> result, the default gateway mac addresses for all our machines changed about
>> 800 times in a time span of about 4 minutes.
>>
>> Here's a small piece of the logging:
>>
>> Mar 31 18:36:12 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0
>> Mar 31 18:36:12 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to 00:00:0c:9f:f0:3d on bge0
>> Mar 31 18:36:13 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0
>> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to 00:00:0c:9f:f0:3d on bge0
>> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0
>> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to 00:00:0c:9f:f0:3d on bge0
>> Mar 31 18:36:15 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0
>>
>> The x.x.x.1 is always the same IP, the gateway of the machine.
>>
>> The result of that is that loads of FreeBSD machines (6.x, 7.x and 8.x)
>> developed serious network issues, mainly no or slow traffic to
>> other (FreeBSD) machines across different VLANs in our own network.
>>
>> First thing that comes to mind is the network itself, but all Linux
>> machines (Ubuntu, Red Hat and CentOS) had no issues at all. Only BSD.
>>
>> An arp -ad on both machines where problems occurred didn't solve anything.
>> What worked better was /etc/rc.d/netif restart and a /etc/rc.d/routing
>> restart. Some machines even had to be rebooted in order to get networking
>> back to normal.
>>
>> This almost sounds like a bug in the network stack in BSD, but I can not
>> imagine that I'm right. The BSD networking stack is considered to be one of
>> the best..
>>
>> Any ideas anyone?
>
> We experienced a similar issue here, but IIRC only on our 8.x systems (we
> don't have any 7.x). Disabling flowtable cleared everything up immediately.
> You can try that and see if it helps. It seems like the flowtable caches
> and associates the next-hop router MAC address with each flow, and
> unfortunately this doesn't get purged when the kernel senses and logs an ARP
> change. The only other solution I've seen was to stop all network traffic
> on the machine until the flows/cache entries expired.
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=155604 has more details of my
> run-in with this. The title should be corrected though, as I found shortly
> after that all traffic is affected.
>
> - Steve

FYI, the FLOWTABLE option has been removed from the DEFAULT kernel
config on HEAD, a change which will be MFC'd in a couple of days to
8-STABLE...

-Brandon
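
[Archive note] To check whether a host went through the kind of ARP flapping described in this thread, the "moved from ... to ..." kernel messages can be counted per gateway. The following is only a minimal sketch: the function name count_arp_moves and the SAMPLE list are hypothetical, and the pattern assumes the log lines look exactly like the excerpt quoted above.

```python
import re
from collections import Counter

# Assumed message format, taken from the log excerpt in the thread:
#   "arp: <ip> moved from <old mac> to <new mac> on <iface>"
ARP_MOVED = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def count_arp_moves(lines):
    """Return a Counter mapping (ip, iface) to the number of 'moved' messages."""
    moves = Counter()
    for line in lines:
        match = ARP_MOVED.search(line)
        if match:
            moves[(match.group("ip"), match.group("iface"))] += 1
    return moves


# Hypothetical sample input: three lines copied from the quoted log.
SAMPLE = [
    "Mar 31 18:36:12 srv01 kernel: arp: x.x.x.1 moved from "
    "00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0",
    "Mar 31 18:36:12 srv01 kernel: arp: x.x.x.1 moved from "
    "00:00:0c:07:ac:3d to 00:00:0c:9f:f0:3d on bge0",
    "Mar 31 18:36:13 srv01 kernel: arp: x.x.x.1 moved from "
    "00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0",
]

if __name__ == "__main__":
    for (ip, iface), n in count_arp_moves(SAMPLE).items():
        print(f"{ip} on {iface}: {n} moves")
```

A high count for the default gateway within a short window would match the situation Frederique describes. The sketch only reads logs; it does not touch the ARP cache or the flowtable.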