Date: Fri, 18 Mar 2011 18:43:40 +0100
From: Mats Lindberg <mats.w.lindberg@gmail.com>
To: Mark Tinguely <marktinguely@gmail.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: FreeBSD 6 vs 8.1
Message-ID: <AANLkTimje8yrzTYAdVKnkJLM0wo%2Bk66%2BkWv09wSdWknE@mail.gmail.com>
In-Reply-To: <4D837C27.4040802@gmail.com>
References: <AANLkTi=23g1%2BKv%2B4Pmda3-75-r13GaRFu1_Mtofej3RJ@mail.gmail.com>
 <4D7DFC6F.80008@gmail.com>
 <AANLkTi=Gx=YZ%2BZr0q%2BFZ8mcbQyGhjZPSYm6de4ZVwSwx@mail.gmail.com>
 <4D7E0831.4060804@gmail.com>
 <AANLkTins89qcvAjd4_x=iZVjR3rMnGaEJUwuMpMAFKny@mail.gmail.com>
 <4D834F35.5030806@gmail.com>
 <AANLkTi=QzX9YF=G-5e4c4UWAZMaXF-Gkhq0ZbrA6e3mM@mail.gmail.com>
 <4D837C27.4040802@gmail.com>

2011/3/18 Mark Tinguely <marktinguely@gmail.com>

> On 3/18/2011 10:11 AM, Mats Lindberg wrote:
>
> 2011/3/18 Mark Tinguely <marktinguely@gmail.com>
>
>> On 3/18/2011 3:35 AM, Mats Lindberg wrote:
>>
>>> So - after a while I've made some observations.
>>> My problem is actually connected to arp.
>>>
>>> My config is very static so basically I want to turn off arp requests.
>>> Somewhere in the startup scripts I did
>>> > sysctl -w net.link.ether.inet.max_age=2147483647 (max accepted value)
>>> Which on freebsd-6.x worked fine.
>>> In freebsd-8.1 this makes the kernel arp functionality go berserk -
>>> probably an integer overflow somewhere.
>>> arp requests were sent continuously from my freebsd-8.1 node to others,
>>> flooding the network.
>>> I tried to lower this value and found that 500000000s works fine,
>>> 1000000000s does not. 500000000s is OK to me so I won't try to narrow it
>>> down more.
>>>
>>> The reason I was suspecting swapping problems was that after a while with
>>> the flooding going on I got a kernel panic saying 'page fault', which I
>>> would guess is another bug, but with a sensible setting on the arp
>>> timeout the kernel panic does not show itself any longer.
>>>
>>> I've googled for my arp-setting problem but not found anything on it. So
>>> - maybe I'm the first to see this.
>>> Should I enter a bug report somewhere?
>>> I guess this forum is not the place.
>>>
>>> /Mats
>>>
>>
>> Did your HZ (timer interrupts per second) increase from 100 on FreeBSD-6
>> to 1000 on FreeBSD-8.1? This must be a 32 bit computer / OS because that
>> variable is multiplied by hz:
>>
>>     canceled = callout_reset(&la->la_timer,
>>         hz * V_arpt_keep, arptimer, la);
>>
>> and:
>>
>>     #define callout_reset(c, on_tick, fn, arg) \
>>         callout_reset_on((c), (on_tick), (fn), (arg), (c)->c_cpu)
>>
>> where:
>>
>>     int callout_reset_on(struct callout *, int , void (*ftn)(void *), void *,
>>         int)
>>
>> I would guess that you are wrapping with 32 bit arithmetic to a small
>> value. Both the hz==100 and hz==1000 will wrap to about the same number (a
>> negative number). I did not look at the FreeBSD 6.x callout, but I think in
>> the FreeBSD 8 callout, a negative on_tick will be called immediately on the
>> next tick.
>>
>
> Yes I could imagine this is it.
>
>>
>> A page fault panic is a kernel access to a non-mapped VA (a bad pointer).
>> The panic message would have the VA and instruction address information.
>>
>> --Mark
>>
>
> Well,
> Both systems are i386 32bit
>
> On FreeBSD-6 I have: (GENERIC) kernel
> kern.clockrate: { hz = 1000, tick = 1000, profhz = 666, stathz = 133 }
> On FreeBSD-8 I have: (Excluded some drivers from GENERIC kernel)
> kern.clockrate: { hz = 1000, tick = 1000, profhz = 2000, stathz = 133 }
> kern.hz: 1000
>
> So same HZ -- seems the callout is implemented differently 6.x->8.1
>
> For the kernel panic I get
> fault virtual address: 0x8
> instruction pointer: 0x20:0xc0679ed7
> current process: 0, (em0 taskq)
>
> I don't know anything about these numbers, or if you even wanted to know.
> I get the feeling that this is connected to my arp problem; it seems to
> be something in the em driver that is not handled at this high load.
>
> I'm quite happy now - my system has been up and running for the whole day -
> so I'll leave it at this - thanks
>
> /Mats
>
> Good news.
>
> After the reply, I did look at the FreeBSD 6.4 ARP code
> (sys/netinet/if_ether.c) and the code changed between FreeBSD 6 and 8. I
> would suggest that if you set the max arp number, that it be less than
> (2^32-1)/hz. This value is added to the route time out value also, so be
> careful with the value.
>
> The fault va/instruction pointer is a classic NULL pointer dereference.
>
> --Mark.
>

Good, many thanks...
Just out of interest - is this a bug?
1) The sysctl accepting values it can't handle
2) The kernel/em driver panic?
In my world it would be...

/Mats
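
For reference, the wrap-around discussed in this thread can be modelled in userland. The following is only a sketch, not kernel code: it assumes that "hz * V_arpt_keep" behaves like a 32-bit two's-complement multiplication that keeps the low 32 bits of the product (signed overflow is formally undefined in C, so the product is computed in 64 bits and truncated explicitly). The function names wrapped_ticks and max_safe_keep are made up for this illustration and do not exist in FreeBSD.

```c
#include <stdint.h>

/*
 * Model of "hz * keep_seconds" evaluated in 32-bit int arithmetic.
 * Assumption: both arguments are positive and the result wraps modulo
 * 2^32, as it typically does on i386.
 */
static int32_t
wrapped_ticks(int32_t hz, int32_t keep_seconds)
{
	/* Exact product in 64 bits, then keep only the low 32 bits. */
	uint32_t low = (uint32_t)((uint64_t)hz * (uint64_t)keep_seconds);

	/* Reinterpret the low 32 bits as a two's-complement signed value. */
	if (low <= (uint32_t)INT32_MAX)
		return ((int32_t)low);
	return ((int32_t)((int64_t)low - 4294967296LL));
}

/*
 * Largest keep time in seconds whose tick count still fits in a signed
 * 32-bit int, i.e. the largest value that does not wrap at all.
 */
static int32_t
max_safe_keep(int32_t hz)
{
	return (INT32_MAX / hz);
}
```

Under this model, wrapped_ticks(1000, 2147483647) is -1000 and wrapped_ticks(100, 2147483647) is -100, which matches the remark that both hz values wrap to a small negative number. wrapped_ticks(1000, 1000000000) is -727379968, also negative, while wrapped_ticks(1000, 500000000) is 1783793664: positive, but already wrapped, so the effective timeout would be roughly 20 days rather than 500000000 seconds. Because on_tick is a signed int, the conservative bound is max_safe_keep(hz), which is 2147483 seconds at hz = 1000; this is tighter than the (2^32-1)/hz figure quoted above.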