Date: Fri, 22 Oct 2010 21:59:17 -0400
From: Mike Tancsa <mike@sentex.net>
To: Chris Morrow <morrowc@ops-netman.net>, Joel Jaeggli <joelja@bogus.com>
Cc: Randy Bush <randy@psg.com>, stable <stable@freebsd.org>, warren@kumari.net
Subject: Re: repeating crashes with 8.1
Message-ID: <201010230159.o9N1xGGF098363@lava.sentex.ca>
In-Reply-To: <7.1.0.9.0.20101022210145.06fe25e8@sentex.net>
References: <m2zku7cqt5.wl%randy@psg.com> <m2y69rcqjc.wl%randy@psg.com>
 <201010221416.o9MEGSa0094817@lava.sentex.ca> <m2tykeb9ac.wl%randy@psg.com>
 <201010221425.o9MEPcWC094867@lava.sentex.ca> <m2k4lab6nh.wl%randy@psg.com>
 <201010221848.o9MIm7WF096197@lava.sentex.ca> <m2y69q9e38.wl%randy@psg.com>
 <4CC1F3B8.3010302@bogus.com> <4CC225D3.1030502@ops-netman.net>
 <7.1.0.9.0.20101022210145.06fe25e8@sentex.net>

At 09:11 PM 10/22/2010, Mike Tancsa wrote:
>At 08:01 PM 10/22/2010, Chris Morrow wrote:
>>Note, Warren and I attempted to test this this evening on a 10.04 Ubuntu
>>box, no crashy-crashy...
>

I was able to trigger the issue on box (c). I was ping6ing box (a) when
I did a hard down of (d)'s connected interface. The box then dropped to
debugger

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff80740a50
stack pointer           = 0x28:0xffffff800005a890
frame pointer           = 0x28:0xffffff800005a930
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi4: clock)
[thread pid 12 tid 100007 ]
Stopped at      in6_cksum+0x410:        movzwl  (%rsi),%r10d
db> bt
Tracing pid 12 tid 100007 td 0xffffff00025083e0
in6_cksum() at in6_cksum+0x410
icmp6_reflect() at icmp6_reflect+0x312
icmp6_error() at icmp6_error+0x1ec
nd6_llinfo_timer() at nd6_llinfo_timer+0x208
softclock() at softclock+0x2a6
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0xb2
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800005ad30, rbp = 0 ---
db>

>I was able to do it, but not the box I expected
>
>4 boxes
>
>(a) Attacking host 2001:db8:1:1/64
>(b) victim, not on a connected interface with a). Outside interface
>- em0 - 2001:db8::2:1/64, inside interface - em1 - 2001:db8::3:1/64
>(c) a host behind (b) 2001:db8::3:c/64
>(d) a host behind (b), 2001:db8::3:d/64
>
>
>hosts (c) and (d) have default gateways to b). (c) however, has a
>next hop for (a) via (d). So rather than go out its normal default
>gateway, it takes an extra hop via (d).
>
>Start a ping6 from (a) to (c). Then down (d)'s interface so that
>the ping6 fails. Let the ping keep running for an hour or
>two. Eventually (b) gets error messages like
>
>Oct 22 18:38:32 zoo kernel: em1: discard frame w/o packet header
>
>and crashes.
>
>Unfortunately, I thought it would be (c) that crapped out, not (b)
>and I didnt have crash dumps enabled on the host. Just in the
>process of setting up a better environment.
>
>         ---Mike
>
>>-chris
>>
>>On 10/22/10 16:27, Joel Jaeggli wrote:
>> > Ok I'll try testing that on some box I can reach with both hands.
>> >
>> > fyi nagasaki is:
>> >
>> > [root@nagasaki ~]# uname -a
>> > FreeBSD nagasaki.bogus.com 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #13:
>> > Sun May 30 22:19:23 UTC 2010
>> > root@nagasaki.bogus.com:/usr/obj/usr/src/sys/GENERIC  i386
>> > [root@nagasaki ~]#
>> >
>> >
>> > On 10/22/10 1:17 PM, Randy Bush wrote:
>> >>>>>>> Do you know how this panic is triggered ? Are you able to
>> >>>>>>> create it on demand ?
>> >>>>>>
>> >>>>>> no i do not. bring server up and it'll happen in half an hour.
>> >>>>>> and the server was happy for two months. so i am thinking hardware.
>> >>>>>
>> >>>>> Perhaps. The reason I ask is that I had a box go down last night with
>> >>>>> the same set of errors. The box has a number of ipv6 routes, but its
>> >>>>> next hop was down and the problems started soon after. So I wonder if
>> >>>>> it has something to do with that. Do you have ipv6 on this box and
>> >>>>> are all the next hop addresses correct / reachable ?
>> >>>>>
>> >>>>> Oct 22 02:06:02 i4 kernel: em1: discard frame w/o packet header
>> >>>>> Oct 22 02:06:10 i4 kernel: em2: discard frame w/o packet header
>> >>>>> Oct 22 02:06:21 i4 kernel: em1: discard frame w/o packet header
>> >>>>
>> >>>> it was co-incident with a border router being taken down for new router
>> >>>> install. that router was the v6 exit the servers was using. i have now
>> >>>> pointed default6 to a different exit. the server seems happy.
>> >>>
>> >>>
>> >>> Are you servers still up ? I guess the question now is how to
>> >>> trigger this problem on demand. Perhaps lots of inbound ipv6 traffic
>> >>> with a bad next hop out ? How recent are you sources ? The kernel
>> >>> said Oct 21st. Were the sources from then too ?
>> >>
>> >> yes, kernel and world from 21 oct
>> >>
>> >> chris had an idea on retrigger, install a static for a small dest that
>> >> points to a hole. send a packet to the small dest.
>> >>
>> >> randy
>> >>
>
>--------------------------------------------------------------------
>Mike Tancsa, tel +1 519 651 3400
>Sentex Communications, mike@sentex.net
>Providing Internet since 1994 www.sentex.net
>Cambridge, Ontario Canada www.sentex.net/mike

--------------------------------------------------------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet since 1994 www.sentex.net
Cambridge, Ontario Canada www.sentex.net/mike
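
The reproduction described in the quoted message hinges on host (c)
preferring a more-specific route for (a) via (d) over its default
gateway (b), so that replies to (a) head for a next hop that is no
longer reachable once (d)'s interface is down. Below is a minimal
Python sketch of that route lookup only; it does not touch the kernel
or reproduce the panic. The names ROUTES_C, next_hop and
next_hop_reachable are hypothetical, and the address used for (a),
2001:db8::1:1, is an assumed reading of the "2001:db8:1:1/64" given in
the message.

```python
import ipaddress

# Hypothetical model of host (c)'s IPv6 routing table as described in the
# message: a default route via (b)'s inside interface, plus a more-specific
# route for (a) that points at (d).
# ASSUMPTION: 2001:db8::1:1 stands in for (a); the message gives it as
# "2001:db8:1:1/64", which is not a complete IPv6 address.
ROUTES_C = [
    (ipaddress.ip_network("::/0"), ipaddress.ip_address("2001:db8::3:1")),
    (ipaddress.ip_network("2001:db8::1:1/128"), ipaddress.ip_address("2001:db8::3:d")),
]


def next_hop(routes, dest):
    """Return the gateway of the longest-prefix route that contains dest."""
    dest = ipaddress.ip_address(dest)
    matches = [(net, gw) for net, gw in routes if dest in net]
    _, gateway = max(matches, key=lambda route: route[0].prefixlen)
    return gateway


def next_hop_reachable(routes, dest, down_hosts):
    """True if the chosen next hop for dest is not in the set of downed hosts."""
    return next_hop(routes, dest) not in down_hosts


if __name__ == "__main__":
    a = "2001:db8::1:1"                            # assumed address of (a)
    d_down = {ipaddress.ip_address("2001:db8::3:d")}
    print("next hop for (a):", next_hop(ROUTES_C, a))
    print("reachable with (d) down:", next_hop_reachable(ROUTES_C, a, d_down))
```

With (d) down, the lookup still selects (d) as the next hop for (a), so
neighbor resolution for that next hop can no longer succeed. That fits
the backtrace above, where the fault is reached from
nd6_llinfo_timer() through icmp6_error().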