Date: Sat, 23 Oct 2010 04:21:31 -0400
From: Mike Tancsa <mike@sentex.net>
To: Jack Vogel <jfvogel@gmail.com>
Cc: Chris Morrow <morrowc@ops-netman.net>, Joel Jaeggli <joelja@bogus.com>,
    stable <stable@freebsd.org>, warren@kumari.net, Randy Bush <randy@psg.com>
Subject: Re: repeating crashes with 8.1
Message-ID: <201010230821.o9N8LVuR001382@lava.sentex.ca>
In-Reply-To: <AANLkTimWTTHWC04my3CSoNGYsLarS9F10eoO=8Fz37cF@mail.gmail.com>
References: <m2zku7cqt5.wl%randy@psg.com> <m2y69rcqjc.wl%randy@psg.com>
    <201010221416.o9MEGSa0094817@lava.sentex.ca> <m2tykeb9ac.wl%randy@psg.com>
    <201010221425.o9MEPcWC094867@lava.sentex.ca> <m2k4lab6nh.wl%randy@psg.com>
    <201010221848.o9MIm7WF096197@lava.sentex.ca> <m2y69q9e38.wl%randy@psg.com>
    <4CC1F3B8.3010302@bogus.com> <4CC225D3.1030502@ops-netman.net>
    <7.1.0.9.0.20101022210145.06fe25e8@sentex.net>
    <201010230159.o9N1xGGF098363@lava.sentex.ca>
    <AANLkTimWTTHWC04my3CSoNGYsLarS9F10eoO=8Fz37cF@mail.gmail.com>

At 12:41 AM 10/23/2010, Jack Vogel wrote:
>Odd, can you make any connection between this and the em complaints??

I don't think so. This is on an igb nic and a different
panic/behaviour. I have the box sitting at the debugger prompt in the
FreeBSD netperf cluster, so hopefully someone can take a look and see
what the issue is.

         ---Mike

>Jack
>
>
>On Fri, Oct 22, 2010 at 6:59 PM, Mike Tancsa <mike@sentex.net> wrote:
>At 09:11 PM 10/22/2010, Mike Tancsa wrote:
>At 08:01 PM 10/22/2010, Chris Morrow wrote:
>Note, Warren and I attempted to test this this evening on a 10.04 Ubuntu
>box, no crashy-crashy...
>
>
>
>I was able to trigger the issue on box (c). I was ping6ing box (a)
>when I did a hard down of (d)'s connected interface. The box then
>dropped to debugger
>
>
>Fatal trap 9: general protection fault while in kernel mode
>cpuid = 0; apic id = 00
>instruction pointer     = 0x20:0xffffffff80740a50
>stack pointer           = 0x28:0xffffff800005a890
>frame pointer           = 0x28:0xffffff800005a930
>code segment            = base 0x0, limit 0xfffff, type 0x1b
>                        = DPL 0, pres 1, long 1, def32 0, gran 1
>processor eflags        = interrupt enabled, resume, IOPL = 0
>current process         = 12 (swi4: clock)
>[thread pid 12 tid 100007 ]
>Stopped at      in6_cksum+0x410:        movzwl  (%rsi),%r10d
>db> bt
>Tracing pid 12 tid 100007 td 0xffffff00025083e0
>in6_cksum() at in6_cksum+0x410
>icmp6_reflect() at icmp6_reflect+0x312
>icmp6_error() at icmp6_error+0x1ec
>nd6_llinfo_timer() at nd6_llinfo_timer+0x208
>softclock() at softclock+0x2a6
>intr_event_execute_handlers() at intr_event_execute_handlers+0x66
>ithread_loop() at ithread_loop+0xb2
>fork_exit() at fork_exit+0x12a
>fork_trampoline() at fork_trampoline+0xe
>--- trap 0, rip = 0, rsp = 0xffffff800005ad30, rbp = 0 ---
>db>
>
>
>
>
>I was able to do it, but not the box I expected
>
>4 boxes
>
>(a) Attacking host 2001:db8:1:1/64
>(b) victim, not on a connected interface with a). Outside interface
>- em0 - 2001:db8::2:1/64, inside interface - em1 - 2001:db8::3:1/64
>(c) a host behind (b) 2001:db8::3:c/64
>(d) a host behind (b), 2001:db8::3:d/64
>
>
>hosts (c) and (d) have default gateways to b). (c) however, has a
>next hop for (a) via (d). So rather than go out its normal default
>gateway, it takes an extra hop via (d).
>
>Start a ping6 from (a) to (c). Then down (d)'s interface so that
>the ping6 fails. Let the ping keep running for an hour or
>two. Eventually (b) gets error messages like
>
>Oct 22 18:38:32 zoo kernel: em1: discard frame w/o packet header
>
>and crashes.
>
>Unfortunately, I thought it would be (c) that crapped out, not (b)
>and I didn't have crash dumps enabled on the host. Just in the
>process of setting up a better environment.
>
>         ---Mike
>
>-chris
>
>On 10/22/10 16:27, Joel Jaeggli wrote:
> > Ok I'll try testing that on some box I can reach with both hands.
> >
> > fyi nagasaki is:
> >
> > [root@nagasaki ~]# uname -a
> > FreeBSD nagasaki.bogus.com 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #13:
> > Sun May 30 22:19:23 UTC 2010
> > root@nagasaki.bogus.com:/usr/obj/usr/src/sys/GENERIC  i386
> > [root@nagasaki ~]#
> >
> >
> > On 10/22/10 1:17 PM, Randy Bush wrote:
> >>>>>>> Do you know how this panic is triggered ?  Are you able to
> >>>>>>> create it on demand ?
> >>>>>>
> >>>>>> no i do not.  bring server up and it'll happen in half an hour.
> >>>>>> and the server was happy for two months.  so i am thinking hardware.
> >>>>>
> >>>>> Perhaps. The reason I ask is that I had a box go down last night with
> >>>>> the same set of errors. The box has a number of ipv6 routes, but its
> >>>>> next hop was down and the problems started soon after.  So I wonder if
> >>>>> it has something to do with that.  Do you have ipv6 on this box and
> >>>>> are all the next hop addresses correct / reachable ?
> >>>>>
> >>>>> Oct 22 02:06:02 i4 kernel: em1: discard frame w/o packet header
> >>>>> Oct 22 02:06:10 i4 kernel: em2: discard frame w/o packet header
> >>>>> Oct 22 02:06:21 i4 kernel: em1: discard frame w/o packet header
> >>>>
> >>>> it was co-incident with a border router being taken down for new router
> >>>> install.  that router was the v6 exit the servers was using.  i have now
> >>>> pointed default6 to a different exit.  the server seems happy.
> >>>
> >>>
> >>> Are your servers still up ?  I guess the question now is how to
> >>> trigger this problem on demand.  Perhaps lots of inbound ipv6 traffic
> >>> with a bad next hop out ?  How recent are your sources ?  The kernel
> >>> said Oct 21st. Were the sources from then too ?
> >>
> >> yes, kernel and world from 21 oct
> >>
> >> chris had an idea on retrigger, install a static for a small dest that
> >> points to a hole.  send a packet to the small dest.
> >>
> >> randy
> >>
>
>--------------------------------------------------------------------
>Mike Tancsa,                                      tel +1 519 651 3400
>Sentex Communications,                            mike@sentex.net
>Providing Internet since 1994                     www.sentex.net
>Cambridge, Ontario Canada                         www.sentex.net/mike
>
>_______________________________________________
>freebsd-stable@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike@sentex.net
Providing Internet since 1994                     www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike