Date: Mon, 22 Nov 2004 17:11:54 -0500
From: John Baldwin <jhb@FreeBSD.org>
To: Sten Spans <sten@blinkenlights.nl>
Cc: freebsd-alpha@FreeBSD.org
Subject: Re: alpha and em mtu
Message-ID: <200411221711.54916.jhb@FreeBSD.org>
In-Reply-To: <Pine.SOC.4.61.0411222147180.10997@tea.blinkenlights.nl>
References: <Pine.SOC.4.61.0411142153430.26307@tea.blinkenlights.nl> <200411221432.42028.jhb@FreeBSD.org> <Pine.SOC.4.61.0411222147180.10997@tea.blinkenlights.nl>

On Monday 22 November 2004 04:15 pm, Sten Spans wrote:
> On Mon, 22 Nov 2004, John Baldwin wrote:
> > On Sunday 21 November 2004 07:35 am, Sten Spans wrote:
> >>> Does this panic go
> >>> away if you use a different MTU btw?
> >>
> >> I've tried running
> >>
> >> i=1; while true; do echo $i; ifconfig em0 mtu $i; let i++; sleep 2; done
> >>
> >> and on the client:
> >> while true; do echo bla | telnet alpha 22; sleep 1; done
> >>
> >> this caused no crashes with mtu 1-1500.
> >>
> >> But:
> >> deepthought# ifconfig em0 mtu 1666
> >> deepthought# tcp_input: ip 0xfffffc0018cdb00e is misaligned
> >> deepthought# ifconfig em0 mtu 1564
> >> deepthought# tcp_input: ip 0xfffffc001857c80e is misaligned
> >> deepthought# ifconfig em0 mtu 1532
> >> deepthought# tcp_input: ip 0xfffffc001859300e is misaligned
> >>
> >> If it has to be 8 bytes aligned then it's off by 4, doesn't
> >> seem to be vlanmtu though.
>
> erm, that would be 2.
>
> > Ok, this is helpful I think. (Big MTU -> panic.)
>
> Another thing is :
>
> deepthought# ifconfig em0 mtu 9000
> sten@ford:~$ ping -s 8000 intern.dt
> PING intern.deepthought.blinkenlights.nl (192.168.1.3) 8000(8028) bytes of data.
> 8008 bytes from intern.deepthought.blinkenlights.nl (192.168.1.3): icmp_seq=1 ttl=64 time=1.19 ms
> 8008 bytes from intern.deepthought.blinkenlights.nl (192.168.1.3): icmp_seq=2 ttl=64 time=0.756 ms
>
> 21:59:12.587494 IP intern.ford > intern.deepthought.blinkenlights.nl: icmp 8008: echo request seq 1
> 21:59:12.588223 IP intern.deepthought.blinkenlights.nl > intern.ford: icmp 8008: echo reply seq 1
> 21:59:13.587730 IP intern.ford > intern.deepthought.blinkenlights.nl: icmp 8008: echo request seq 2
>
> So ICMP does work, which makes me think that the
> problem is TCP-specific. I've also tried disabling all
> the SACK/TCP sysctls, but that didn't seem to help.
> And I've tried connecting from a box with mtu 1500,
> but that also caused the same panic.
>
>
> I'll get an sk card soonish which will allow me to double
> check this panic with another nic. Although I would not guess
> that the panic is driver specific. Which makes me wonder why
> lo0 does work:
> deepthought# ifconfig lo0 mtu 1501
> deepthought# telnet 127.0.0.1 22
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> SSH-2.0-OpenSSH_3.8.1p1 FreeBSD-20040419
>
> > The next step is probably
> > to start walking up the stack determining where the pointer starts off
> > and how it ends up misaligned. Can you use gdb to figure out the source
> > file/line of the previous stack frame before tcp_input()?
>
> sure:
>
> db> trace
> tcp_input() at tcp_input+0x3a4
> ip_input() at ip_input+0x9fc
> netisr_processqueue() at netisr_processqueue+0xac
> swi_net() at swi_net+0xf0
> ithread_loop() at ithread_loop+0x1d4
> fork_exit() at fork_exit+0x100
> exception_return() at exception_return
> --- root of call graph ---
>
> (gdb) l *tcp_input+0x3a4
> 0xfffffc00004cd054 is in tcp_input (/usr/src/sys/netinet/tcp_input.c:554).
> 549
> 550     /*
> 551      * Check that TCP offset makes sense,
> 552      * pull out TCP options and adjust length.  XXX
> 553      */
> 554     off = th->th_off << 2;
> 555     if (off < sizeof (struct tcphdr) || off > tlen) {
> 556             tcpstat.tcps_rcvbadoff++;
> 557             goto drop;
> 558     }
> (gdb) l *ip_input+0x9fc
> 0xfffffc00004c355c is in ip_input (/usr/src/sys/netinet/ip_input.c:739).
> 734     /*
> 735      * Switch out to protocol's input routine.
> 736      */
> 737     ipstat.ips_delivered++;
> 738
> 739     (*inetsw[ip_protox[ip->ip_p]].pr_input)(m, hlen);
> 740     return;
> 741 bad:
> 742     m_freem(m);
> 743 }
> (gdb) l *netisr_processqueue+0xac
> 0xfffffc00004ad45c is in netisr_processqueue (/usr/src/sys/net/netisr.c:233).
> 228
> 229     for (;;) {
> 230             IF_DEQUEUE(ni->ni_queue, m);
> 231             if (m == NULL)
> 232                     break;
> 233             ni->ni_handler(m);
> 234     }
> 235 }
Hmm, so can you check here to see if the 'm' pointer in this routine is
misaligned? If so, then this may be a driver bug.
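
For reference, the "off by 2" arithmetic from the reported addresses can be
sketched in C. This is only an illustration, not code from the tree: the
helper names misalignment() and ip_header_addr() are made up here, and the
buffer base addresses are an assumption (the reported pointers with the
low 14 bytes stripped off). The one hard fact used is that an Ethernet
header is 14 bytes, so an IP header that directly follows it in a buffer
starting on an aligned boundary lands at offset 0x0e, which matches the
low bits of all three reported pointers.

```c
#include <assert.h>
#include <stdint.h>

#define ETHER_HDR_LEN 14u   /* dst MAC (6) + src MAC (6) + EtherType (2) */

/*
 * Hypothetical helper: number of bytes by which addr sits past the
 * previous `alignment` boundary. alignment must be a power of two.
 */
unsigned
misalignment(uint64_t addr, unsigned alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return ((unsigned)(addr & (uint64_t)(alignment - 1)));
}

/*
 * Hypothetical helper: address of the IP header when the Ethernet frame
 * is stored `pad` bytes into a receive buffer that starts at buf_base.
 * buf_base is an assumed value, not something taken from the panic.
 */
uint64_t
ip_header_addr(uint64_t buf_base, unsigned pad)
{
	return (buf_base + pad + ETHER_HDR_LEN);
}
```

With these, misalignment(0xfffffc0018cdb00e, 4) is 2, and the same holds
for the other two pointers. ip_header_addr(0xfffffc0018cdb000, 0)
reproduces the first reported address, while a 2-byte pad in front of the
frame would put the IP header on a 16-byte boundary. So if the data
pointer is already at offset 0 of the buffer when it reaches
netisr_processqueue(), the missing 2-byte pad on the receive side would be
the first thing to look at.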
--
John Baldwin <jhb@FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org
