Date: Tue, 26 Feb 2002 19:29:44 -0500
From: "Michael Meltzer" <mjm@michaelmeltzer.com>
To: "Steve Watt" <steve@Watt.COM>, <stable@freebsd.org>
Subject: Re: NFS/amd weirdness
Message-ID: <00f601c1bf25$da7b1880$34f820c0@ix1x1000>
References: <200202270016.g1R0G3197599@wattres.Watt.COM>

I got caught by "bad checksum" before. In my case the Ethernet card was doing the checksum in hardware, but tcpdump got the packet before the card did, so every packet that was transmitted looked like "bad checksum" to tcpdump (i.e. the checksum field held random values, to be overwritten by the hardware on the way out). You might want to look at the packets from the other side. I am not sure why this should clear in full dump mode.

MJM

----- Original Message -----
From: "Steve Watt" <steve@Watt.COM>
To: <stable@freebsd.org>
Sent: Tuesday, February 26, 2002 7:16 PM
Subject: NFS/amd weirdness

> I'm having some trouble with NFS between a FreeBSD client (4.5-STABLE as
> of roughly 0Z 15 Feb) and a Linux (Dell hacked-up 2.2.14-6.1.1smp).
> The FreeBSD client is oberon, the server is genova.
>
> Basically, what happens is that quite regularly (but, of course, not
> always), amd-mounted nfs points seem to hang for a long time. I don't
> see messages about the NFS server not responding in the syslogs, but
> I did a tcpdump during one of the failures and got the following
> interesting output:
>
> % tcpdump -s1500 -vv host genova
> tcpdump: listening on xl0
> 15:59:50.124361 oberon.314179584 > genova.nfs: 40 null (ttl 64, id 51398, len 68, bad cksum 0!)
> 15:59:50.124897 genova.nfs > oberon.314179584: reply ok 24 null (ttl 63, id 57434, len 52)
> 15:59:50.125676 oberon.1022 > genova.sunrpc: [bad udp cksum ea59!] udp 116 (ttl 64, id 51399, len 144, bad cksum 0!)
> 15:59:50.126360 genova.sunrpc > oberon.1022: [udp sum ok] udp 28 (ttl 63, id 57435, len 56)
> 15:59:53.144489 oberon.1019 > genova.blackjack: S [bad tcp cksum b880!] 4093252687:4093252687(0) win 65535 <mss 1460> (DF) (ttl 64, id 51407, len 44, bad cksum 0!)
> 16:00:17.145720 oberon.1019 > genova.blackjack: S [bad tcp cksum b880!] 4093252687:4093252687(0) win 65535 <mss 1460> (DF) (ttl 64, id 51419, len 44, bad cksum 0!)
> 16:00:20.135911 oberon.851050496 > genova.nfs: 40 null (ttl 64, id 51421, len 68, bad cksum 0!)
> 16:00:20.137385 genova.nfs > oberon.851050496: reply ok 24 null (ttl 63, id 57561, len 52)
>
> Those bad cksum, bad udp cksum, and bad tcp cksum make me somewhat nervous.
>
> I then ran a tcpdump -X to catch it, and the problem shook itself out
> while (or slightly before) that was going on:
>
> % tcpdump -s1500 -X -vv host genova and port nfs
> tcpdump: listening on xl0
> 16:00:35.229138 oberon.teja.com.603827759 > genova.teja.com.nfs: 112 getattr fh Unknown/1 (ttl 64, id 51479, len 140, bad cksum 0!)
> 0x0000 4500 008c c917 0000 4011 0000 c0a8 0116 E.......@.......
> 0x0010 c0a8 011e 02da 0801 0078 840e 23fd ae2f .........x..#../
> 0x0020 0000 0000 0000 0002 0001 86a3 0000 0002 ................
> 0x0030 0000 0001 0000 0001 0000 0028 0000 0000 ...........(....
> 0x0040 0000 0000 0000 0000 0000 01f4 0000 0005 ................
> 0x0050 0000 01f4 0000 0000 0000 000f 0000 01f5 ................
> 0x0060 0000 01f6 0000 0000 0000 0000 caba ebfe ................
> 0x0070 dce7 3000 0200 0000 0a08 0000 0a08 0000 ..0.............
> 0x0080 dce7 3000 f907 5916 0000 0000 ..0...Y.....
> 16:00:35.230098 genova.teja.com.nfs > oberon.teja.com.603827759: reply ok 96 getattr DIR 40775 ids 539/501 sz 4096 (ttl 63, id 58976, len 124)
> 0x0000 4500 007c e660 0000 3f11 118c c0a8 011e E..|.`..?.......
> 0x0010 c0a8 0116 0801 02da 0068 b917 23fd ae2f .........h..#../
> 0x0020 0000 0001 0000 0000 0000 0000 0000 0000 ................
> 0x0030 0000 0000 0000 0000 0000 0002 0000 41fd ..............A.
> 0x0040 0000 0002 0000 021b 0000 01f5 0000 1000 ................
> 0x0050 0000 1000 0000 0000 0000 0008 0000 080a ................
> 0x0060 0030 e7dc 3c7b 797b 0000 0000 3bcc b0dc .0..<{y{....;...
> 0x0070 0000 0000 3bcc b0dc 0000 0000 ....;.......
> 16:00:35.270632 oberon.teja.com.603827760 > genova.teja.com.nfs: 112 getattr fh Unknown/1 (ttl 64, id 51480, len 140, bad cksum 0!)
> 0x0000 4500 008c c918 0000 4011 0000 c0a8 0116 E.......@.......
> 0x0010 c0a8 011e 02d9 0801 0078 840e 23fd ae30 .........x..#..0
> 0x0020 0000 0000 0000 0002 0001 86a3 0000 0002 ................
> 0x0030 0000 0001 0000 0001 0000 0028 0000 0000 ...........(....
> 0x0040 0000 0000 0000 0000 0000 01f4 0000 0005 ................
> 0x0050 0000 01f4 0000 0000 0000 000f 0000 01f5 ................
> 0x0060 0000 01f6 0000 0000 0000 0000 caba ebfe ................
> 0x0070 196f 2400 0200 0000 0a08 0000 0a08 0000 .o$.............
> 0x0080 196f 2400 1947 dda1 0000 0000 .o$..G......
> 16:00:35.271486 genova.teja.com.nfs > oberon.teja.com.603827760: reply ok 96 getattr DIR 40777 ids 0/0 sz 4096 (ttl 63, id 58977, len 124)
> 0x0000 4500 007c e661 0000 3f11 118b c0a8 011e E..|.a..?.......
> 0x0010 c0a8 0116 0801 02d9 0068 b61b 23fd ae30 .........h..#..0
> 0x0020 0000 0001 0000 0000 0000 0000 0000 0000 ................
> 0x0030 0000 0000 0000 0000 0000 0002 0000 41ff ..............A.
> 0x0040 0000 0027 0000 0000 0000 0000 0000 1000 ...'............
> 0x0050 0000 1000 0000 0000 0000 0008 0000 080a ................
> 0x0060 0024 6f19 3c7c 2189 0000 0000 3c75 9c06 .$o.<|!.....<u..
> 0x0070 0000 0000 3c75 9c06 0000 0000 ....<u......
> 16:00:35.312283 oberon.teja.com.603827761 > genova.teja.com.nfs: 112 getattr fh Unknown/1 (ttl 64, id 51481, len 140, bad cksum 0!)
> 0x0000 4500 008c c919 0000 4011 0000 c0a8 0116 E.......@.......
> 0x0010 c0a8 011e 02d8 0801 0078 840e 23fd ae31 .........x..#..1
> 0x0020 0000 0000 0000 0002 0001 86a3 0000 0002 ................
> 0x0030 0000 0001 0000 0001 0000 0028 0000 0000 ...........(....
> 0x0040 0000 0000 0000 0000 0000 01f4 0000 0005 ................
> 0x0050 0000 01f4 0000 0000 0000 000f 0000 01f5 ................
> 0x0060 0000 01f6 0000 0000 0000 0000 caba ebfe ................
> 0x0070 417c 0700 0200 0000 0a08 0000 0a08 0000 A|..............
> 0x0080 417c 0700 68e0 b6a5 0000 0000 A|..h.......
> 16:00:35.313286 genova.teja.com.nfs > oberon.teja.com.603827761: reply ok 96 getattr DIR 40755 ids 544/500 sz 4096 (ttl 63, id 58978, len 124)
> 0x0000 4500 007c e662 0000 3f11 118a c0a8 011e E..|.b..?.......
> 0x0010 c0a8 0116 0801 02d8 0068 1b62 23fd ae31 .........h.b#..1
> 0x0020 0000 0001 0000 0000 0000 0000 0000 0000 ................
> 0x0030 0000 0000 0000 0000 0000 0002 0000 41ed ..............A.
> 0x0040 0000 0005 0000 0220 0000 01f4 0000 1000 ................
> 0x0050 0000 1000 0000 0000 0000 0008 0000 080a ................
> 0x0060 0007 7c41 3c7c 219b 0000 0000 3c4d e10c ..|A<|!.....<M..
> 0x0070 0000 0000 3c4d e10c 0000 0000 ....<M......
>
> The bad cksum messages are still there. This feels like a bug somewhere
> in the output network path, and buffers are getting handed to bpf before
> checksumming is complete.
>
> But NFS still doesn't work correctly all the time. It may just be
> an amd vs Linux weirdness. My amd map is quite boring:
>
> /defaults type:=host;fs:=${autodir}/${rhost}/host;rhost:=${key};opts:=rw,grpid,resvport,nosuid,nodev,intr,hard
> * opts:=rw.grpid,resvport,nosuid,nodev,intr,hard,version=2
>
> Pointers? Ideas where I should start digging when I see this happen
> again?
>
> Thx,
>
> --
> Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.8" / 37N 20' 14.9"
> Internet: steve @ Watt.COM Whois: SW32
> Free time? There's no such thing. It just comes in varying prices...
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?00f601c1bf25$da7b1880$34f820c0>
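
The checksum explanation in the reply can be checked against the quoted hex dump. The sketch below recomputes the IPv4 header checksum (the standard one's-complement sum over 16-bit words) for the first 20 bytes of one request from oberon and of the matching reply from genova, both copied from the tcpdump -X output. The function and constant names are made up for illustration and are not part of tcpdump or FreeBSD. It only shows that the outgoing header was captured with its checksum field still zero while the incoming one verifies; it does not by itself prove that the xl0 card computes the checksum in hardware.

```python
"""Recompute IPv4 header checksums for two headers copied from the quoted dump."""


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of `data` (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def header_is_valid(header: bytes) -> bool:
    """A header carrying a correct checksum sums to zero when checked whole."""
    return internet_checksum(header) == 0


# IPv4 header of the 16:00:35.229138 request (oberon -> genova).
# Bytes 10-11, the checksum field, are 0000 in the capture.
OUTGOING = bytes.fromhex("4500008cc917000040110000c0a80116c0a8011e")

# IPv4 header of the 16:00:35.230098 reply (genova -> oberon).
INCOMING = bytes.fromhex("4500007ce66000003f11118cc0a8011ec0a80116")


if __name__ == "__main__":
    print("reply header valid:  ", header_is_valid(INCOMING))
    print("request header valid:", header_is_valid(OUTGOING))
    # Because the checksum field is zero in the capture, the checksum over
    # the header as captured is the value that still has to be written there.
    print("checksum the request should carry: 0x%04x" % internet_checksum(OUTGOING))
```

If the checksum is filled in after bpf has seen the packet, a capture taken on genova should show the same request with a non-zero, valid header checksum, which is why looking at the packets from the other side is a useful cross-check.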