Date: Thu, 4 May 2017 18:10:55 +0200
From: Olavi Kumpulainen <olavi.m.kumpulainen@gmail.com>
To: freebsd-arm@freebsd.org
Subject: cpsw drops packets when stressed on BBB and 11.0-STABLE
Message-ID: <FBE5F028-131F-4E88-A0AE-5A9C092327B8@gmail.com>

Hi,

I'm running a snapshot build of FreeBSD-11,

    FreeBSD beaglebone 11.0-STABLE FreeBSD 11.0-STABLE #0 r317153: Thu Apr 20 09:21:26 UTC 2017

on a BBB. I see that cpsw drops outgoing packets when stressed.

For some reason, dev.cpswss.0.stats.RxStartOfFrameOverruns increments when packets are dropped, which may be a hint as to what’s going on. The fact that RxStartOf... increases is confusing, because the packets seem to be dropped in transmission.

Anyway - I’ve found a simple way to reproduce the problem, namely by sending long pings.

On the BBB:

    # tcpdump -ni cpsw0 icmp &

Initial state of RxStartOfFrameOverruns in BBB after playing around a bit:

    # sysctl dev.cpswss.0.stats.RxStartOfFrameOverruns
    dev.cpswss.0.stats.RxStartOfFrameOverruns: 86

    # ping -c 1 -s 14000 192.168.0.3
    PING 192.168.0.3 (192.168.0.3): 14000 data bytes
    11:36:57.965980 IP 192.168.0.158 > 192.168.0.3: ICMP echo request, id 53762, seq 0, length 1480
    11:36:57.966658 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    11:36:57.966826 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    11:36:57.966923 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    11:36:57.967009 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    11:36:57.967090 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    11:36:57.967173 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    11:36:57.967254 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    11:36:57.967336 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    11:36:57.967414 IP 192.168.0.158 > 192.168.0.3: ip-proto-1

(10 packets have supposedly been put into the tx ring in BBB)

Looking at RxStartOfFrameOverruns in the BBB, I see an increment by 5…

    # sysctl dev.cpswss.0.stats.RxStartOfFrameOverruns
    dev.cpswss.0.stats.RxStartOfFrameOverruns: 91

I've set up a tcpdump on the target machine:

    $ sudo tcpdump -ni eth2 icmp
    13:52:42.603199 IP 192.168.0.158 > 192.168.0.3: ICMP echo request, id 53762, seq 0, length 1480
    13:52:42.604697 IP 192.168.0.158 > 192.168.0.3: ip-proto-1

(Eight fragments lost!)

Without tcpdump in BBB, more packets seem to go through (showing tcpdump on target):

    13:56:08.396553 IP 192.168.0.158 > 192.168.0.3: ICMP echo request, id 55554, seq 0, length 1480
    13:56:08.397781 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    13:56:08.399029 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    13:56:08.400157 IP 192.168.0.158 > 192.168.0.3: ip-proto-1
    13:56:08.401409 IP 192.168.0.158 > 192.168.0.3: ip-proto-1

(Five packets lost)

Again, there's an increment in RxStartOfFrame...:

    # sysctl dev.cpswss.0.stats.RxStartOfFrameOverruns
    dev.cpswss.0.stats.RxStartOfFrameOverruns: 96

I added a printf in tx_enqueue() in an attempt to see what’s going on, but doing so “fixed the bug”, obviously by adding a delay in the forwarding code. Maybe we have a timing/race between the driver and the cpsw hardware?

Also, I tested sending 14k pings from the standard-installed Linux in the BBB and that worked just fine. So the packets aren't lost between the hosts (the machines are connected via the same switch).

Any ideas?

Cheers - /Olavi
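
As a sanity check on the fragment counts quoted in the message, the arithmetic can be sketched in a few lines of shell. This is only an illustrative helper, not part of the original report: the function names expected_fragments and lost_fragments are hypothetical, and the sketch assumes a 1500-byte MTU, a 20-byte IPv4 header without options, and an 8-byte ICMP header.

```shell
#!/bin/sh
# Hypothetical helpers for checking the numbers in the report.
# Assumptions: IPv4 header of 20 bytes (no options), ICMP header of 8 bytes.

# expected_fragments PAYLOAD [MTU]
# Prints how many IP fragments a "ping -s PAYLOAD" should produce.
expected_fragments() {
    payload=$1
    mtu=${2:-1500}
    # The IP payload is the ICMP header plus the ping data.
    ip_payload=$((payload + 8))
    # Each fragment carries at most (MTU - 20) bytes, rounded down to a
    # multiple of 8 because fragment offsets are counted in 8-byte units.
    per_frag=$(( (mtu - 20) / 8 * 8 ))
    # Ceiling division.
    echo $(( (ip_payload + per_frag - 1) / per_frag ))
}

# lost_fragments EXPECTED SEEN
# Prints how many fragments never showed up at the receiver.
lost_fragments() {
    echo $(( $1 - $2 ))
}
```

With these definitions, `expected_fragments 14000` prints 10, which matches the ten lines captured on the BBB, and `lost_fragments 10 2` prints 8, matching the first capture on the target.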