From owner-freebsd-current@FreeBSD.ORG Fri Nov 9 18:24:25 2007 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 19DB416A418 for ; Fri, 9 Nov 2007 18:24:25 +0000 (UTC) (envelope-from gwright@antiope.com) Received: from QMTA07.westchester.pa.mail.comcast.net (qmta07.westchester.pa.mail.comcast.net [76.96.62.64]) by mx1.freebsd.org (Postfix) with ESMTP id 7A36413C480 for ; Fri, 9 Nov 2007 18:24:24 +0000 (UTC) (envelope-from gwright@antiope.com) Received: from OMTA14.westchester.pa.mail.comcast.net ([76.96.62.60]) by QMTA07.westchester.pa.mail.comcast.net with smtp id AbYy1Y0021HzFnQ000XB00; Fri, 09 Nov 2007 18:07:40 +0000 Received: from [192.168.2.42] ([72.88.201.27]) by OMTA14.westchester.pa.mail.comcast.net with comcast id Ai7g1Y00A0byEVU0000000; Fri, 09 Nov 2007 18:07:40 +0000 X-Authority-Analysis: v=1.0 c=1 a=LazWwxOxDN5Hx0GHfLEA:9 a=ttj9x9bdBLHpi_Uaco0A:7 a=sQ4Vij4K__SKml_mgEfxiYclbFEA:4 a=27vHs-B1mqoA:10 a=cB8LuYKA45AA:10 a=Y_EUdd4fNZkA:10 a=LROfrJfmX20A:10 a=HgpzGGeNr9EA:10 a=FKFaWgXJAKAA:10 a=uq-ktza2FJDFprLDFyUA:9 a=CLfVMVzW3ztS-fvE9FYA:7 a=QZU0g1EMUOuHTkrW3etf_UrSEOEA:4 a=Sz-0p1zU2dQA:10 Mime-Version: 1.0 (Apple Message framework v752.2) To: freebsd-current@freebsd.org Message-Id: <46B41421-3112-40C6-84D9-094FA771F93E@antiope.com> From: Gregory Wright Date: Fri, 9 Nov 2007 13:07:39 -0500 X-Mailer: Apple Mail (2.752.2) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: excessive TCP dulplicate acks revisted X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Nov 2007 18:24:25 -0000 (Note: long message) Hi, The tcp duplicate ACK attack is back. Last March, there was a thread on duplicate TCP acks in -CURRENT. I have been able to reproduce the problem on 7.0-BETA2 (amd64) and have some new information that might help locate the bug. Background: I first noticed problems with tcp connections dropping when Bacula was running on our backup server. The hardware was a dual Opteron 244, 2 GB RAM, running FreeBSD 6.2-RELEASE-p2. The ethernet NIC was a bge. We went through an extensive process to rule out hardware and cabling problems: the server hardware was replaced with a single Opteron 270 (dual core) on a Tyan S2882-D motherboard. The memory was replaced as well. The NICs are still bge. This machine is "hardtack". We have another server in the same rack, a dual Opteron 2214, 4 GB RAM, running FreeBSD 6.2-RELEASE-p1. It has never shown the "dropped connection" problem. This machine is "greenhouse-george" All of the machines are on a single LAN, with a Nortel BayStack 450-24T ethernet switch. Experiments: The first experiment was to use netperf to send a tcp stream from either a PowerBook G4 (OS X 10.4.10) or a dual Intel Clovertown box running FreeBSD 6.2-RELEASE-p8. This is from the PowerBook: crossroads-able> netperf -H greenhouse-george TCP STREAM TEST from localhost (0.0.0.0) port 0 AF_INET to greenhouse- george.18clay.com (192.168.2.63) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 262144 262144 10.02 93.77 crossroads-able> netperf -H hardtack TCP STREAM TEST from localhost (0.0.0.0) port 0 AF_INET to hardtack. 18clay.com (192.168.2.61) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 262144 262144 10.10 29.24 crossroads-able> (That there is a problem in the second case is easy to see just from the ethernet switch: the activity lights blink on and off. A wireshark trace shows a duplicate ACK storm and gaps in the transmitted packet stream.) Just to be certain that this isn't a weird cabling problem, I swapped the cables to the two machines: crossroads-able> netperf -H greenhouse-george TCP STREAM TEST from localhost (0.0.0.0) port 0 AF_INET to greenhouse- george.18clay.com (192.168.2.63) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 262144 262144 10.02 92.68 crossroads-able> netperf -H hardtack TCP STREAM TEST from localhost (0.0.0.0) port 0 AF_INET to hardtack. 18clay.com (192.168.2.61) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 262144 262144 10.02 30.86 crossroads-able> The low throughput (and gaps in the tcp stream) stay with the machine rather than moving with the cable. When I use my dual Clovertown box (FreeBSD 6.2-RELEASE-p8, em NIC), as a source, I get ivy-mike# netperf -H greenhouse-george TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to greenhouse- george.18clay.com (192.168.2.63) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 32768 32768 10.58 94.14 ivy-mike# netperf -H hardtack TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to hardtack. 18clay.com (192.168.2.61) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 32768 32768 10.57 89.15 ivy-mike# FreeBSD seems not as disturbed by the extra ACKs as OS X, but I still see the activity lights on the switch blinking, indicating dropouts in the tcp stream. Also, longer tests using either source machine almost always result in failure ("Broken Pipe") after a few minutes. Now is where things get interesting. If I connect the PowerBook to hardtack (the machine showing the bug) with just a cable, bypassing the switch, I get crossroads-able> netperf -H 192.168.2.2 -l 300 TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null) (192.168.2.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 262144 262144 300.54 330.45 crossroads-able> The higher throughput is because I have two GbE interfaces directly connected, instead of going through the 10/100 Mb switch. The key thing is that I can run this test for hours without the connection dropping. (I have run it for as long as 10000 seconds.) And the wireshark log shows no duplicate ACK storm. Is there perhaps something wrong with the switch? I swapped out our BayStack 450-24T for an identical unit with the same results. I also have an unmanaged SMC EZ Switch 10/100, so I tried a minimal setup: PowerBook <----> SMC switch <----> FreeBSD 7.0-BETA2 server (hardtack) Only the PowerBook and FreeBSD server are connected to the switch. A netperf tcp streaming test gives: crossroads-able> netperf -H 192.168.2.2 TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null) (192.168.2.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 262144 262144 10.02 24.25 crossroads-able> The throughput is low, but most importantly, the wireshark log shows that the duplicate ACK storm is back. So is there something coming _out_ of the switch that FreeBSD's network stack doesn't like? No, this doesn't seem to be the case. The old SMC switch doesn't even send out spanning tree packets. Next experiment: netperf tcp stream from the PowerBook, but with the interface forced into 100TX full duplex. Result: crossroads-able> netperf -H 192.168.2.2 TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null) (192.168.2.2) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 65536 262144 262144 10.02 33.81 crossroads-able> So the throughput has dropped and the wireshark log shows duplicate ACK storms again. If I force the PowerBook to 100TX half duplex the wireshark log still shows duplicate ACK storms. And at 10 Mb/s, full or half duplex, I see still bursts of duplicate ACKs. One last thing to note is that I can run netperf tests _from_ the buggy machine to the laptop with no trouble. Again forcing the PowerBook interface to 100TX full duplex, TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.1 (192.168.2.1) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 262144 32768 32768 10.02 93.03 hardtack# The wireshark logs show no duplicate ACKs. One interesting thing is that the packet logs for the reverse (properly working) case show a more regular structure of data and ACK packets. In the forward (buggy) direction, the data packets are burstier, and to my eye more irregular. This was mentioned in passing in the original thread, in which someone speculated that there might be a window problem. So, in sum: The duplicate ack problem happens on my FreeBSD 7.0-BETA2 machine (single Opteron 270, Tyan S2882-D, 2 GB RAM) when the bge interface in operating in 10 or 100 Mb/s mode, but not in Gb mode. It does not happen on my FreeBSD 6.2-RELEASE-p1 box (dual Opteron 2214, Tyan 3992G3NR, 4 GB RAM) which has a bge NIC as well. Nor does it happen on my dual Clovertown box (dual E5345, Tyan S2696A2NRF, 4 GB) with em NIC. The buggy machine is not much use to me until this problem can be fixed, so I can dedicate it to debugging for a while. Any hints about where code might be instrumented to track the bug down or new experiments to perform are welcome. I can provide wireshark logs; compressed they are not too large (a few MB). Here is the dmesg of the buggy machine: hardtack# dmesg Copyright (c) 1992-2007 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 7.0-BETA2 #0: Fri Nov 2 14:54:38 UTC 2007 root@myers.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Dual Core AMD Opteron(tm) Processor 270 HE (1993.41-MHz K8-class CPU) Origin = "AuthenticAMD" Id = 0x20f12 Stepping = 2 Features=0x178bfbff Features2=0x1 AMD Features=0xe2500800 AMD Features2=0x3 Cores per package: 2 usable memory = 2134962176 (2036 MB) avail memory = 2060251136 (1964 MB) ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 MADT: Forcing active-low polarity and level trigger for SCI ioapic0 irqs 0-23 on motherboard ioapic1 irqs 24-27 on motherboard ioapic2 irqs 28-31 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) acpi0: on motherboard acpi0: [ITHREAD] acpi0: Power Button (fixed) acpi0: reservation of 0, a0000 (3) failed acpi0: reservation of 100000, 7ff00000 (3) failed Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0 acpi_hpet0: iomem 0xfec01000-0xfec013ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 900 cpu0: on acpi0 acpi_throttle0: on cpu0 powernow0: on cpu0 cpu1: on acpi0 powernow1: on cpu1 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 pcib1: at device 6.0 on pci0 pci3: on pcib1 ohci0: mem 0xfeafb000-0xfeafbfff irq 19 at device 0.0 on pci3 ohci0: [GIANT-LOCKED] ohci0: [ITHREAD] usb0: OHCI version 1.0, legacy support usb0: on ohci0 usb0: USB revision 1.0 uhub0: on usb0 uhub0: 3 ports with 3 removable, self powered ohci1: mem 0xfeafc000-0xfeafcfff irq 19 at device 0.1 on pci3 ohci1: [GIANT-LOCKED] ohci1: [ITHREAD] usb1: OHCI version 1.0, legacy support usb1: on ohci1 usb1: USB revision 1.0 uhub1: on usb1 uhub1: 3 ports with 3 removable, self powered atapci0: port 0xbc00-0xbc07,0xb880-0xb883,0xb800-0xb807,0xac00-0xac03,0xa880-0xa88f mem 0xfeafec00-0xfeafefff irq 19 at device 5.0 on pci3 atapci0: [ITHREAD] ata2: on atapci0 ata2: [ITHREAD] ata3: on atapci0 ata3: [ITHREAD] ata4: on atapci0 ata4: [ITHREAD] ata5: on atapci0 ata5: [ITHREAD] vgapci0: port 0xb000-0xb0ff mem 0xfd000000-0xfdffffff,0xfeaff000-0xfeafffff irq 18 at device 6.0 on pci3 fxp0: port 0xa800-0xa83f mem 0xfeafa000-0xfeafafff,0xfeaa0000-0xfeabffff irq 18 at device 8.0 on pci3 miibus0: on fxp0 inphy0: PHY 1 on miibus0 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto fxp0: Ethernet address: 00:e0:81:4b:3e:39 fxp0: [ITHREAD] isab0: at device 7.0 on pci0 isa0: on isab0 atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0 ata0: on atapci1 ata0: [ITHREAD] ata1: on atapci1 ata1: [ITHREAD] pci0: at device 7.2 (no driver attached) pci0: at device 7.3 (no driver attached) pcib2: at device 10.0 on pci0 pci2: on pcib2 ahc0: port 0x8400-0x84ff mem 0xfc8fd000-0xfc8fdfff irq 27 at device 3.0 on pci2 ahc0: [ITHREAD] aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs ahc1: port 0x8800-0x88ff mem 0xfc8fe000-0xfc8fefff irq 24 at device 3.1 on pci2 ahc1: [ITHREAD] aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs pci0:2:9:0: bad VPD cksum, remain 72 bge0: mem 0xfc870000-0xfc87ffff irq 24 at device 9.0 on pci2 miibus1: on bge0 brgphy0: PHY 1 on miibus1 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge0: Ethernet address: 00:e0:81:4b:3e:ae bge0: [ITHREAD] pci0:2:9:1: bad VPD cksum, remain 72 bge1: mem 0xfc890000-0xfc89ffff irq 25 at device 9.1 on pci2 miibus2: on bge1 brgphy1: PHY 1 on miibus2 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge1: Ethernet address: 00:e0:81:4b:3e:af bge1: [ITHREAD] pcib3: at device 11.0 on pci0 pci1: on pcib3 acpi_button0: on acpi0 atkbdc0: port 0x60,0x64 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] atkbd0: [ITHREAD] psm0: irq 12 on atkbdc0 psm0: [GIANT-LOCKED] psm0: [ITHREAD] psm0: model IntelliMouse, device ID 3 sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A sio0: [FILTER] sio1: configured irq 3 not in bitmap of probed irqs 0 sio1: port may not be enabled sio1: configured irq 3 not in bitmap of probed irqs 0 sio1: port may not be enabled sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0 sio1: type 16550A sio1: [FILTER] fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0 fdc0: [FILTER] fd0: <1440-KB 3.5" drive> on fdc0 drive 0 orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xcc7ff on isa0 ppc0: cannot reserve I/O port range sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounters tick every 1.000 msec acd0: DVDROM at ata1-master UDMA33 ad4: 305245MB at ata2-master SATA150 Waiting 5 seconds for SCSI devices to settle sa0 at ahc1 bus 0 target 6 lun 0 sa0: Removable Sequential Access SCSI-3 device sa0: 160.000MB/s transfers (80.000MHz DT, offset 31, 16bit) SMP: AP CPU #1 Launched! Trying to mount root from ufs:/dev/ad4s1a WARNING: / was not properly dismounted ch0 at ahc1 bus 0 target 5 lun 0 ch0: Removable Changer SCSI-2 device ch0: 3.300MB/s transfers ch0: 8 slots, 1 drive, 1 picker, 0 portals bge1: link state changed to DOWN bge1: link state changed to UP bge1: promiscuous mode enabled bge1: promiscuous mode disabled bge1: link state changed to DOWN bge1: link state changed to UP bge1: link state changed to DOWN bge1: link state changed to UP bge1: link state changed to DOWN bge1: link state changed to UP bge1: link state changed to DOWN bge1: link state changed to UP bge1: link state changed to DOWN bge1: link state changed to UP bge1: link state changed to DOWN bge1: link state changed to UP hardtack# Please let me know what I can do to track this down. Best Wishes, Greg Gregory Wright Antiope Associates LLC 18 Clay Street Fair Haven, New Jersey 07704 USA 1 (732) 924-4549 1 (732) 345-8378 [fax]