Date: Mon, 30 Nov 2009 09:26:00 +0100
From: Eirik Øverby <ltning@anduin.net>
To: Adrian Chadd <adrian@freebsd.org>
Cc: pyunyh@gmail.com, weldon@excelsusphoto.com, freebsd-current@freebsd.org, Robert Watson <rwatson@freebsd.org>, Gavin Atkinson <gavin@freebsd.org>
Subject: Re: FreeBSD 8.0 - network stack crashes?
Message-ID: <C3CC7F37-10BE-41DD-96E4-C952C6434ACC@anduin.net>
In-Reply-To: <d763ac660911292347i74caba25h9861a4d9feb63d77@mail.gmail.com>
References: <A1648B95-F36D-459D-BBC4-FFCA63FC1E4C@anduin.net> <20091129013026.GA1355@michelle.cdnetworks.com> <74BFE523-4BB3-4748-98BA-71FBD9829CD5@anduin.net> <alpine.BSF.2.00.0911291427240.80654@fledge.watson.org> <E9B13DDC-1B51-4EFD-95D2-544238BDF3A4@anduin.net> <d763ac660911292347i74caba25h9861a4d9feb63d77@mail.gmail.com>
On 30. nov. 2009, at 08.47, Adrian Chadd wrote:

> That URL works for me. So how much traffic is this box handling during
> peak times?

Depends how you define load. It's a storage box (14TB ZFS) with a small handful of NFS clients pushing backup data to it .. So lots of traffic in bytes/sec, but not many clients.

> I've seen this on the proxy boxes that I've setup. There's a lot of
> data being tied up in socket buffers as well as being routed between
> interfaces (ie, stuff that isn't being intercepted.) Take a look at
> "netstat -an" when things are locked up; see if there's any sockets
> which have full send/receive queues.

If you're referring to the Send-Q and Recv-Q values, they are zero everywhere I can tell.

> I'm going to take a complete stab in the dark here and say this sounds
> a little like a livelock. Ie, something is queuing data and allocating
> mbufs for TX (and something else is generating mbufs - I dunno, packet
> headers?) far faster than the NIC is able to TX them out, and there's
> not enough backpressure on whatever (say, the stuff filling socket
> buffers) to stop the mbuf exhaustion. Again, I've seen this kind of
> crap on proxy boxes.

Not sure if this applies in our case. See the (very) end of this mail for some debug/stats output from em1 (the interface currently in use; I disabled lagg/lacp to ease debugging).

> See if you have full socket buffers showing up in netstat -an. Have
> you tweaked the socket/TCP send/receive sizes? I typically lock mine
> down to something small (32k-64k for the most part) so I don't hit
> mbuf exhaustion on very busy proxies.

I haven't touched any defaults except the mbuf clusters. What does your sysctl.conf look like?

Thanks,
/Eirik

> 2c,
>
>
>
> Adrian
>
> 2009/11/30 Eirik Øverby <ltning@anduin.net>:
>> On 29. nov. 2009, at 15.29, Robert Watson wrote:
>>
>>> On Sun, 29 Nov 2009, Eirik Øverby wrote:
>>>
>>>> I just did that (-rxcsum -txcsum -tso), but the numbers still keep rising. I'll wait and see if it goes down again, then reboot with those values to see how it behaves. But right away it doesn't look too good ..
>>>
>>> It would be interesting to know if any of the counters in the output of netstat -s grow linearly with the allocation count in netstat -m. Often times leaks are associated with edge cases in the stack (typically because if they are in common cases the bug is detected really quickly!) -- usually error handling, where in some error case the unwinding fails to free an mbuf that it should free. These are notoriously hard to track down, unfortunately, but the stats output (especially where delta alloc is linear to delta stat) may inform the situation some more.
>>
>> From what I can tell, all that goes up with mbuf usage is traffic/packet counts. I can't say I see anything fishy in there.
>>
>> From the last few samples in
>> http://anduin.net/~ltning/netstat.log
>> you can see the host stops receiving any packets, but does a few retransmits before the session where this script ran timed out.
>>
>> /Eirik
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
>

em1: link state changed to UP
em1: Adapter hardware address = 0xffffff80003be530
em1: CTRL = 0x140248 RCTL = 0x8002
em1: Packet buffer = Tx=20k Rx=12k
em1: Flow control watermarks high = 10240 low = 8740
em1: tx_int_delay = 66, tx_abs_int_delay = 66
em1: rx_int_delay = 32, rx_abs_int_delay = 66
em1: fifo workaround = 0, fifo_reset_count = 0
em1: hw tdh = 25, hw tdt = 25
em1: hw rdh = 222, hw rdt = 221
em1: Num Tx descriptors avail = 256
em1: Tx Descriptors not avail1 = 0
em1: Tx Descriptors not avail2 = 0
em1: Std mbuf failed = 0
em1: Std mbuf cluster failed = 0
em1: Driver dropped packets = 0
em1: Driver tx dma failure in encap = 0
em1: Excessive collisions = 0
em1: Sequence errors = 0
em1: Defer count = 0
em1: Missed Packets = 0
em1: Receive No Buffers = 0
em1: Receive Length Errors = 0
em1: Receive errors = 0
em1: Crc errors = 0
em1: Alignment errors = 0
em1: Collision/Carrier extension errors = 0
em1: RX overruns = 0
em1: watchdog timeouts = 0
em1: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em1: XON Rcvd = 0
em1: XON Xmtd = 0
em1: XOFF Rcvd = 0
em1: XOFF Xmtd = 0
em1: Good Packets Rcvd = 5704113
em1: Good Packets Xmtd = 3617612
em1: TSO Contexts Xmtd = 0
em1: TSO Contexts Failed = 0
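Robert Watson's suggestion in this thread (check whether any netstat -s counter grows linearly with the allocation count in netstat -m) can be sketched as a small Python helper. This is a hypothetical illustration, not a FreeBSD tool: it assumes the reader has already sampled the "mbufs in use" figure and the counters of interest at fixed intervals, and the names deltas, pearson, suspects, the 0.95 threshold and the sample values are all made up. Nothing here runs or parses netstat.

```python
"""Hypothetical helper: which counters grow in step with mbuf usage?

Inputs are samples collected by the reader at fixed intervals (for example
the current "mbufs in use" figure from `netstat -m` and selected
`netstat -s` counters). Nothing here runs or parses netstat itself.
"""
import math
from statistics import mean


def deltas(series):
    """Per-interval change between consecutive samples."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant, since r is undefined there.
    """
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def suspects(mbufs_in_use, counters, threshold=0.95):
    """Return (name, r) for counters whose deltas track the mbuf deltas."""
    if len(mbufs_in_use) < 3:
        raise ValueError("need at least three samples")
    d_mbuf = deltas(mbufs_in_use)
    found = []
    for name, series in counters.items():
        if len(series) != len(mbufs_in_use):
            raise ValueError(f"{name}: sample count mismatch")
        r = pearson(d_mbuf, deltas(series))
        if r >= threshold:
            found.append((name, round(r, 3)))
    # Strongest correlation first.
    return sorted(found, key=lambda item: -item[1])


if __name__ == "__main__":
    # Made-up samples, for illustration only.
    mbufs = [1000, 1040, 1120, 1130, 1250]
    counters = {
        "hypothetical_error_counter": [0, 4, 12, 13, 25],  # 10 mbufs per event
        "packets_received": [0, 500, 900, 1700, 2100],
    }
    print(suspects(mbufs, counters))  # [('hypothetical_error_counter', 1.0)]
```

Correlating per-interval deltas rather than cumulative totals matters here: any two counters that only ever increase will look correlated in absolute terms. A high r for plain traffic counters, which is what Eirik reports, is expected and not by itself evidence of a leak; an error or drop counter with a steady mbufs-per-event ratio would be the more interesting find.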