Date:      Thu, 06 Mar 2014 22:45:55 +0100
From:      "Dr. A. Haakh" <bugReporter@Haakh.de>
To:        freebsd-net@freebsd.org
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <5318EC93.4000303@Haakh.de>
In-Reply-To: <02AD7510-C862-4C29-9420-25ABF1A6E744@hostpoint.ch>
References:  <9C5B43BD-9D80-49EA-8EDC-C7EF53D79C8D@hostpoint.ch> <CAFOYbcmrVms7VJmPCZHCTMDvBfsV775aDFkHhMrGAEAtPx8-Mw@mail.gmail.com> <02AD7510-C862-4C29-9420-25ABF1A6E744@hostpoint.ch>
Markus Gebert wrote:
> On 06.03.2014, at 19:33, Jack Vogel <jfvogel@gmail.com> wrote:
>
>> You did not make it explicit before, but I noticed in your dtrace info
>> that you are using lagg. It's been the source of lots of problems, so
>> take it out of the setup and see if this queue problem still happens,
>> please.
>>
>> Jack
>
> Well, last year when upgrading another batch of servers (same hardware)
> to 9.2, we tried to find a solution to this network problem, and we
> eliminated lagg where we had used it before, which did not help at all.
> That's why I didn't mention it explicitly.
>
> My point is, I can confirm that 9.2 has network problems on this same
> hardware with or without lagg, so it's unlikely that removing it will
> bring immediate success. OTOH, I didn't have this tx queue theory back
> then, so I cannot be sure that what we saw then without lagg, and what
> we see now with lagg, really are the same problem.
>
> I guess, for the sake of simplicity I will remove lagg on these new
> systems. But before I do that, to save time, I wanted to ask whether I
> should remove the vlan interfaces too? While that didn't help either
> last year, my guess is that I should take them out of the picture,
> unless you say otherwise.
>
> Thanks for looking into this.
>
> Markus

I don't use ixgbe, but this might be related to the problem being
discussed here. I too noticed network problems when I moved from 9.1 to
9.2 last October. Occasionally I use vlc to watch TV on
udp://@224.0.0.1:7792 coming from an XP system; it displayed perfectly
on 9.1 but got scrambled on 9.2. By accident I noticed that vlc worked
fine again when I had a CPU-intensive job like portupgrade -a running.
So I thought it might be a problem related to the scheduler. In the
meantime I have upgraded to 10.0-STABLE and things look better now --
although it still takes about 20 seconds for a video stream to get
synchronized.

My system is

CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz (2675.02-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106e5  Family = 0x6  Model = 0x1e  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
TSC: P-state invariant, performance statistics
real memory  = 12884901888 (12288 MB)
avail memory = 12438151168 (11861 MB)

with this ethernet card:

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xd800-0xd8ff mem 0xf6fff000-0xf6ffffff,0xf6ff8000-0xf6ffbfff irq 19 at device 0.0 on pci2
re0: Using 1 MSI-X message
re0: Chip rev. 0x28000000
re0: MAC rev. 0x00300000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: 90:e6:ba:bb:28:3e

Andreas

>
>> On Thu, Mar 6, 2014 at 2:24 AM, Markus Gebert <markus.gebert@hostpoint.ch> wrote:
>>
>>> (creating a new thread, because I'm no longer sure this is related to
>>> Johan's thread that I originally used to discuss this)
>>>
>>> On 27.02.2014, at 18:02, Jack Vogel <jfvogel@gmail.com> wrote:
>>>
>>>> I would make SURE that you have enough mbuf resources of whatever
>>>> size pool that you are using (2, 4, 9K), and I would try the code in
>>>> HEAD if you had not.
>>>>
>>>> Jack
>>>
>>> Jack, we've upgraded some other systems on which I get more time to
>>> debug (no impact for customers). Although those systems use the
>>> nfsclient too, I no longer think that NFS is the source of the
>>> problem (hence the new thread). I think it's the ixgbe driver and/or
>>> card. When our problem occurs, it looks like it's a single tx queue
>>> that gets stuck somehow (its buf_ring remains full).
>>>
>>> I tracked ping using dtrace to determine the source of the ENOBUFS it
>>> returns every few packets when things get weird:
>>>
>>> # dtrace -n 'fbt:::return / arg1 == ENOBUFS && execname == "ping" / { stack(); }'
>>> dtrace: description 'fbt:::return ' matched 25476 probes
>>>   CPU     ID                    FUNCTION:NAME
>>>    26   7730          ixgbe_mq_start:return
>>>               if_lagg.ko`lagg_transmit+0xc4
>>>               kernel`ether_output_frame+0x33
>>>               kernel`ether_output+0x4fe
>>>               kernel`ip_output+0xd74
>>>               kernel`rip_output+0x229
>>>               kernel`sosend_generic+0x3f6
>>>               kernel`kern_sendit+0x1a3
>>>               kernel`sendit+0xdc
>>>               kernel`sys_sendto+0x4d
>>>               kernel`amd64_syscall+0x5ea
>>>               kernel`0xffffffff80d35667
>>>
>>> The only way ixgbe_mq_start could return ENOBUFS would be when
>>> drbr_enqueue() encounters a full tx buf_ring. Since a new ping packet
>>> probably has no flow id, it should be assigned to a queue based on
>>> curcpu.
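That mapping is easy to spell out. A minimal sketch, assuming the queue
index is simply curcpu modulo the number of tx queues, and that this box
runs 8 queues on its 32 cores (both of these are assumptions, not taken
from the driver source):

  # hypothetical illustration: the assumed cpu -> tx queue mapping
  for cpu in $(jot 32 0); do
          echo "cpu ${cpu} -> tx buf_ring $((cpu % 8))"
  done

Under those assumptions, CPUs 2, 10, 18 and 26 all land on buf_ring 2
(the third ring), which matches the pattern the pinned pings below show.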
>>> That made me try pinning ping to single CPUs to check whether it's
>>> always the same tx buf_ring that reports being full. This turned out
>>> to be true:
>>>
>>> # cpuset -l 0 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.347 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.135 ms
>>>
>>> # cpuset -l 1 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.184 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.232 ms
>>>
>>> # cpuset -l 2 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>> ping: sendto: No buffer space available
>>>
>>> # cpuset -l 3 ping 10.0.4.5
>>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.130 ms
>>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.126 ms
>>> [...snip...]
>>>
>>> The system has 32 cores. If ping runs on cpu 2, 10, 18 or 26, which
>>> use the third tx buf_ring, it reliably returns ENOBUFS. If ping is
>>> run on any other cpu, using any other tx queue, it runs without any
>>> packet loss.
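The per-CPU test above can be automated. A small sketch, assuming 32
CPUs, the 8-queue modulo mapping guessed at above, and the same target
host 10.0.4.5; it walks every CPU and reports which rings fail to send:

  for cpu in $(jot 32 0); do
          printf "cpu %2d (buf_ring %d): " ${cpu} $((cpu % 8))
          # -q quiet, -c 1 one packet, -t 1 give up after one second
          if cpuset -l ${cpu} ping -q -c 1 -t 1 10.0.4.5 > /dev/null 2>&1; then
                  echo ok
          else
                  echo "send failed (stuck buf_ring?)"
          fi
  done

A healthy box prints "ok" on every line; the hang shows up as a whole
residue class of CPUs failing at once.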
>>> So, when ENOBUFS is returned, this is not due to an mbuf shortage,
>>> it's because the buf_ring is full. Not surprisingly, netstat -m looks
>>> pretty normal:
>>>
>>> # netstat -m
>>> 38622/11823/50445 mbufs in use (current/cache/total)
>>> 32856/11642/44498/132096 mbuf clusters in use (current/cache/total/max)
>>> 32824/6344 mbuf+clusters out of packet secondary zone in use (current/cache)
>>> 16/3906/3922/66048 4k (page size) jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/33024 9k jumbo clusters in use (current/cache/total/max)
>>> 0/0/0/16512 16k jumbo clusters in use (current/cache/total/max)
>>> 75431K/41863K/117295K bytes allocated to network (current/cache/total)
>>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>>> 0/0/0 sfbufs in use (current/peak/max)
>>> 0 requests for sfbufs denied
>>> 0 requests for sfbufs delayed
>>> 0 requests for I/O initiated by sendfile
>>> 0 calls to protocol drain routines
>>>
>>> In the meantime I've checked the commit log of the ixgbe driver in
>>> HEAD, and besides the fact that there are only few differences
>>> between HEAD and 9.2, I don't see a commit that fixes anything
>>> related to what we're seeing...
>>>
>>> So, what's the conclusion here? Firmware bug that's only triggered
>>> under 9.2? Driver bug introduced between 9.1 and 9.2 when the new
>>> multiqueue stuff was added? Jack, how should we proceed?
>>>
>>> Markus
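A stuck ring can also be watched from userland through the driver's
per-queue sysctls. A hedged sketch; the txd_head/txd_tail names below
are from the 9.x-era ixgbe driver, so verify them with sysctl -a on the
affected box, and point the queue number at the ring under suspicion:

  # if buf_ring 2 is stuck, its tx descriptor head should stop
  # advancing while the other queues keep moving
  sysctl dev.ix.0.queue2.txd_head dev.ix.0.queue2.txd_tail
  sleep 5
  sysctl dev.ix.0.queue2.txd_head dev.ix.0.queue2.txd_tail

If head and tail stay pinned across samples while the other queues
advance, that corroborates the full-buf_ring theory without any dtrace.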
>>>
>>> On Thu, Feb 27, 2014 at 8:05 AM, Markus Gebert <markus.gebert@hostpoint.ch> wrote:
>>>
>>>> On 27.02.2014, at 02:00, Rick Macklem <rmacklem@uoguelph.ca> wrote:
>>>>
>>>>> John Baldwin wrote:
>>>>>> On Tuesday, February 25, 2014 2:19:01 am Johan Kooijman wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a weird situation here where I can't get my head around.
>>>>>>>
>>>>>>> One FreeBSD 9.2-STABLE ZFS/NFS box, multiple Linux clients. Once
>>>>>>> in a while the Linux clients lose their NFS connection:
>>>>>>>
>>>>>>> Feb 25 06:24:09 hv3 kernel: nfs: server 10.0.24.1 not responding,
>>>>>>> timed out
>>>>>>>
>>>>>>> Not all boxes, just one out of the cluster. The weird part is
>>>>>>> that when I try to ping a Linux client from the FreeBSD box, I
>>>>>>> have between 10 and 30% packet loss - all day long, no specific
>>>>>>> timeframe. If I ping the Linux clients - no loss. If I ping back
>>>>>>> from the Linux clients to the FBSD box - no loss.
>>>>>>>
>>>>>>> The error I get when pinging a Linux client is this one:
>>>>>>> ping: sendto: File too large
>>>>
>>>> We were facing similar problems when upgrading to 9.2 and have
>>>> stayed with 9.1 on affected systems for now. We've seen this on HP
>>>> G8 blades with 82599EB controllers:
>>>>
>>>> ix0@pci0:4:0:0: class=0x020000 card=0x18d0103c chip=0x10f88086 rev=0x01 hdr=0x00
>>>>     vendor     = 'Intel Corporation'
>>>>     device     = '82599EB 10 Gigabit Dual Port Backplane Connection'
>>>>     class      = network
>>>>     subclass   = ethernet
>>>>
>>>> We didn't find a way to trigger the problem reliably. But when it
>>>> occurs, it usually affects only one interface. Symptoms include:
>>>>
>>>> - socket functions return the 'File too large' error mentioned by Johan
>>>> - socket functions return 'No buffer space available'
>>>> - heavy to full packet loss on the affected interface
>>>> - "stuck" TCP connections, i.e. ESTABLISHED TCP connections that
>>>>   should have timed out stick around forever (the socket on the
>>>>   other side could have been closed hours ago)
>>>> - userland programs using the corresponding sockets usually got
>>>>   stuck too (can't find kernel traces right now, but always in
>>>>   network related syscalls)
>>>>
>>>> Network is only lightly loaded on the affected systems (usually 5-20
>>>> mbit, capped at 200 mbit, per server), and netstat never showed any
>>>> indication of resource shortage (like mbufs).
>>>>
>>>> What made the problem go away temporarily was to ifconfig down/up
>>>> the affected interface.
>>>>
>>>> We tested a 9.2 kernel with the 9.1 ixgbe driver, which was not
>>>> really stable. Also, we tested a few revisions between 9.1 and 9.2
>>>> to find out when the problem started. Unfortunately, the ixgbe
>>>> driver turned out to be mostly unstable on our systems between these
>>>> releases, worse than on 9.2. The instability was introduced shortly
>>>> after 9.1 and fixed only very shortly before the 9.2 release. So no
>>>> luck there. We ended up using 9.1 with backports of 9.2 features we
>>>> really need.
>>>>
>>>> What we can't tell is whether it's the 9.2 kernel or the 9.2 ixgbe
>>>> driver or a combination of both that causes these problems.
>>>> Unfortunately we ran out of time (and ideas).
>>>>
>>>>>> EFBIG is sometimes used for drivers when a packet takes too many
>>>>>> scatter/gather entries. Since you mentioned NFS, one thing you can
>>>>>> try is to disable TSO on the interface you are using for NFS to
>>>>>> see if that "fixes" it.
>>>>>
>>>>> And please email if you try it and let us know if it helps.
>>>>>
>>>>> I think I've figured out how 64K NFS read replies can do this,
>>>>> but I'll admit "ping" is a mystery? (Doesn't it just send a single
>>>>> packet that would be in a single mbuf?)
>>>>>
>>>>> I think the EFBIG is replied by bus_dmamap_load_mbuf_sg(), but I
>>>>> don't know if it can happen for an mbuf chain with < 32 entries?
>>>>
>>>> We don't use the nfs server on our systems, but they're
>>>> (new)nfsclients. So I don't think our problem is nfs related, unless
>>>> the default rsize/wsize for client mounts is not 8K, which I thought
>>>> it was. Can you confirm this, Rick?
>>>>
>>>> IIRC, disabling TSO did not make any difference in our case.
>>>>
>>>> Markus
>>>
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"