Date: Sat, 11 Nov 2006 09:08:37 +0200
From: "Clayton Milos" <clay@milos.co.za>
To: <freebsd-stable@freebsd.org>
Cc: freebsd-net <freebsd-net@freebsd.org>, glebius@freebsd.org, Jack Vogel <jfvogel@gmail.com>
Subject: Re: Proposed 6.2 em RELEASE patch
Message-ID: <007d01c70560$356577b0$9603a8c0@claylaptop>
References: <2a41acea0611081719h31be096eu614d2f2325aff511@mail.gmail.com> <200611091536.kA9FaltD018819@lava.sentex.ca> <45534E76.6020906@samsco.org> <200611092200.kA9M0q1E020473@lava.sentex.ca> <200611102004.kAAK4iO9027778@lava.sentex.ca> <2a41acea0611101400w5b8cef40ob84ed6de181f3e2c@mail.gmail.com> <200611102221.kAAML6ol028630@lava.sentex.ca> <455570D8.6070000@samsco.org>
----- Original Message -----
From: "Scott Long" <scottl@samsco.org>
To: "Mike Tancsa" <mike@sentex.net>
Cc: "freebsd-net" <freebsd-net@freebsd.org>; <glebius@freebsd.org>; <freebsd-stable@freebsd.org>; "Jack Vogel" <jfvogel@gmail.com>
Sent: Saturday, November 11, 2006 8:42 AM
Subject: Re: Proposed 6.2 em RELEASE patch

> Mike Tancsa wrote:
>> At 05:00 PM 11/10/2006, Jack Vogel wrote:
>>> On 11/10/06, Mike Tancsa <mike@sentex.net> wrote:
>>>>
>>>> Some more tests. I tried again with what was committed to today's
>>>> RELENG_6. I am guessing its pretty well the same patch. Polling is
>>>> the only way to avoid livelock at a high pps rate. Does anyone know
>>>> of any simple tools to measure end to end packet loss ? Polling will
>>>> end up dropping some packets and I want to be able to compare. Same
>>>> hardware from the previous post.
>>>
>>> The commit WAS the last patch I posted. SO, making sure I understood
>>> you,
>>> you are saying that POLLING is doing better than FAST_INTR, or only
>>> better than the legacy code that went in with my merge?
>>
>> Hi,
>> The last set of tests I posted are ONLY with what is in today's
>> RELENG_6-- i.e. the latest commit. I did a few variations on the driver--
>> first with
>> #define EM_FAST_INTR 1
>> in if_em.c
>>
>> one without
>>
>> and one with polling in the kernel.
>>
>> With a decent packet rate passing through, the box will lockup. Not sure
>> if I am just hitting the limits of the PCIe bus, or interrupt moderation
>> is not kicking in, or this is a case of "Doctor, it hurts when I send a
>> lot of packets through"... "Well, dont do that"
>>
>> Using polling prevents the lockup, but it will of course drop packets.
>> This is for firewalls with a fairly high bandwidth rate, as well as I
>> need it to be able to survive a decent DDoS attack. I am not looking for
>> 1Mpps, but something more than 100Kpps
>>
>> ---Mike
>
> Hi,
>
> Thanks for all of the data.
> I know that a good amount of testing was
> done with single stream stress tests, but it's not clear how much was
> done with multiple streams prior to your efforts. So, I'm not terribly
> surprised by your results. I'm still a bit unclear on the exact
> topology of your setup, so if could explain it some more in private
> email, I'd appreciate it.
>
> For the short term, I don't think that there is anything that can be
> magically tweaked that will safely give better results. I know that
> Gleb has some ideas on a fairly simple change for the non-INTR_FAST,
> non-POLLING case, but I and several others worry that it's not robust
> in the face of real-world network problems.
>
> For the long term, I have a number of ideas for improving both the RX
> and TX paths in the driver. Some of it is specific to the if_em driver,
> some involve improvements in the FFWD and PFIL_HOOKS code as well as the
> driver. What will help me is if you can hook up a serial console to
> your machine and see if it can be made to drop to the debugger while it
> is under load and otherwise unresponsive. If you can, getting a process
> dump might help confirm where each CPU is spending its time.
>
> Scott

I applied Jack's patch to the em driver and all seemed well until the xl interfaces started giving me the same issues. Thanks, Jack; on my machine your first patch looks 100%.

Since my box does not take much load, and to me a slightly more heavily loaded machine is better than an unstable one, I recompiled the kernel without SMP. So I now have a dual-CPU system with only one of the CPUs working. I've hit it with about 50G of data over Samba and FTP and it didn't blink.

I am, however, using an fxp card for the live IP side. The xl cards are still in the kernel and getting picked up; I just have not configured them with IPs for traffic. I don't think this is the issue, though. I suspect something in the SMP code is causing these issues.

I have another box with SMP on it: the same kind of setup, but with a Tyan Tiger instead of a Thunder motherboard and two fxp NICs. Most of the time it's stable, but if I throw a lot of traffic at it, it locks up too. Next time it does, I will post the console message, but as far as I can remember there were no warnings about watchdog timeouts. It's running 5.5-RELEASE-p8 with SMP enabled.

-Clay
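Mike's question above about a simple way to measure end-to-end packet loss (to compare polling against the interrupt-driven builds) can be approached by having the sender stamp each test packet with an incrementing sequence number and having the receiver count the gaps. The following is a minimal, hypothetical sketch of only the receiver-side bookkeeping in C; the names `loss_stats`, `loss_record`, `loss_expected` and `loss_dropped` are made up for illustration and are not part of any FreeBSD tool. It assumes sequence numbers start at 0, increment by one and never wrap, and it does not filter duplicates.

```c
#include <stdint.h>

/*
 * Hypothetical receiver-side loss accounting (not an existing FreeBSD tool).
 * Assumes the sender stamps each test packet with a 32-bit sequence number
 * that starts at 0, increments by one per packet and never wraps.
 * Reordered packets are still counted correctly; duplicates are not
 * filtered and would hide real losses.
 */
struct loss_stats {
    uint64_t received;  /* packets that actually arrived */
    uint32_t highest;   /* highest sequence number seen so far */
    int      started;   /* nonzero once the first packet has arrived */
};

/* Call once per received packet with the sequence number it carried. */
void loss_record(struct loss_stats *s, uint32_t seq)
{
    if (!s->started || seq > s->highest) {
        s->highest = seq;
        s->started = 1;
    }
    s->received++;
}

/*
 * Packets the sender must have sent, as far as the receiver can tell.
 * Drops after the last packet that arrived are invisible here; compare
 * against the sender's own transmit count to catch those.
 */
uint64_t loss_expected(const struct loss_stats *s)
{
    return s->started ? (uint64_t)s->highest + 1 : 0;
}

/* Packets lost in transit, clamped at 0 in case duplicates inflate `received`. */
uint64_t loss_dropped(const struct loss_stats *s)
{
    uint64_t expected = loss_expected(s);

    return expected > s->received ? expected - s->received : 0;
}
```

For example, recording the sequence numbers 0, 1, 2, 5, 6, 8 and 9 gives 10 expected, 7 received and 3 dropped (packets 3, 4 and 7). The per-packet work is only a compare and an increment, so the measuring host itself is unlikely to be the bottleneck at the packet rates being tested.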