Date: Fri, 27 Apr 2007 09:25:18 -0400
From: Sven Willenberger <sven@dmv.com>
To: Jack Vogel <jfvogel@gmail.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: CARP and em0 timeout watchdog
Message-ID: <1177680318.8713.1.camel@lanshark.dmv.com>
In-Reply-To: <1177094694.5457.31.camel@lanshark.dmv.com>
References: <1176911436.7416.8.camel@lanshark.dmv.com> <1177084316.5457.5.camel@lanshark.dmv.com> <20070420160431.GA17356@icarus.home.lan> <2a41acea0704201017n42d4e987l77752ee8f7ca9f1f@mail.gmail.com> <1177091905.5457.17.camel@lanshark.dmv.com> <2a41acea0704201127x319be08cw869efe1dd02a046e@mail.gmail.com> <1177094694.5457.31.camel@lanshark.dmv.com>

On Fri, 2007-04-20 at 14:44 -0400, Sven Willenberger wrote:
> On Fri, 2007-04-20 at 11:27 -0700, Jack Vogel wrote:
> > On 4/20/07, Sven Willenberger <sven@dmv.com> wrote:
> > > On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
> > > > On 4/20/07, Jeremy Chadwick <koitsu@freebsd.org> wrote:
> > > > > On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> > > > > > Having done more diagnostics I have found out it is not CARP related at
> > > > > > all. It turns out that the same timeouts will happen when ftp'ing to the
> > > > > > physical address IPs as well. There is also an odd situation here
> > > > > > depending on which protocol I use. The two boxes are connected to a Dell
> > > > > > Powerconnect 2616 gig switch with CAT6. If I scp files from the
> > > > > > 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without a
> > > > > > hiccup (I used dd to create various sized testfiles from 32M to 1G in
> > > > > > size and just scp testfile* to the other box). On the other hand, if I
> > > > > > connect to 192.168.0.19 using ftp (either active or passive) where ftp
> > > > > > is being run through inetd, the interface resets (watchdog) within
> > > > > > seconds (a few MBs) of traffic. Enabling polling does nothing, nor does
> > > > > > changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing
> > > > > > such behavioral differences between scp and ftp?
> > > > >
> > > > > You'll get a much higher throughput rate with FTP than you will with
> > > > > SSH, simply because encryption overhead is quite high (even with the
> > > > > Blowfish cipher). With a very fast processor and on a gigE network
> > > > > you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP.
> > > > > That's the only difference I can think of.
> > > > >
> > > > > The watchdog resets I can't explain; Jack Vogel should be able to assist
> > > > > with that. But it sounds like the resets only happen under very high
> > > > > throughput conditions (which is why you'd see it with FTP but not SSH).
> > > >
> > > > What kind of hardware is this interface? Watchdogs mean TX cleanup
> > > > isn't happening in a reasonable time, without further data its hard to
> > > > know what might be going on.
> > > >
> > > > Jack
> > >
> > > from pciconf:
> > >
> > > em0@pci13:0:0: class=0x020000 card=0x108c15d9 chip=0x108c8086 rev=0x03
> > > hdr=0x00
> > > vendor = 'Intel Corporation'
> > > device = 'PRO/1000 PM'
> > > class = network
> > > subclass = ethernet
> > > em1@pci14:0:0: class=0x020000 card=0x109a15d9 chip=0x109a8086 rev=0x00
> > > hdr=0x00
> > > vendor = 'Intel Corporation'
> > > class = network
> > > subclass = ethernet
> > >
> > > em0 is the interface in question.
> > >
> > > from dmesg:
> > >
> > > em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port
> > > 0x4000-0x401f mem 0xe0300000-0xe031ffff irq 16 at device 0.0 on pci13
> > >
> > > em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port
> > > 0x5000-0x501f mem 0xe0400000-0xe041ffff irq 17 at device 0.0 on pci14
> >
> > OH, this is an 82573, and I've posted a firmware patcher a couple
> > different times, there is a bit in the MANC register that is incorrectly
> > programmed in some vendors systems. Can you search email for
> > that patcher, it needs to run from DOS. If you are unable to find
> > it let me know and I'll resent you a copy.
> >
> > Jack
>
> If you are referring to the dcgdis.ThisIsZip attachment, I found it in
> earlier threads, thanks. Will work on patching the nics and will keep
> the list updated.
>
> Thanks again.
>
> Sven
>

I am happy to report that the firmware patch seems to have fixed the
issue and I can transfer data across the gigE network without the
watchdog timeouts and lockups.

Thanks again!!

Sven