Date: Wed, 21 Nov 2007 11:39:15 -0800
From: Sam Leffler <sam@errno.com>
To: Don Lewis <truckman@freebsd.org>
Cc: chrcoluk@gmail.com, pyunyh@gmail.com, oleg.lomaka@gmail.com, freebsd-stable@freebsd.org
Subject: Re: any hope for nfe/msk?
Message-ID: <47448963.9070500@errno.com>
In-Reply-To: <200711211837.lALIb8gB065394@gw.catspoiler.org>
References: <200711211837.lALIb8gB065394@gw.catspoiler.org>
Don Lewis wrote:
> On 21 Nov, Chris wrote:
>
>> On 07/11/2007, Pyun YongHyeon <pyunyh@gmail.com> wrote:
>>
>>> On Wed, Nov 07, 2007 at 02:28:00PM +0200, Oleg Lomaka wrote:
>>> > Hello,
>>> >
>>> > Pyun YongHyeon wrote:
>>> > >On Thu, Nov 01, 2007 at 10:59:48AM +0200, Oleg Lomaka wrote:
>>> > > > Hello,
>>> > > >
>>> > > > Pyun YongHyeon wrote:
>>> > > > >On Tue, Oct 30, 2007 at 04:01:04PM +0200, Oleg Lomaka wrote:
>>> > > > >
>>> > > > >[...]
>>> > > > >
>>> > > > > > I had RxFIFO overrun again :(
>>> > > > > > from dmesg:
>>> > > > > > msk0: Rx FIFO overrun!
>>> > > > >
>>> > > > >[...]
>>> > > > >
>>> > > > >Please try the attached patch again. Sorry for the trouble.
>>> > > > >After applying the patch, show me the verbose dmesg output
>>> > > > >related to the msk(4)/PHY driver.
>>> > > > >
>>> > > > >Thanks for testing.
>>> > > > >
>>> > > > pcib1: <MPTable PCI-PCI bridge> irq 16 at device 28.0 on pci0
>>> > > > pcib1: domain 0
>>> > > > pcib1: secondary bus 2
>>> > > > pcib1: subordinate bus 2
>>> > > > pcib1: I/O decode 0x2000-0x2fff
>>> > > > pcib1: memory decode 0xd0100000-0xd01fffff
>>> > > > pcib1: no prefetched decode
>>> > > > pci2: <PCI bus> on pcib1
>>> > > > pci2: domain=0, physical bus=2
>>> > > > found-> vendor=0x11ab, dev=0x4352, revid=0x14
>>> > > > domain=0, bus=2, slot=0, func=0
>>> > > > class=02-00-00, hdrtype=0x00, mfdev=0
>>> > > > cmdreg=0x0007, statreg=0x4010, cachelnsz=16 (dwords)
>>> > > > lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
>>> > > > intpin=a, irq=11
>>> > > > powerspec 2 supports D0 D1 D2 D3 current D0
>>> > > > MSI supports 2 messages, 64 bit
>>> > > > map[10]: type Memory, range 64, base 0xd0100000, size 14, enabled
>>> > > > pcib1: requested memory range 0xd0100000-0xd0103fff: good
>>> > > > map[18]: type I/O Port, range 32, base 0x2000, size 8, enabled
>>> > > > pcib1: requested I/O range 0x2000-0x20ff: in range
>>> > > > pcib1: slot 0 INTA routed to irq 16
>>> > > > mskc0: <Marvell Yukon 88E8038 Gigabit Ethernet> port 0x2000-0x20ff mem 0xd0100000-0xd0103fff irq 16 at device 0.0 on pci2
>>> > > > mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd0100000
>>> > > > mskc0: MSI count : 2
>>> > > > mskc0: RAM buffer size : 4KB
>>> > > > mskc0: Port 0 : Rx Queue 2KB(0x00000000:0x000007ff)
>>> > > > mskc0: Port 0 : Tx Queue 2KB(0x00000800:0x00000fff)
>>> > > > msk0: <Marvell Technology Group Ltd. Yukon FE Id 0xb7 Rev 0x01> on mskc0
>>> > > > msk0: bpf attached
>>> > > > msk0: Ethernet address: 00:1b:24:0e:bc:26
>>> > > > miibus0: <MII bus> on msk0
>>> > > > e1000phy0: <Marvell 88E3082 10/100 Fast Ethernet PHY> PHY 0 on miibus0
>>> > > > e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>>> > > > ioapic0: routing intpin 16 (PCI IRQ 16) to vector 49
>>> > > > mskc0: [MPSAFE]
>>> > > > mskc0: [FILTER]
>>> > > >
>>> > >
>>> > >So far all looks good to me. If you encounter watchdog timeouts
>>> > >or Rx FIFO overruns let me know.
>>> > >
>>> >
>>> > Got it again:
>>> > msk0: Rx FIFO overrun!
>>> > I believe this happens under heavy CPU usage. I had Firefox
>>> > compiling and was viewing pictures on a remote Windows box using
>>> > rdesktop, and after a few minutes the network froze.
>>>
>>> If it only happens under heavy system load it's probably normal. If
>>> the system is too busy to serve other jobs, msk(4) may not receive
>>> more packets because its receive buffer is full. msk(4) should
>>> probably just count the overrun errors without printing the message,
>>> which would save some CPU cycles.
>>> Btw, did you also see watchdog timeout errors?
>>>
>>> > But it looks like I didn't lose any packets :). Take a look at the
>>> > ping statistics... funny...
>>>
>>> I guess something is wrong here. The latency is unacceptable. However,
>>> I have no idea why the ICMP echo response takes so long. Are you
>>> using any power saving mechanism (powerd, cpufreq, etc.)?
>>>
>>> > tdevil% ping 10.1.1.254
>>> > PING 10.1.1.254 (10.1.1.254): 56 data bytes
>>> > 64 bytes from 10.1.1.254: icmp_seq=0 ttl=64 time=35926.404 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=1 ttl=64 time=34925.694 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=2 ttl=64 time=33924.729 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=3 ttl=64 time=32923.814 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=4 ttl=64 time=31922.833 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=5 ttl=64 time=30921.878 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=6 ttl=64 time=29920.923 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=7 ttl=64 time=28919.960 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=8 ttl=64 time=27919.009 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=9 ttl=64 time=26918.042 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=10 ttl=64 time=25917.078 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=11 ttl=64 time=24916.115 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=12 ttl=64 time=23915.144 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=13 ttl=64 time=22914.192 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=14 ttl=64 time=21913.214 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=15 ttl=64 time=20912.278 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=16 ttl=64 time=19911.330 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=17 ttl=64 time=18910.375 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=18 ttl=64 time=17909.419 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=19 ttl=64 time=16853.821 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=20 ttl=64 time=15854.710 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=21 ttl=64 time=14701.312 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=22 ttl=64 time=13701.003 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=23 ttl=64 time=12700.052 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=24 ttl=64 time=11699.098 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=25 ttl=64 time=10698.148 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=36 ttl=64 time=0.463 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=37 ttl=64 time=0.379 ms
>>> >
>>>
>>> --
>>> Regards,
>>> Pyun YongHyeon
>>>
>> I have started having problems with the nfe driver. I was using it on
>> 6.2 stable with polling enabled, and the entire system was lagging,
>> even when idle. I have now upgraded the box in question to 7.0 beta 3
>> and am keeping the nfe driver on.
>>
>> irq22: nfe0 ehci0 1652548 20
>>
>> It hasn't had heavy load since the upgrade yet.
>>
>> ehci0: <EHCI (generic) USB 2.0 controller>
>>
>> I have no local access, so I cannot disable USB in the BIOS. If I
>> build a new kernel with ehci disabled in the kernel config, will this
>> stop the interrupt sharing and allow me to use nfe reasonably without
>> polling? I think polling itself has been causing me problems (I use
>> NFS).
>>
>> Is nfe still being developed? These are known, existing problems, but
>> there has been no update to the page below for a while, so I am
>> curious whether it is dead in the water.
>>
>> http://www.f.csce.kyushu-u.ac.jp/~shigeaki/software/freebsd-nfe.html
>>
>> Chris
>>
>
> I've also seen weird problems on a machine that shares an interrupt
> between nfe and ehci. I'm hoping that this recent commit to -CURRENT
> fixes the problem. I'm planning on trying it on my 7.0-BETA machine in
> the next day or so.
>
> scottl 2007-11-21 04:03:51 UTC
>
> FreeBSD src repository
>
> Modified files:
> sys/amd64/amd64 intr_machdep.c
> sys/i386/i386 intr_machdep.c
> sys/ia64/ia64 interrupt.c
> sys/powerpc/powerpc intr_machdep.c
> sys/sparc64/sparc64 intr_machdep.c
> Log:
> Extend critical section coverage in the low-level interrupt handlers to
> include the ithread scheduling step. Without this, a preemption might
> occur in between the interrupt getting masked and the ithread getting
> scheduled. Since the interrupt handler runs in the context of curthread,
> the scheduler might see it as having such a low priority on a busy system
> that it doesn't get to run for a _long_ time, leaving the interrupt stranded
> in a disabled state. The only way that the preemption can happen is by
> a fast/filter handler triggering a scheduling event earlier in the handler,
> so this problem can only happen for cases where an interrupt is being
> shared by both a fast/filter handler and an ithread handler. Unfortunately,
> it seems to be common for this sharing to happen with network and USB
> devices, for example. This fixes many of the mysterious TCP session
> timeouts and NIC watchdogs that were being reported. Many thanks to Sam
> Leffler for getting to the bottom of this problem.
>
>

nfe+ohci was the combo I had that prompted me to fix this.

Sam
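
Illustrative note: the race described in the commit log quoted above can be sketched with a small toy model. This is not FreeBSD kernel code. Every name below (toy_cpu, toy_critical_enter, toy_handle_irq and so on) is hypothetical, and "preempted" is simplified to mean that the handler context never gets back on the CPU within the window being observed. The only behaviour taken from the log is the ordering: a filter handler requests a scheduling event, the interrupt is masked, and the ithread is scheduled either after the critical section ends (old) or before it ends (new).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model state. None of these names are FreeBSD kernel symbols. */
struct toy_cpu {
    int  crit_nesting;       /* > 0 while inside a critical section */
    bool preempt_pending;    /* preemption requested but deferred */
    bool preempted;          /* handler context has been switched out */
    bool irq_masked;         /* interrupt source is masked */
    bool ithread_scheduled;  /* ithread was put on the run queue */
};

static void toy_critical_enter(struct toy_cpu *c)
{
    c->crit_nesting++;
}

/* Leaving the outermost critical section runs any deferred preemption. */
static void toy_critical_exit(struct toy_cpu *c)
{
    assert(c->crit_nesting > 0);
    if (--c->crit_nesting == 0 && c->preempt_pending) {
        c->preempt_pending = false;
        c->preempted = true;
    }
}

/* A filter handler waking another thread asks for a preemption. */
static void toy_request_preempt(struct toy_cpu *c)
{
    if (c->crit_nesting > 0)
        c->preempt_pending = true;  /* deferred until critical exit */
    else
        c->preempted = true;
}

/* Scheduling the ithread only happens if the handler still has the CPU. */
static void toy_schedule_ithread(struct toy_cpu *c)
{
    if (!c->preempted)
        c->ithread_scheduled = true;
}

/* One shared interrupt: a filter handler plus an ithread handler. */
static struct toy_cpu toy_handle_irq(bool extended_coverage)
{
    struct toy_cpu c = {0};

    toy_critical_enter(&c);
    toy_request_preempt(&c);    /* filter handler triggers a scheduling event */
    c.irq_masked = true;        /* mask the source for the ithread handler */
    if (!extended_coverage)
        toy_critical_exit(&c);  /* old ordering: preemption fires here */
    toy_schedule_ithread(&c);
    if (extended_coverage)
        toy_critical_exit(&c);  /* new ordering: preemption fires after scheduling */
    return c;
}

/* Stranded: masked, with no ithread scheduled to unmask it again. */
static bool toy_irq_stranded(const struct toy_cpu *c)
{
    return c->irq_masked && !c->ithread_scheduled;
}
```

Calling toy_handle_irq(false) leaves the model with the interrupt masked and no ithread scheduled, which is the stranded state the log describes. Calling toy_handle_irq(true) still ends with the handler preempted, but only after the ithread has been scheduled.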