Message-Id: <200708110542.l7B5gJm9058171@gw.catspoiler.org>
Date: Fri, 10 Aug 2007 22:42:19 -0700 (PDT)
From: Don Lewis
To: current@FreeBSD.org
Subject: bizarre nfe(4) problem
List-Id: Discussions about the use of FreeBSD-current

I have a rather strange nfe(4) problem that appears to be repeatable. I recently started running -CURRENT on an older Socket 754 motherboard with the nForce3 chipset. Initially, I was running an SMP kernel, but I saw sporadic "nfe0: watchdog timeout (missed Tx interrupts) -- recovering" messages, and the system would intermittently lose network connectivity and then recover. The kernel was very similar to GENERIC, with just the addition of "options DEBUG_VFS_LOCKS" and the replacement of atapicd with atapicam.

The nfe0 problem went away completely when I removed "options SMP" and "device apic" from the kernel configuration, except under the following very specific circumstances:

 1. A vncserver session using the GNOME desktop was started on the system.

 2. There was no keyboard or mouse activity on the console for an extended period of time, allowing the GNOME screen saver to kick in and lock the screen. The system would run fine in this state for many hours, and would accept incoming SMTP connections, etc.

 3. A remote vncclient made a connection to the vncserver session and the password was entered on the client.

At this point the nfe0 interface would appear to go deaf. This might happen before or slightly after the password dialog box appeared for the vnc session. For a short while, the system would be able to transmit TCP packets, ntp queries, etc., but it would not respond to any incoming packets (ping, TCP connection requests, etc.). Eventually, the ARP cache would time out and the only packets being transmitted would be ARP requests and the occasional UDP broadcast from the samba server running on the machine.

Pressing any key on the (PS/2) keyboard would instantly bring the network interface back to life. Examination of /var/log/messages showed lots of "nfe0: watchdog timeout" messages for the entire time that nfe0 was not listening to the network.

I've had this problem happen twice. Both times were after an extended period of console inactivity. An incoming vnc connection is not sufficient to trigger the problem if the console was recently active, and even waiting for the GNOME screensaver to put the monitor in DPMS power save mode before initiating the vnc connection does not appear to be sufficient to trigger the problem.

I believe that nfe0 was sharing an interrupt with one of the USB ports when the kernel was compiled with "device apic", but it is not sharing an interrupt without "device apic".

Any thoughts on how to debug this problem?

# vmstat -i
interrupt                          total       rate
irq0: clk                       41903449       1000
irq1: atkbd0                       39034          0
irq3: ohci0                            5          0
irq7: ppc0                             2          0
irq8: rtc                        5362802        127
irq9: ohci1 ahc0+                1963559         46
irq10: nfe0+                      225593          5
irq11: drm0                      2511908         59
irq12: psm0                       332931          7
irq14: ata0                           48          0
Total                           52339331       1249

Here's the dmesg info:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-CURRENT #18: Thu Aug 9 17:35:15 PDT 2007
    dl@mousie.catspoiler.org:/usr/obj/usr/src/sys/GENERICDDB
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3000+ (2009.79-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x20fc2 Stepping = 2
  Features=0x78bfbff
  Features2=0x1
  AMD Features=0xe2500800
  AMD Features2=0x1
real memory = 1073479680 (1023 MB)
avail memory = 1037099008 (989 MB)
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 3ff00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: on acpi0
powernow0: on cpu0
device_attach: powernow0 attach returned 6
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: on hostb0
isab0: at device 1.0 on pci0
isa0: on isab0
pci0: at device 1.1 (no driver attached)
ohci0: mem 0xfebfd000-0xfebfdfff irq 3 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: on ohci0
usb0: USB revision 1.0
uhub0: on usb0
uhub0: 4 ports with 4 removable, self powered
ohci1: mem 0xfebfe000-0xfebfefff irq 9 at device 2.1 on pci0
ohci1: [GIANT-LOCKED]
ohci1: [ITHREAD]
usb1: OHCI version 1.0, legacy support
usb1: on ohci1
usb1: USB revision 1.0
uhub1: on usb1
uhub1: 4 ports with 4 removable, self powered
pci0: at device 2.2 (no driver attached)
nfe0: port 0xec00-0xec07 mem 0xfebfc000-0xfebfcfff irq 10 at device 5.0 on pci0
miibus0: on nfe0
e1000phy0: PHY 1 on miibus0
e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
nfe0: Ethernet address: 00:15:f2:6a:bf:a6
nfe0: [FILTER]
pci0: at device 6.0 (no driver attached)
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 8.0 on pci0
ata0: on atapci0
ata0: [ITHREAD]
ata1: on atapci0
ata1: [ITHREAD]
atapci1: port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xc800-0xc80f,0xc400-0xc47f irq 10 at device 10.0 on pci0
atapci1: [ITHREAD]
ata2: on atapci1
ata2: [ITHREAD]
ata3: on atapci1
ata3: [ITHREAD]
pcib1: at device 11.0 on pci0
pci1: on pcib1
vgapci0: mem 0xea000000-0xebffffff,0xfe9fc000-0xfe9fffff,0xfe000000-0xfe7fffff irq 11 at device 0.0 on pci1
pcib2: at device 14.0 on pci0
pci2: on pcib2
ahc0: port 0xb800-0xb8ff mem 0xfeaff000-0xfeafffff irq 9 at device 10.0 on pci2
ahc0: [ITHREAD]
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
acpi_button0: on acpi0
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse Explorer, device ID 4
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc8fff pnpid ORM0000 on isa0
ppc0: at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ulpt0: on uhub0
ulpt0: using bi-directional mode
Timecounter "TSC" frequency 2009791960 Hz quality 800
Timecounters tick every 1.000 msec
Waiting 5 seconds for SCSI devices to settle
unknown: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00
unknown: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00
sa0 at ahc0 bus 0 target 4 lun 0
sa0: Removable Sequential Access SCSI-2 device
sa0: 3.300MB/s transfers
sa1 at ahc0 bus 0 target 6 lun 0
sa1: Removable Sequential Access SCSI-2 device
sa1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
cd0 at ata0 bus 0 target 0 lun 0
cd0: Removable CD-ROM SCSI-0 device
cd0: 3.300MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
ch0 at ahc0 bus 0 target 6 lun 1
ch0: Removable Changer SCSI-2 device
ch0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
ch0: 8 slots, 1 drive, 1 picker, 0 portals
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
da0: Command Queueing Enabled
da0: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from ufs:/dev/da0s1a
nfe0: link state changed to UP
drm0: on vgapci0
info: [drm] AGP at 0xf0000000 128MB
info: [drm] Initialized mga 3.2.2 20060319
info: [drm] Initialized card for AGP DMA.
drm0: [ITHREAD]
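
One possible way to narrow this down is to check whether the irq10 counter still advances while nfe0 is deaf, by comparing two saved "vmstat -i" outputs taken a few seconds apart. What follows is only a minimal sketch under stated assumptions: the helper names parse_vmstat_i and interrupt_deltas are hypothetical, and the parser assumes the column layout shown in the vmstat output above (an "irqN:" label, device names, a total, and a rate).

```python
import re

# Matches rows such as "irq10: nfe0+   225593   5". The header line and the
# "Total" row do not start with "irq<number>:" and are therefore skipped.
ROW = re.compile(r"^(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return {irq: (devices, total)} parsed from `vmstat -i` output text."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            irq, devices, total, _rate = m.groups()
            counts[irq] = (devices, int(total))
    return counts


def interrupt_deltas(before, after):
    """Return {irq: increase in total} for IRQs present in both snapshots."""
    a = parse_vmstat_i(before)
    b = parse_vmstat_i(after)
    return {irq: b[irq][1] - a[irq][1] for irq in a if irq in b}


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders):
    #   vmstat -i > before.txt; sleep 10; vmstat -i > after.txt
    #   python irqdelta.py before.txt after.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for irq, delta in sorted(interrupt_deltas(f1.read(), f2.read()).items()):
            print(irq, delta)
```

A delta of zero for irq10 during the deaf period would suggest that interrupts are not being delivered at all, while a nonzero delta would point toward the driver or the other device sharing the line rather than interrupt routing.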