From owner-freebsd-bugs@FreeBSD.ORG Mon Jul 14 23:10:03 2008
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 96E9E1065675
    for ; Mon, 14 Jul 2008 23:10:03 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
    by mx1.freebsd.org (Postfix) with ESMTP id 7E89D8FC1B
    for ; Mon, 14 Jul 2008 23:10:03 +0000 (UTC)
    (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
    by freefall.freebsd.org (8.14.2/8.14.2) with ESMTP id m6ENA32Y079042
    for ; Mon, 14 Jul 2008 23:10:03 GMT
    (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
    by freefall.freebsd.org (8.14.2/8.14.1/Submit) id m6ENA3eb079041;
    Mon, 14 Jul 2008 23:10:03 GMT
    (envelope-from gnats)
Resent-Date: Mon, 14 Jul 2008 23:10:03 GMT
Resent-Message-Id: <200807142310.m6ENA3eb079041@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Rory Arms
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id A650E1065676
    for ; Mon, 14 Jul 2008 23:02:43 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21])
    by mx1.freebsd.org (Postfix) with ESMTP id A05608FC19
    for ; Mon, 14 Jul 2008 23:02:43 +0000 (UTC)
    (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1])
    by www.freebsd.org (8.14.2/8.14.2) with ESMTP id m6EN2hRd007130
    for ; Mon, 14 Jul 2008 23:02:43 GMT
    (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost)
    by www.freebsd.org (8.14.2/8.14.1/Submit) id m6EN2hiP007129;
    Mon, 14 Jul 2008 23:02:43 GMT
    (envelope-from nobody)
Message-Id: <200807142302.m6EN2hiP007129@www.freebsd.org>
Date: Mon, 14 Jul 2008 23:02:43 GMT
From: Rory Arms
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: misc/125617: ath(4) related panic
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 14 Jul 2008 23:10:03 -0000

>Number:         125617
>Category:       misc
>Synopsis:       ath(4) related panic
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jul 14 23:10:02 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Rory Arms
>Release:        7.0-RELEASE
>Organization:
>Environment:
FreeBSD foo.domain.com 7.0-RELEASE FreeBSD 7.0-RELEASE #13: Sat Mar 8 19:01:13 EST 2008 root@foo.domain.com:/mnt/obj/usr/src/sys/TSERVER i386
>Description:
I noticed that fxp1 was producing a lot of errors. At first I noticed
it because the NFS clients were dropping a lot of packets, and there
were also big delays when pinging the servers from the clients. So I
looked at the console and saw several of these errors, over and over:

fxp1: SCB timeout: 0x80 0x0 0x50 0x400

In my case, I have ath0 bridged with fxp1 to form one network, so the
above errors were mixed in with:

ath0: ath_reset: unable to reset hardware; hal status 03

This is the first time I've noticed this with this release, after over
60 days of uptime. Over that time I had been noticing that the wireless
sometimes wasn't routing correctly through the NAT router (natd(8) +
ipfw(4)), even though the fxp1 clients could, but it was an
intermittent problem. I assume that issue was related to a bug in
if_bridge(4), but that's just a guess. All I know is that the issue
started happening with 7.0.

So the next thing I decided to do was bring down the bridge0 interface
and see if that would alleviate the issue (again, thinking the Ethernet
problems I was seeing were exacerbated by being linked in bridge0, or
by a problem with ath0). A few minutes after I downed the bridge0
interface, the kernel panicked. I have minidumps turned on, so on the
next boot it was able to scavenge the dump. Here's the backtrace, as
seen via kgdb(1):

> sudo kgdb /boot/kernel/kernel vmcore.0
Password:
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
ath0: device timeout

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc49a770c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc04b569a
stack pointer           = 0x28:0xe3ffebc4
frame pointer           = 0x28:0xe3ffebf8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 14 (swi4: clock sio)
trap number             = 12
panic: page fault
cpuid = 1
Uptime: 48d12h18m43s
Physical memory: 1015 MB
Dumping 197 MB: 182 166 150 134 118 102 86 70 54 38 22 6

#0  doadump () at pcpu.h:195
195     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc059fbd6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc059feae in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc08190cc in trap_fatal (frame=0xe3ffeb84, eva=3298457356) at /usr/src/sys/i386/i386/trap.c:899
#4  0xc081933b in trap_pfault (frame=0xe3ffeb84, usermode=0, eva=3298457356) at /usr/src/sys/i386/i386/trap.c:812
#5  0xc0819d32 in trap (frame=0xe3ffeb84) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc080097b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc04b569a in ath_rxbuf_init (sc=0xc3bdf000, bf=0xc3be9324) at /usr/src/sys/dev/ath/if_ath.c:3284
#8  0xc04b5919 in ath_startrecv (sc=0xc3bdf000) at /usr/src/sys/dev/ath/if_ath.c:4928
#9  0xc04bce7c in ath_reset (ifp=0xc3bc8800) at /usr/src/sys/dev/ath/if_ath.c:1145
#10 0xc04bd3bb in ath_watchdog (ifp=0xc3bc8800) at /usr/src/sys/dev/ath/if_ath.c:5774
#11 0xc0630871 in if_slowtimo (arg=0x0) at /usr/src/sys/net/if.c:1478
#12 0xc05b2136 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:274
#13 0xc058242b in ithread_loop (arg=0xc3b00230) at /usr/src/sys/kern/kern_intr.c:1036
#14 0xc057f154 in fork_exit (callout=0xc0582260 , arg=0xc3b00230, frame=0xe3ffed38) at /usr/src/sys/kern/kern_fork.c:781
---Type to continue, or q to quit---
#15 0xc08009f0 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:205
(kgdb) print panicstr
$1 = 0xc08f3e00 "page fault"

While the server was fscking everything, I disconnected the cable and
rerouted it, since it was tangled with a lot of other cables and I
thought this issue could have been the result of some cross-talk. I
restarted the server, and fxp1 has been working normally for about 5
hours now, without a single new SCB timeout error in the logs since
the restart.

As always, here's the kernel configuration, sans the commented lines:

cpu             I686_CPU
ident           TSERVER-70
makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols
options         SCHED_4BSD              # 4BSD scheduler
options         PREEMPTION              # Enable kernel thread preemption
options         INET                    # InterNETworking
options         INET6                   # IPv6 communications protocols
options         SCTP                    # Stream Control Transmission Protocol
options         FFS                     # Berkeley Fast Filesystem
options         SOFTUPDATES             # Enable FFS soft updates support
options         UFS_ACL                 # Support for access control lists
options         UFS_DIRHASH             # Improve performance on big directories
options         UFS_GJOURNAL            # Enable gjournal-based UFS journaling
options         NFSCLIENT               # Network Filesystem Client
options         NFSSERVER               # Network Filesystem Server
options         MSDOSFS                 # MSDOS Filesystem
options         CD9660                  # ISO 9660 Filesystem
options         PROCFS                  # Process filesystem (requires PSEUDOFS)
options         PSEUDOFS                # Pseudo-filesystem framework
options         GEOM_PART_GPT           # GUID Partition Tables.
options         GEOM_LABEL              # Provides labelization
options         COMPAT_43TTY            # BSD 4.3 TTY compat [KEEP THIS!]
options         COMPAT_FREEBSD6         # Compatible with FreeBSD6
options         SCSI_DELAY=5000         # Delay (in ms) before probing SCSI
options         KTRACE                  # ktrace(1) support
options         SYSVSHM                 # SYSV-style shared memory
options         SYSVMSG                 # SYSV-style message queues
options         SYSVSEM                 # SYSV-style semaphores
options         _KPOSIX_PRIORITY_SCHEDULING  # POSIX P1003_1B real-time extensions
options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev
options         ADAPTIVE_GIANT          # Giant mutex is adaptive.
options         STOP_NMI                # Stop CPUS using NMI instead of IPI
options         AUDIT                   # Security event auditing
options         SMP                     # Symmetric MultiProcessor Kernel
device          apic                    # I/O APIC
device          cpufreq
options         IPDIVERT                # divert(4)
options         IPFIREWALL              #firewall
options         IPFIREWALL_VERBOSE      #enable logging to syslogd(8)
options         IPFIREWALL_DEFAULT_TO_ACCEPT  #allow everything by default
options         DUMMYNET                # dummynet(4)
device          pci
device          fdc
device          ata
device          atadisk                 # ATA disk drives
device          ataraid                 # ATA RAID drives
device          atapicd                 # ATAPI CDROM drives
options         ATA_STATIC_ID           # Static device numbering
device          ahc                     # AHA2940 and onboard AIC7xxx devices
options         AHC_REG_PRETTY_PRINT    # Print register bitfields in debug
                                        # output. Adds ~128k to driver.
                                        # output. Adds ~215k to driver.
device          scbus                   # SCSI bus (required for SCSI)
device          ch                      # SCSI media changers
device          da                      # Direct Access (disks)
device          sa                      # Sequential Access (tape etc)
device          cd                      # CD
device          pass                    # Passthrough device (direct SCSI access)
device          ses                     # SCSI Environmental Services (and SAF-TE)
device          atkbdc                  # AT keyboard controller
device          atkbd                   # AT keyboard
device          psm                     # PS/2 mouse
device          kbdmux                  # keyboard multiplexer
device          vga                     # VGA video card driver
device          splash                  # Splash screen and screen saver support
device          sc
device          agp                     # support several AGP chipsets
device          pmtimer
device          cbb                     # cardbus (yenta) bridge
device          pccard                  # PC Card (16-bit) bus
device          cardbus                 # CardBus (32-bit) bus
device          sio                     # 8250, 16[45]50 based serial ports
device          uart                    # Generic UART driver
device          ppc
device          ppbus                   # Parallel port bus (required)
device          lpt                     # Printer
device          plip                    # TCP/IP over parallel
device          ppi                     # Parallel port interface device
device          miibus                  # MII bus support
device          fxp                     # Intel EtherExpress PRO/100B (82557, 82558)
device          wlan                    # 802.11 support
device          wlan_wep                # 802.11 WEP support
device          wlan_ccmp               # 802.11 CCMP support
device          wlan_tkip               # 802.11 TKIP support
device          wlan_amrr               # AMRR transmit rate control algorithm
device          wlan_scan_ap            # 802.11 AP mode scanning
device          wlan_scan_sta           # 802.11 STA mode scanning
device          ath                     # Atheros pci/cardbus NIC's
device          ath_hal                 # Atheros HAL (Hardware Access Layer)
device          ath_rate_sample         # SampleRate tx rate control for ath
device          loop                    # Network loopback
device          random                  # Entropy device
device          ether                   # Ethernet support
device          sl                      # Kernel SLIP
device          ppp                     # Kernel PPP
device          tun                     # Packet tunnel.
device          pty                     # Pseudo-ttys (telnet etc)
device          md                      # Memory "disks"
device          gif                     # IPv6 and IPv4 tunneling
device          faith                   # IPv6-to-IPv4 relaying (translation)
device          firmware                # firmware assist module
device          bpf                     # Berkeley packet filter
device          uhci                    # UHCI PCI->USB interface
device          ohci                    # OHCI PCI->USB interface
device          usb                     # USB Bus (required)
device          ugen                    # Generic
device          uhid                    # "Human Interface Devices"
device          ukbd                    # Keyboard
device          ulpt                    # Printer
device          umass                   # Disks/Mass storage - Requires scbus and da
device          ums                     # Mouse
device          ural                    # Ralink Technology RT2500USB wireless NICs
device          rum                     # Ralink Technology RT2501USB wireless NICs
>How-To-Repeat:
Unsure, unless it's something that can always be reproduced by downing
the bridge0 interface, which has two members, fxp1 and ath0. Looking at
the backtrace, the panic seems to have been caused by ath(4), so I'm
not sure that the bridge is at fault here; it may be some scenario that
ath(4) doesn't handle.
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted: