Date: Sat, 29 Jul 2006 09:11:02 -0700
From: Sam Leffler <sam@errno.com>
To: Ross Finlayson <finlayson@live555.com>
Cc: freebsd-mobile@freebsd.org
Subject: Re: Ongoing problems with the "ath" interface - is any relief in sight??
Message-ID: <44CB8896.30904@errno.com>
In-Reply-To: <44CB8179.5050503@errno.com>
References: <f06230900c0f0a2835a9f@[66.80.62.44]> <44CB8179.5050503@errno.com>

Sam Leffler wrote:
> Ross Finlayson wrote:
>> For several months now, the "ath" interface has been spazzing out at
>> random times (in systems that are acting as wireless base stations). For
>> example:
>>
>> Jul 28 21:44:47 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
>> Jul 28 21:44:47 ns kernel: ath0: ath_reset: unable to reset hardware;
>> hal status 3
>> Jul 28 21:45:08 ns kernel: ath0: device timeout
>> Jul 28 21:45:08 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
>> Jul 28 21:45:08 ns kernel: ath0: ath_reset: unable to reset hardware;
>> hal status 3
>> [and then the interface stops working]
>>
>>
>> %cat /etc/motd
>> FreeBSD 6.1-STABLE (GENERIC) #6: Thu Jul 27 20:55:43 PDT 2006
>>
>> The error isn't always the same, however. Often it is
>> ath0: device timeout
>> or
>> ath0: discard frame w/o packet header
>> or even
>> arp: unknown hardware address format (0x4500)
>>
>> In each case, however, the "ath" interface stops working immediately
>> after the error report, so I don't believe that the latter two error
>> reports are legitimate. I'm wondering if perhaps there's a memory smash
>> somewhere that's corrupting some driver data structures (thereby causing
>> bogus error reports in addition to stopping the interface from working)?
>>
>> The last time I asked about this, someone speculated that 'power save
>> mode' was the culprit. Unfortunately, the system is running in a coffee
>> shop that provides public WiFi, so it's not possible to stop clients
>> from using power save mode.
>>
>> On my system, these errors are often happening several times a day. Has
>> anyone else run into frequent problems like this, and is anyone looking
>> into a solution?
>
> "stuck beacon" means the tx dma of the beacon frame failed to complete
> in a full beacon interval. Diagnosing such a problem requires
> understanding why dma failed to complete. This usually involves
> checking the dma descriptor for clues and/or looking at other
> h/w-related state. If you have a "memory smash" then you will see it in
> the descriptor contents--but I doubt it. In my experience this problem
> is usually caused by feeding bogus data to the dma engine that causes it
> to lock up but the problem in general is very complicated and not
> something I can diagnose remotely.

BTW, the fact that the subsequent reset failed with error 3 (HAL_EIO in
ah.h) indicates you've got something more going on. But since you didn't
provide any details on what you're doing it's hard to say if you've got a
hardware problem. Presumably you've done basic things like swap out parts
and/or try to reproduce the problem in a controlled environment.

Sam
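
For readers following the log lines above, here is a rough model of the "stuck beacon" condition as described: a beacon whose tx dma is still pending at the next beacon interval counts as a miss, and enough consecutive misses trigger a reset attempt. This is only an illustrative sketch, not the ath driver source. The class, method names, and the threshold of 3 are assumptions; the threshold is chosen solely so that the message matches the "bmiss count 4" seen in the log.

```python
# Illustrative model only; this is NOT the ath(4) driver code.
# BeaconMonitor, on_beacon_interval and BMISS_THRESHOLD are hypothetical
# names, and the threshold is an assumption inferred from "bmiss count 4".

BMISS_THRESHOLD = 3  # assumed: treat the beacon as stuck once count exceeds this


class BeaconMonitor:
    """Counts consecutive beacon intervals in which the previous beacon
    frame is still pending in a simulated tx dma queue."""

    def __init__(self, threshold=BMISS_THRESHOLD):
        self.threshold = threshold
        self.bmiss_count = 0
        self.log = []

    def on_beacon_interval(self, tx_pending, reset_ok=True):
        """Call once per beacon interval.

        tx_pending: True if the last beacon's dma has not completed.
        reset_ok:   whether the simulated hardware reset succeeds.
        Returns "ok", "missed", "reset" or "reset-failed".
        """
        if not tx_pending:
            # The beacon went out in time; clear the miss counter.
            self.bmiss_count = 0
            return "ok"
        self.bmiss_count += 1
        if self.bmiss_count <= self.threshold:
            return "missed"
        # Too many consecutive misses: report and attempt a reset.
        self.log.append(
            "stuck beacon; resetting (bmiss count %d)" % self.bmiss_count)
        self.bmiss_count = 0
        if not reset_ok:
            self.log.append("unable to reset hardware")
            return "reset-failed"
        return "reset"
```

Feeding it four pending intervals in a row with reset_ok=False reproduces the pattern in the report: a "stuck beacon" message followed by a failed reset. The model says nothing about why the dma did not complete, which is the part that needs the descriptor contents and other h/w state.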