Date:      Sat, 29 Jul 2006 09:11:02 -0700
From:      Sam Leffler <sam@errno.com>
To:        Ross Finlayson <finlayson@live555.com>
Cc:        freebsd-mobile@freebsd.org
Subject:   Re: Ongoing problems with the "ath" interface - is any relief in sight??
Message-ID:  <44CB8896.30904@errno.com>
In-Reply-To: <44CB8179.5050503@errno.com>
References:  <f06230900c0f0a2835a9f@[66.80.62.44]> <44CB8179.5050503@errno.com>

Sam Leffler wrote:
> Ross Finlayson wrote:
>> For several months now, the "ath" interface has been spazzing out at
>> random times (in systems that are acting as wireless base stations). For
>> example:
>>
>> Jul 28 21:44:47 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
>> Jul 28 21:44:47 ns kernel: ath0: ath_reset: unable to reset hardware;
>> hal status 3
>> Jul 28 21:45:08 ns kernel: ath0: device timeout
>> Jul 28 21:45:08 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
>> Jul 28 21:45:08 ns kernel: ath0: ath_reset: unable to reset hardware;
>> hal status 3
>> [and then the interface stops working]
>>
>>
>> %cat /etc/motd
>> FreeBSD 6.1-STABLE (GENERIC) #6: Thu Jul 27 20:55:43 PDT 2006
>>
>> The error isn't always the same, however.  Often it is
>>     ath0: device timeout
>> or
>>     ath0: discard frame w/o packet header
>> or even
>>     arp: unknown hardware address format (0x4500)
>>
>> In each case, however, the "ath" interface stops working immediately
>> after the error report, so I don't believe that the latter two error
>> reports are legitimate.  I'm wondering if perhaps there's a memory smash
>> somewhere that's corrupting some driver data structures (thereby causing
>> bogus error reports in addition to stopping the interface from working)?
>>
>> The last time I asked about this, someone speculated that 'power save
>> mode' was the culprit.  Unfortunately, the system is running in a coffee
>> shop that provides public WiFi, so it's not possible to stop clients
>> from using power save mode.
>>
>> On my system, these errors are often happening several times a day. Has
>> anyone else run into frequent problems like this, and is anyone looking
>> into a solution?
> 
> "stuck beacon" means the tx dma of the beacon frame failed to complete
> in a full beacon interval.  Diagnosing such a problem requires
> understanding why dma failed to complete.  This usually involves
> checking the dma descriptor for clues and/or looking at other
> h/w-related state.  If you have a "memory smash" then you will see it in
> the descriptor contents--but I doubt it.  In my experience this problem
> is usually caused by feeding bogus data to the dma engine that causes it
> to lock up, but the problem in general is very complicated and not
> something I can diagnose remotely.

BTW, the fact that the subsequent reset failed with error 3 (HAL_EIO in ah.h)
indicates you've got something more going on.  But since you didn't
provide any details on what you're doing it's hard to say if you've got
a hardware problem.  Presumably you've done basic things like swap out
parts and/or try to reproduce the problem in a controlled environment.

	Sam
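
[Editorial sketch, not part of the original message.] Since the failures
are reported to occur several times a day with varying messages, one
low-effort first step might be to tally the ath messages in the system
log by kind, so that the "stuck beacon" / "unable to reset hardware"
pattern can be told apart from isolated timeouts. The following is a
minimal sketch in Python under stated assumptions: the category labels
and the function names classify() and tally() are hypothetical, and the
sample lines are the ones quoted above, rejoined where the mail client
wrapped them. It only counts messages; it does not inspect DMA
descriptors or any hardware state.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Jul 28 21:44:47 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)
# Group 1 is the interface name, group 2 is the driver message.
LINE_RE = re.compile(r"kernel: (ath\d+): (.*)$")


def classify(message):
    """Map a driver message to a coarse category (labels are illustrative)."""
    if message.startswith("stuck beacon"):
        return "stuck beacon"
    if "unable to reset hardware" in message:
        return "reset failed"
    if message.startswith("device timeout"):
        return "device timeout"
    if message.startswith("discard frame"):
        return "discard frame"
    return "other"


def tally(lines):
    """Count ath messages per category; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.rstrip("\n"))
        if match:
            counts[classify(match.group(2))] += 1
    return counts


# Sample input: the log excerpt quoted in this thread, one message per line.
SAMPLE = [
    "Jul 28 21:44:47 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)",
    "Jul 28 21:44:47 ns kernel: ath0: ath_reset: unable to reset hardware; hal status 3",
    "Jul 28 21:45:08 ns kernel: ath0: device timeout",
    "Jul 28 21:45:08 ns kernel: ath0: stuck beacon; resetting (bmiss count 4)",
    "Jul 28 21:45:08 ns kernel: ath0: ath_reset: unable to reset hardware; hal status 3",
]

if __name__ == "__main__":
    for category, count in tally(SAMPLE).most_common():
        print(f"{count:4d}  {category}")
```

To run it against a real log instead of the sample, the same tally()
function could be fed an open file object, for example the system
message log, assuming that is where kernel messages are written on the
machine in question.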