From owner-freebsd-mobile@FreeBSD.ORG Tue Feb 10 16:45:28 2009 Return-Path: Delivered-To: freebsd-mobile@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B6FDE1065677 for ; Tue, 10 Feb 2009 16:45:28 +0000 (UTC) (envelope-from sam@freebsd.org) Received: from ebb.errno.com (ebb.errno.com [69.12.149.25]) by mx1.freebsd.org (Postfix) with ESMTP id 7425A8FC13 for ; Tue, 10 Feb 2009 16:45:28 +0000 (UTC) (envelope-from sam@freebsd.org) Received: from trouble.errno.com (trouble.errno.com [10.0.0.248]) (authenticated bits=0) by ebb.errno.com (8.13.6/8.12.6) with ESMTP id n1AGUOKW027500 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 10 Feb 2009 08:30:24 -0800 (PST) (envelope-from sam@freebsd.org) Message-ID: <4991ABA0.4050300@freebsd.org> Date: Tue, 10 Feb 2009 08:30:24 -0800 From: Sam Leffler Organization: FreeBSD Project User-Agent: Thunderbird 2.0.0.18 (X11/20081209) MIME-Version: 1.0 To: Ian Smith References: <20090210154047.N38905@sola.nimnet.asn.au> In-Reply-To: <20090210154047.N38905@sola.nimnet.asn.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-DCC-sonic.net-Metrics: ebb.errno.com; whitelist Cc: freebsd-mobile@freebsd.org Subject: Re: Problems with ath at kern.hz=100 X-BeenThere: freebsd-mobile@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Mobile computing with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 Feb 2009 16:45:29 -0000 Ian Smith wrote: > On Mon, 9 Feb 2009, Bengt Ahlgren wrote: > > > I was changing kern.hz to 100 on my IBM X40 (Pentium-M) laptop to see > > whether that saves some battery, but ran into problems with ath not > > working properly. At first I thought that my hardware was failing, > > but changing hz back to 1000 solved the problem. I am running > > 7.1-RELEASE-p1 and have the following Atheros chip: > > > > Feb 9 11:13:10 P142 kernel: ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) > > Feb 9 11:13:10 P142 kernel: ath0: mem 0xd0200000-0xd020ffff irq 11 at device 2.0 on pci2 > > Feb 9 11:13:10 P142 kernel: ath0: [ITHREAD] > > Feb 9 11:13:10 P142 kernel: ath0: WARNING: using obsoleted if_watchdog interface > > Feb 9 11:13:10 P142 kernel: ath0: Ethernet address: 00:05:4e:4e:1f:c7 > > Feb 9 11:13:10 P142 kernel: ath0: mac 5.6 phy 4.1 5ghz radio 1.7 2ghz radio 2.3 > > There's an order of magnitude room to move between 100 and 1000 Hz. > You could also try with maybe 250 or 500? > > What speed is your Pentium-M? Are you using powerd? If so, what > minimum CPU freq does it drop back to (sysctl debug.cpufreq.lowest)? > > Sam has talked of a problem with ath (and maybe other) interrupt rates > overwhelming slow CPUs to the extent that userland (ie powerd) doesn't > get a look-in for a while, as I understand it. A fast CPU clocked back > to less than say 300MHz IS a slow CPU, until powerd gets to crank it up. > If powerd starvation were happening you'd see livelock not slow traffic. The problem was that MIB overflow interrupts from the card were pounding the host but since the cpu clock was lowered they were not serviced fast enough. FWIW I believe the MIB interrupt storm issue is fixed in head. I posted a patch to stable@ to enable backmerge of the necessary code but received ~zero feedback in >1 month so RELENG_7 will continue to rot w/ old code. > You might find that rc.conf setting economy_cx_lowest="LOW" can save you > some battery, though this benefit seems somewhat processor-dependent. > > HTH, Ian > > > > The symptom is that ath stops sending packets out. They seem to be > > buffered somewhere. After a while it gets going again, albeit some > > packets are lost. "A while" can be from 5 secs to a minute or so when > > continously running a ping. In the below logs, a ping is started when > > ath has stopped sending. > > > > > "netstat -I ath0 1" reports that no packets are sent, but the byte > > count is still incremented. When it gets going again, 18 packets are > > sent, but the byte count does not reflect that (which perpaps is > > logical): > > > > 1 0 56 0 0 0 0 > > 0 0 0 0 0 0 0 > > 0 0 0 0 0 98 0 > > 1 0 56 0 0 98 0 > > 1 0 56 0 0 98 0 > > input (ath0) output > > packets errs bytes packets errs bytes colls > > 0 0 0 0 0 98 0 > > 1 0 56 0 0 98 0 > > 1 0 56 0 0 98 0 > > 1 0 56 0 0 98 0 > > 0 0 0 0 0 98 0 > > 0 0 0 0 0 98 0 > > 0 0 0 0 0 98 0 > > 0 0 0 0 0 98 0 > > 0 0 0 0 0 98 0 > > 1 0 56 0 0 98 0 > > 1 0 56 0 0 98 0 > > 2 0 112 0 0 98 0 > > 0 0 0 0 0 98 0 > > 4 0 350 18 0 140 0 > > 2 0 112 0 0 0 0 > > 1 1 56 0 0 0 0 > > 1 0 56 0 0 0 0 > > 0 0 0 0 0 0 0 > > ^C > > > > Here is a tcpdump of the same event: > > > > 18:40:59.025364 arp who-has 10.1.1.72 tell 10.1.7.250 > > 18:41:00.049385 arp who-has 10.1.1.72 tell 10.1.7.250 > > 18:41:00.868608 arp who-has 10.1.6.253 tell 10.1.7.250 > > 18:41:01.073387 arp who-has 10.1.1.72 tell 10.1.7.250 > > 18:41:02.761445 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 0, length 64 > > 18:41:02.761456 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 1, length 64 > > 18:41:02.761461 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 2, length 64 > > 18:41:02.761466 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 3, length 64 > > 18:41:02.761470 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 4, length 64 > > 18:41:02.761474 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 5, length 64 > > 18:41:02.761479 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 6, length 64 > > 18:41:02.761483 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 7, length 64 > > 18:41:02.761487 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 8, length 64 > > 18:41:02.761492 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 9, length 64 > > 18:41:02.761497 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 10, length 64 > > 18:41:02.761502 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 11, length 64 > > 18:41:02.761506 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 12, length 64 > > 18:41:02.761510 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 13, length 64 > > 18:41:02.761515 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 14, length 64 > > 18:41:02.761519 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 15, length 64 > > 18:41:02.761523 IP 10.1.1.107 > 193.10.65.193: ICMP echo request, id 63239, seq 16, length 64 > > 18:41:02.916556 arp who-has 10.1.1.107 tell 10.1.7.250 > > 18:41:02.916579 arp reply 10.1.1.107 is-at 00:05:4e:4e:1f:c7 > > 18:41:02.917707 IP 193.10.65.193 > 10.1.1.107: ICMP echo reply, id 63239, seq 14, length 64 > > 18:41:02.917906 IP 193.10.65.193 > 10.1.1.107: ICMP echo reply, id 63239, seq 15, length 64 > > 18:41:02.918187 IP 193.10.65.193 > 10.1.1.107: ICMP echo reply, id 63239, seq 16, length 64 > > 18:41:03.531108 arp who-has 10.1.1.72 tell 10.1.7.250 > > 18:41:04.350185 arp who-has 10.1.6.166 tell 10.1.7.250 > > 18:41:04.554984 arp who-has 10.1.1.72 tell 10.1.7.250 > > 18:41:05.578972 arp who-has 10.1.1.72 tell 10.1.7.250 > > ^C > > 289 packets captured > > 289 packets received by filter > > 0 packets dropped by kernel > > > > Any idea on what the problem can be, or how do debug this further? > > You can use tcpdump to monitor traffic at multiple levels in the hierarchy: tcpdump -i ath0 -n # tap at 802.3 level tcpdump -i ath0 -n -y IEEE802_11 # tap at 802.11 level tcpdump -i ath0 -n -y IEEE802_11_RADIO # tap at driver level and/or you can check statistics using athstats and wlanstats or you can turn on debugging msgs in net80211 or the ath driver using wlandebug and athdebug. Some of these may require you to build your kernel w/ debug options (e.g. ATH_DEBUG). Sam