From owner-freebsd-stable@FreeBSD.ORG Sat Jul 12 03:42:09 2008 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1BBEF1065670 for ; Sat, 12 Jul 2008 03:42:09 +0000 (UTC) (envelope-from sam@freebsd.org) Received: from ebb.errno.com (ebb.errno.com [69.12.149.25]) by mx1.freebsd.org (Postfix) with ESMTP id CF2BA8FC15 for ; Sat, 12 Jul 2008 03:42:08 +0000 (UTC) (envelope-from sam@freebsd.org) Received: from Macintosh-4.local ([10.0.0.194]) (authenticated bits=0) by ebb.errno.com (8.13.6/8.12.6) with ESMTP id m6C3TI6E048483 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 11 Jul 2008 20:29:18 -0700 (PDT) (envelope-from sam@freebsd.org) Message-ID: <4878250E.8040205@freebsd.org> Date: Fri, 11 Jul 2008 20:29:18 -0700 From: Sam Leffler Organization: FreeBSD Project User-Agent: Thunderbird 2.0.0.14 (Macintosh/20080421) MIME-Version: 1.0 To: "Alexandre \"Sunny\" Kovalenko" References: <1210640542.1008.33.camel@RabbitsDen> <4828FDE5.8090409@freebsd.org> <1210696676.985.18.camel@RabbitsDen> <482DB4FE.3010300@freebsd.org> <1215824846.1074.23.camel@RabbitsDen> In-Reply-To: <1215824846.1074.23.camel@RabbitsDen> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-DCC--Metrics: ebb.errno.com; whitelist Cc: stable@freebsd.org Subject: Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jul 2008 03:42:09 -0000 Alexandre "Sunny" Kovalenko wrote: > On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote: >> Alexandre "Sunny" Kovalenko wrote: >>> On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: >>>> Alexandre "Sunny" Kovalenko wrote: >>>>> I seem to be able to lock my machine by going into wpa_cli and asking it >>>>> to 'reassoc'. The reason for question mark after "hard" is that debug >>>>> information (caused by wlandebug and athdebug) is being printed on the >>>>> console. The only way to get machine's attention is to hold power button >>>>> for 8 seconds. >>>> So this is just livelock due to console debug msgs. >>> I am not sure, I have parsed this well enough, so I will try to clarify: >>> machine becomes unresponsive *without* any debugging turned on, to an >>> extent that pressing the power button twice *does not* generate ACPI >>> console message (something to the tune of "going into S5 already -- >>> gimme a break"). If I turn ath debugging on, I do see those messages, >>> and only them, scrolling on the console. >> Guess I misunderstood you. > > I have finally got enough time and equipment to investigate this > further. Here are some conclusions: > > -- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed > a livelock. > > -- all system does, is executing ath_intr (if_ath.c) in the tight loop > with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate > possibility that I have caused livelock with the debug messages, I have > put conditional panic() into ath_intr, as soon as sc->sc_stats.ast_mib > reaches 10,000. Without any kind of the debug messages, it will be > triggered within 40-60 seconds after starting of wpa_supplicant. > > -- I suspect that comment below, might not hold true on my equipment > if (status & HAL_INT_MIB) { > sc->sc_stats.ast_mib++; > /* > * Disable interrupts until we service the MIB > * interrupt; otherwise it will continue to fire. > */ > ath_hal_intrset(ah, 0); > /* > * Let the hal handle the event. We assume it will <============ > * clear whatever condition caused the interrupt. <============ > */ > ath_hal_mibevent(ah, &sc->sc_halstats); > ath_hal_intrset(ah, sc->sc_imask); > } > > My hardware is: > ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, > RF5413) > ath0: mem 0xedf00000-0xedf0ffff irq 17 at device 0.0 on > pci3 > ath0: [ITHREAD] > ath0: using obsoleted if_watchdog interface > ath0: Ethernet address: 00:16:cf:26:2f:3f > ath0: mac 10.3 phy 6.1 radio 10.2 > > My wpa_supplicant.conf is: > ctrl_interface=/var/run/wpa_supplicant > ctrl_interface_group=wheel > eapol_version=2 > > network={ > ssid="XXXXXXXXXXX" > scan_ssid=1 > priority=1 > proto=WPA > pairwise=TKIP > group=TKIP > key_mgmt=WPA-PSK > psk="xxxxxxxxxxxxxxxxxxxxxx" > } > > Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to > different values. Only 11G SSID is configured in wpa_supplicant.conf. In > the test setup, AP is with 10' (3m) from the laptop. > > AP is successfully used by handful of Windows clients (including this > same laptop) and iBook G4. > > Neither wpa_supplicant with '-d -d' nor wlandebug 0xFFFFFFFF show > anything but normal scan. > > athdebug 0xFFFFFFFF loops with ath_intr: status 0x1000 until I power > down my laptop. > > I would appreciate any suggestion on what I can investigate further -- > at this point I have comfortable console setup and should be able to > field requests for further information much better. > Are you running powerd? Sam