From owner-freebsd-stable@FreeBSD.ORG Sat Jul 12 13:01:19 2008 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DCD50106567F; Sat, 12 Jul 2008 13:01:18 +0000 (UTC) (envelope-from alex.kovalenko@verizon.net) Received: from vms044pub.verizon.net (vms044pub.verizon.net [206.46.252.44]) by mx1.freebsd.org (Postfix) with ESMTP id BDFEC8FC1B; Sat, 12 Jul 2008 13:01:18 +0000 (UTC) (envelope-from alex.kovalenko@verizon.net) Received: from [10.0.3.231] ([70.111.175.14]) by vms044.mailsrvcs.net (Sun Java System Messaging Server 6.2-6.01 (built Apr 3 2006)) with ESMTPA id <0K3W0070K9HE1A93@vms044.mailsrvcs.net>; Sat, 12 Jul 2008 08:00:51 -0500 (CDT) Date: Sat, 12 Jul 2008 08:53:18 -0400 From: "Alexandre \"Sunny\" Kovalenko" In-reply-to: <4878250E.8040205@freebsd.org> To: Sam Leffler Message-id: <1215867198.1551.4.camel@RabbitsDen> MIME-version: 1.0 X-Mailer: Evolution 2.22.3.1 FreeBSD GNOME Team Port Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <1210640542.1008.33.camel@RabbitsDen> <4828FDE5.8090409@freebsd.org> <1210696676.985.18.camel@RabbitsDen> <482DB4FE.3010300@freebsd.org> <1215824846.1074.23.camel@RabbitsDen> <4878250E.8040205@freebsd.org> Cc: stable@freebsd.org Subject: Re: Hard(?) lock when reassociating ath with wpa_supplicant on RELENG_7 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 Jul 2008 13:01:19 -0000 On Fri, 2008-07-11 at 20:29 -0700, Sam Leffler wrote: > Alexandre "Sunny" Kovalenko wrote: > > On Fri, 2008-05-16 at 12:23 -0400, Sam Leffler wrote: > >> Alexandre "Sunny" Kovalenko wrote: > >>> On Mon, 2008-05-12 at 19:33 -0700, Sam Leffler wrote: > >>>> Alexandre "Sunny" Kovalenko wrote: > >>>>> I seem to be able to lock my machine by going into wpa_cli and asking it > >>>>> to 'reassoc'. The reason for question mark after "hard" is that debug > >>>>> information (caused by wlandebug and athdebug) is being printed on the > >>>>> console. The only way to get machine's attention is to hold power button > >>>>> for 8 seconds. > >>>> So this is just livelock due to console debug msgs. > >>> I am not sure, I have parsed this well enough, so I will try to clarify: > >>> machine becomes unresponsive *without* any debugging turned on, to an > >>> extent that pressing the power button twice *does not* generate ACPI > >>> console message (something to the tune of "going into S5 already -- > >>> gimme a break"). If I turn ath debugging on, I do see those messages, > >>> and only them, scrolling on the console. > >> Guess I misunderstood you. > > > > I have finally got enough time and equipment to investigate this > > further. Here are some conclusions: > > > > -- at this point (RELENG_7 as of July 9th around 15:30 EST) it is indeed > > a livelock. > > > > -- all system does, is executing ath_intr (if_ath.c) in the tight loop > > with the same status -- 0x1000 AKA HAL_INT_MIB. In order to eliminate > > possibility that I have caused livelock with the debug messages, I have > > put conditional panic() into ath_intr, as soon as sc->sc_stats.ast_mib > > reaches 10,000. Without any kind of the debug messages, it will be > > triggered within 40-60 seconds after starting of wpa_supplicant. > > > > -- I suspect that comment below, might not hold true on my equipment > > if (status & HAL_INT_MIB) { > > sc->sc_stats.ast_mib++; > > /* > > * Disable interrupts until we service the MIB > > * interrupt; otherwise it will continue to fire. > > */ > > ath_hal_intrset(ah, 0); > > /* > > * Let the hal handle the event. We assume it will <============ > > * clear whatever condition caused the interrupt. <============ > > */ > > ath_hal_mibevent(ah, &sc->sc_halstats); > > ath_hal_intrset(ah, sc->sc_imask); > > } > > > > My hardware is: > > ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, > > RF5413) > > ath0: mem 0xedf00000-0xedf0ffff irq 17 at device 0.0 on > > pci3 > > ath0: [ITHREAD] > > ath0: using obsoleted if_watchdog interface > > ath0: Ethernet address: 00:16:cf:26:2f:3f > > ath0: mac 10.3 phy 6.1 radio 10.2 > > > > My wpa_supplicant.conf is: > > ctrl_interface=/var/run/wpa_supplicant > > ctrl_interface_group=wheel > > eapol_version=2 > > > > network={ > > ssid="XXXXXXXXXXX" > > scan_ssid=1 > > priority=1 > > proto=WPA > > pairwise=TKIP > > group=TKIP > > key_mgmt=WPA-PSK > > psk="xxxxxxxxxxxxxxxxxxxxxx" > > } > > > > Access point is Netgear WNDR3300-1B with 11N and 11G SSID set up to > > different values. Only 11G SSID is configured in wpa_supplicant.conf. In > > the test setup, AP is with 10' (3m) from the laptop. > > > > AP is successfully used by handful of Windows clients (including this > > same laptop) and iBook G4. > > > > Neither wpa_supplicant with '-d -d' nor wlandebug 0xFFFFFFFF show > > anything but normal scan. > > > > athdebug 0xFFFFFFFF loops with ath_intr: status 0x1000 until I power > > down my laptop. > > > > I would appreciate any suggestion on what I can investigate further -- > > at this point I have comfortable console setup and should be able to > > field requests for further information much better. > > > > Are you running powerd? I do. And I just tried disabling it, and I could not reproduce the problem any more. Is there any way to reconcile if_ath with powerd? Thank you, -- Alexandre "Sunny" Kovalenko (Олександр Коваленко)