From: Adrian Chadd
To: Harm Weites
Cc: freebsd-mips@freebsd.org
Date: Sat, 26 Jul 2014 14:38:30 -0700
Subject: Re: interrupt storm arge0, tplink 1043nd

So those interrupts are:

ar71xxreg.h:#define AR71XX_DMA_INTR            0x198
ar71xxreg.h:#define AR71XX_DMA_INTR_STATUS     0x19C
ar71xxreg.h:#define DMA_INTR_ALL               ((1 << 8) - 1)
ar71xxreg.h:#define DMA_INTR_RX_BUS_ERROR      (1 << 7)
ar71xxreg.h:#define DMA_INTR_RX_OVERFLOW       (1 << 6)
ar71xxreg.h:#define DMA_INTR_RX_PKT_RCVD       (1 << 4)
ar71xxreg.h:#define DMA_INTR_TX_BUS_ERROR      (1 << 3)
ar71xxreg.h:#define DMA_INTR_TX_UNDERRUN       (1 << 1)
ar71xxreg.h:#define DMA_INTR_TX_PKT_SENT       (1 << 0)

.. so interrupt bit 4 is packet received. So yeah, it going up is quite
expected. But is it triggering the storm? I'm not sure.

So the next thing is figuring out whether this is causing the storm logic
to fire or not. I'll go digging.

Thanks!

-a

On 26 July 2014 13:29, Harm Weites wrote:
> Oops, of course it didn't work... After passing the correct argument
> (&sc->intr_status, instead of sc) I got answers.
>
> These are the results of running the sysctl three times, producing 4
> lines per run (presumably 2 lines for arge0 and 2 lines for the dumb
> arge1). The first run took place after boot, the second a while after
> that and the third just after the storm.
>
> interrupt 1 count 135
> interrupt 1 count 135
> interrupt 1 count 0
> interrupt 1 count 0
> interrupt 1 count 4738
> interrupt 1 count 4738
> interrupt 1 count 0
> interrupt 1 count 0
> interrupt 1 count 5041
> interrupt 1 count 5041
> interrupt 1 count 0
> interrupt 1 count 0
>
> interrupt 4 count 108
> interrupt 4 count 108
> interrupt 4 count 0
> interrupt 4 count 0
> interrupt 4 count 15843
> interrupt 4 count 15844
> interrupt 4 count 0
> interrupt 4 count 0
> interrupt 4 count 35311
> interrupt 4 count 35311
> interrupt 4 count 0
> interrupt 4 count 0
>
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 4
> interrupt 6 count 4
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 11
> interrupt 6 count 11
> interrupt 6 count 0
> interrupt 6 count 0
>
> Interrupt 4 went up rather quickly, so that is likely the bad guy. Right?
>
> Regards,
> Harm
>
> On 22-07-14 21:26, Adrian Chadd wrote:
>> Hi!
>>
>> So, ignore the ath0 stuff for now. int2 should be arge0, right?
>>
>> What does vmstat -ia say?
>>
>> Assuming it's actually arge0, we need to add some debugging counters
>> to the interrupt path to count how many of each interrupt are
>> occurring. The stuff I stuck behind ARGEDEBUG() is useful for debugging
>> some silly bugs but not at the rate that you're getting interrupts.
>>
>> So I'd add something like this to the arge softc struct:
>>
>>     uint32_t intr_status[32];
>>
>> .. then in the interrupt routine, something like this:
>>
>>     /* status: the DMA interrupt status word read in the handler */
>>     temp_status = status;
>>     for (i = 0; i < 32; i++) {
>>         if (temp_status & 1) {
>>             sc->intr_status[i]++;   /* one counter per status bit */
>>         }
>>         temp_status = temp_status >> 1;
>>     }
>>
>> That'll count the number of interrupts that are firing for each
>> interrupt status bit.
>>
>> Then, you'll want to write a sysctl for it. Have a look at
>> if_ath_sysctl.c for the SYSCTL_PROC() entries.
>> Just write one that, when called, will just printf() the intr_status
>> array:
>>
>>     for (i = 0; i < 32; i++) {
>>         printf("interrupt %d count %u\n", i, intr_status[i]);
>>     }
>>
>> Just make sure you do a complete kernel recompile, as changing the
>> headers doesn't always force the source files to recompile.
>>
>>
>> -a
>>
>>
>> On 22 July 2014 12:08, Harm Weites wrote:
>>> Hi,
>>>
>>> My 1043nd is complaining about interrupt storms, presumably only when
>>> wifi is being used. When this occurs, networking is gone.
>>>
>>> The exact message that's flooding me:
>>> interrupt storm detected on "int2"; throttling interrupt source
>>>
>>> The device associated with int2 is arge0.
>>>
>>> Some possibly related logs, though these messages start at boot:
>>>
>>> # /sbin/dmesg | tail
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>> ar5416StopDmaReceive: dma failed to stop in 10ms
>>> AR_CR=0x00000024
>>> AR_DIAG_SW=0x42000020
>>> MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>> ar5416StopDmaReceive: dma failed to stop in 10ms
>>> AR_CR=0x00000024
>>> AR_DIAG_SW=0x42000020
>>> MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>>
>>> This unit is configured with (arge0) port0 bound to device vlan1, port4
>>> to vlan2, and ports 1,2,3 making up vlan3. There is wlan0, bound to
>>> ath0, and a bridge device that connects wlan0 to vlan3. There is a DHCP
>>> server running in vlan3 to answer wifi clients; internet is routed
>>> through vlan1. This initially works, but after a little while the storm
>>> begins and the wifi client is left to die.
>>>
>>> Adrian@ suggested starting by reading which interrupt(s) occur(s), but
>>> that is perhaps a little too hard for me to code :) Looking at if_arge.c,
>>> it seems there is some debug code already in place (ARGEDEBUG()), though
>>> I'm not sure how to use that.
>>> Reading from the AR71XX_DMA_INTR register and mapping its content to
>>> AR71XX_DMA_INTR_STATUS would be something I'd like to do with a
>>> separate program (instead of boldly taking a deep dive into if_arge.c
>>> and recompiling/flashing until something works).
>>>
>>> One of my other units is configured with just a vlan device per switch
>>> port, no wifi and no bridge. A third unit is configured with a wlan0,
>>> vlan1 (port0) and vlan2 (ports 1,2,3,4). Neither has shown any issues in
>>> the past months. The only difference would be that this problem unit
>>> has a bridge.
>>>
>>> Any thoughts on how to approach or 'just' fix this?
>>>
>>> Regards,
>>> Harm
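For trying the per-bit counting idea from this thread outside the kernel first, here is a small userland C sketch. It is an illustration under assumptions, not code from if_arge.c: the struct and function names (arge_intr_debug, arge_reset_intr_counts, arge_count_intr_bits, arge_intr_bit_name, arge_print_intr_counts) are hypothetical, and only the bit values come from the ar71xxreg.h excerpt quoted above. It tests each bit with a shifted mask instead of shifting a temporary copy of the status word; both approaches count the same bits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Status bit values as quoted from ar71xxreg.h in this thread. */
#define DMA_INTR_RX_BUS_ERROR  (1 << 7)
#define DMA_INTR_RX_OVERFLOW   (1 << 6)
#define DMA_INTR_RX_PKT_RCVD   (1 << 4)
#define DMA_INTR_TX_BUS_ERROR  (1 << 3)
#define DMA_INTR_TX_UNDERRUN   (1 << 1)
#define DMA_INTR_TX_PKT_SENT   (1 << 0)

#define ARGE_INTR_BITS 32

/* Hypothetical holder for the per-bit counters (in the driver: the softc). */
struct arge_intr_debug {
    uint32_t intr_status[ARGE_INTR_BITS];
};

/* Clear all counters, e.g. before starting a new measurement. */
static void
arge_reset_intr_counts(struct arge_intr_debug *dbg)
{
    memset(dbg->intr_status, 0, sizeof(dbg->intr_status));
}

/* Bump one counter for every bit that is set in a status word. */
static void
arge_count_intr_bits(struct arge_intr_debug *dbg, uint32_t status)
{
    for (int i = 0; i < ARGE_INTR_BITS; i++) {
        if (status & (UINT32_C(1) << i))
            dbg->intr_status[i]++;
    }
}

/* Name a bit using the ar71xxreg.h values above; other bits are "unknown". */
static const char *
arge_intr_bit_name(int bit)
{
    assert(bit >= 0 && bit < ARGE_INTR_BITS);
    switch (UINT32_C(1) << bit) {
    case DMA_INTR_RX_BUS_ERROR: return "RX_BUS_ERROR";
    case DMA_INTR_RX_OVERFLOW:  return "RX_OVERFLOW";
    case DMA_INTR_RX_PKT_RCVD:  return "RX_PKT_RCVD";
    case DMA_INTR_TX_BUS_ERROR: return "TX_BUS_ERROR";
    case DMA_INTR_TX_UNDERRUN:  return "TX_UNDERRUN";
    case DMA_INTR_TX_PKT_SENT:  return "TX_PKT_SENT";
    default:                    return "unknown";
    }
}

/* Print only the bits that have fired at least once, with their names. */
static void
arge_print_intr_counts(const struct arge_intr_debug *dbg)
{
    for (int i = 0; i < ARGE_INTR_BITS; i++) {
        if (dbg->intr_status[i] != 0)
            printf("interrupt %d (%s) count %" PRIu32 "\n",
                i, arge_intr_bit_name(i), dbg->intr_status[i]);
    }
}
```

In the driver, the counting call would sit in the interrupt routine once the status word has been read, and the print loop would go in a SYSCTL_PROC() handler, as suggested in the thread. Printing only non-zero counters, with bit names attached, is just a choice to cut the 32 lines per run down to the bits that actually fire.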