Date: Sat, 26 Jul 2014 14:38:30 -0700
From: Adrian Chadd <adrian@freebsd.org>
To: Harm Weites <harm@weites.com>
Cc: "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject: Re: interrupt storm arge0, tplink 1043nd
Message-ID: <CAJ-Vmo=B53Zogg92w7-jiecn7sX=rtmOQPC3h5sDCh5T3ogoVw@mail.gmail.com>
In-Reply-To: <53D40F9E.6020409@weites.com>
References: <53CEB6B1.9050301@weites.com> <CAJ-VmomG7ZfJMdnU8DM5qiodR-BtPbjCXtVp2jXo9K6aAKzuPg@mail.gmail.com> <53D40F9E.6020409@weites.com>
So those interrupts are:

ar71xxreg.h:#define AR71XX_DMA_INTR 0x198
ar71xxreg.h:#define AR71XX_DMA_INTR_STATUS 0x19C
ar71xxreg.h:#define DMA_INTR_ALL ((1 << 8) - 1)
ar71xxreg.h:#define DMA_INTR_RX_BUS_ERROR (1 << 7)
ar71xxreg.h:#define DMA_INTR_RX_OVERFLOW (1 << 6)
ar71xxreg.h:#define DMA_INTR_RX_PKT_RCVD (1 << 4)
ar71xxreg.h:#define DMA_INTR_TX_BUS_ERROR (1 << 3)
ar71xxreg.h:#define DMA_INTR_TX_UNDERRUN (1 << 1)
ar71xxreg.h:#define DMA_INTR_TX_PKT_SENT (1 << 0)

.. so interrupt bit 4 is packet received. So yeah, it going up is quite
expected. But is it triggering the storm? I'm not sure.

So the next thing is figuring out if this is causing the storm logic to
fire or not. I'll go digging.

Thanks!

-a

On 26 July 2014 13:29, Harm Weites <harm@weites.com> wrote:
> Oops, of course it didn't work... After passing the correct argument
> (&sc->intr_status, instead of sc) I got answers.
>
> These are the results of running sysctl three times, producing 4 lines
> per run (presumably 2 lines for arge0 and 2 lines for the dumb arge1).
> First run took place after boot, second a while after that and third
> just after the storm.
>
> interrupt 1 count 135
> interrupt 1 count 135
> interrupt 1 count 0
> interrupt 1 count 0
> interrupt 1 count 4738
> interrupt 1 count 4738
> interrupt 1 count 0
> interrupt 1 count 0
> interrupt 1 count 5041
> interrupt 1 count 5041
> interrupt 1 count 0
> interrupt 1 count 0
>
> interrupt 4 count 108
> interrupt 4 count 108
> interrupt 4 count 0
> interrupt 4 count 0
> interrupt 4 count 15843
> interrupt 4 count 15844
> interrupt 4 count 0
> interrupt 4 count 0
> interrupt 4 count 35311
> interrupt 4 count 35311
> interrupt 4 count 0
> interrupt 4 count 0
>
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 4
> interrupt 6 count 4
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 11
> interrupt 6 count 11
> interrupt 6 count 0
> interrupt 6 count 0
>
> Interrupt 4 went up rather quickly, so that likely is the bad guy. Right?
>
> Regards,
> Harm
>
> op 22-07-14 21:26, Adrian Chadd schreef:
>> Hi!
>>
>> So, ignore the ath0 stuff for now. int2 should be arge0, right?
>>
>> what's vmstat -ia say?
>>
>> Assuming it's actually arge0, we need to add some debugging counters
>> to the interrupt path to count how many of each interrupt are
>> occurring. The stuff I stuck behind ARGEDEBUG() is useful for debugging
>> some silly bugs but not at the rate that you're getting interrupts.
>>
>> So I'd add something like this to the arge softc struct:
>>
>> uint32_t intr_status[32];
>>
>> .. then in the interrupt routine, something like this:
>>
>> temp_status = status;
>> for (i = 0; i < 32; i++) {
>>     if (temp_status & 1) {
>>         intr_status[i]++;
>>     }
>>     temp_status = temp_status >> 1;
>> }
>>
>> That'll count the number of interrupts that are firing for each
>> interrupt status bit.
>>
>> Then, you'll want to write a sysctl for it. Have a look at
>> if_ath_sysctl.c for the SYSCTL_PROC() entries.
>> Just write one that when called will just printf() the intr_status
>> array:
>>
>> for (i = 0; i < 32; i++) {
>>     printf("interrupt %d count %u\n", i, intr_status[i]);
>> }
>>
>> Just make sure you do a complete kernel recompile as changing the
>> headers doesn't always force the source files to recompile.
>>
>>
>> -a
>>
>>
>> On 22 July 2014 12:08, Harm Weites <harm@weites.com> wrote:
>>> Hi,
>>>
>>> My 1043nd is complaining about interrupt storms, presumably only when
>>> wifi is being used. When this occurs, networking is gone.
>>>
>>> The exact message that's flooding me:
>>> interrupt storm detected on "int2"; throttling interrupt source
>>>
>>> The device associated with int2 is arge0.
>>>
>>> Some possibly related logs, though these messages start at boot:
>>>
>>> # /sbin/dmesg | tail
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>> ar5416StopDmaReceive: dma failed to stop in 10ms
>>> AR_CR=0x00000024
>>> AR_DIAG_SW=0x42000020
>>> MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>> ath0: stuck beacon; resetting (bmiss count 4)
>>> ar5416StopDmaReceive: dma failed to stop in 10ms
>>> AR_CR=0x00000024
>>> AR_DIAG_SW=0x42000020
>>> MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>>
>>> This unit is configured with (arge0) port0 bound to device vlan1, port4
>>> to vlan2 and ports 1,2,3 make up vlan3. There is wlan0, bound to ath0
>>> and a bridge device that connects wlan0 to vlan3. There is a dhcp server
>>> running in vlan3 to answer to wifi clients, internet is routed through
>>> vlan1. This initially works but after a little while the storm begins
>>> and the wifi client is left to die.
>>>
>>> Adrian@ suggested to start with reading which interrupt(s) occur(s), but
>>> that is perhaps a little too hard for me to code :) Looking at if_arge.c,
>>> it seems there is some debug code already in place (ARGEDEBUG()) though
>>> I'm not sure on how to use that.
>>> Reading from the AR71XX_DMA_INTR
>>> register and mapping its content to AR71XX_DMA_INTR_STATUS would be
>>> something I'd like to do with a separate program (instead of boldly
>>> taking a deep dive into if_arge.c and recompiling/flashing until
>>> something works).
>>>
>>> One of my other units is configured with just a vlan device per switch
>>> port, no wifi and no bridge. A third unit is configured with a wlan0,
>>> vlan1 (port0) and vlan2 (ports 1,2,3,4). Both have not shown any issues
>>> in the past months. The only difference would be this problem-unit has a
>>> bridge.
>>>
>>> Any thoughts on how to approach or 'just' fix this?
>>>
>>> Regards,
>>> Harm
>>> _______________________________________________
>>> freebsd-mips@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-mips
>>> To unsubscribe, send any mail to "freebsd-mips-unsubscribe@freebsd.org"
>
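
For reference, the per-bit counting idea from this thread can be tried
out in userspace before touching the kernel. The following is only a
sketch, not the actual if_arge.c change: struct intr_stats,
count_status(), intr_name() and dump_counts() are made-up names for
illustration, and the only things taken from the source are the bit
definitions quoted from ar71xxreg.h and the loop shape suggested above.
It also labels each counter with its define, so that (going by those
defines) bit 1 reads as TX underrun, bit 4 as RX packet received and
bit 6 as RX overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit masks copied from the ar71xxreg.h defines quoted in this thread. */
#define DMA_INTR_RX_BUS_ERROR (1 << 7)
#define DMA_INTR_RX_OVERFLOW  (1 << 6)
#define DMA_INTR_RX_PKT_RCVD  (1 << 4)
#define DMA_INTR_TX_BUS_ERROR (1 << 3)
#define DMA_INTR_TX_UNDERRUN  (1 << 1)
#define DMA_INTR_TX_PKT_SENT  (1 << 0)

#define INTR_NBITS 32

/* Hypothetical stand-in for the counter array added to the arge softc. */
struct intr_stats {
	uint32_t intr_status[INTR_NBITS];
};

/* Map a bit index to a short name; bits without a define are "unknown". */
const char *
intr_name(int bit)
{
	static const struct {
		uint32_t mask;
		const char *name;
	} names[] = {
		{ DMA_INTR_TX_PKT_SENT,  "TX_PKT_SENT"  },
		{ DMA_INTR_TX_UNDERRUN,  "TX_UNDERRUN"  },
		{ DMA_INTR_TX_BUS_ERROR, "TX_BUS_ERROR" },
		{ DMA_INTR_RX_PKT_RCVD,  "RX_PKT_RCVD"  },
		{ DMA_INTR_RX_OVERFLOW,  "RX_OVERFLOW"  },
		{ DMA_INTR_RX_BUS_ERROR, "RX_BUS_ERROR" },
	};
	size_t i;

	if (bit < 0 || bit >= INTR_NBITS)
		return ("unknown");
	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (names[i].mask == ((uint32_t)1 << bit))
			return (names[i].name);
	}
	return ("unknown");
}

/* Bump one counter for every bit that is set in a status word. */
void
count_status(struct intr_stats *st, uint32_t status)
{
	int i;

	assert(st != NULL);
	for (i = 0; i < INTR_NBITS; i++) {
		if (status & 1)
			st->intr_status[i]++;
		status >>= 1;
	}
}

/* Print the non-zero counters, one per line, with the bit name added. */
void
dump_counts(const struct intr_stats *st)
{
	int i;

	assert(st != NULL);
	for (i = 0; i < INTR_NBITS; i++) {
		if (st->intr_status[i] != 0)
			printf("interrupt %d (%s) count %u\n", i,
			    intr_name(i), (unsigned int)st->intr_status[i]);
	}
}
```

To try it, zero a struct intr_stats, feed count_status() a few status
words (for example DMA_INTR_RX_PKT_RCVD | DMA_INTR_TX_PKT_SENT) and
call dump_counts(). In the driver itself the array would live in the
softc and the dump would sit behind a sysctl handler, as described in
the quoted mail.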