Date: Sat, 26 Jul 2014 10:00:05 +0200
From: Harm Weites <harm@weites.com>
To: Adrian Chadd <adrian@freebsd.org>
Cc: "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject: Re: interrupt storm arge0, tplink 1043nd
Message-ID: <53D36005.1010200@weites.com>
In-Reply-To: <CAJ-VmomG7ZfJMdnU8DM5qiodR-BtPbjCXtVp2jXo9K6aAKzuPg@mail.gmail.com>
References: <53CEB6B1.9050301@weites.com> <CAJ-VmomG7ZfJMdnU8DM5qiodR-BtPbjCXtVp2jXo9K6aAKzuPg@mail.gmail.com>
[-- Attachment #1 --]
Hi Adrian,
Thanks for your pointers. I've attached a patch that counts the
interrupts. Also attached is a log showing the interrupt counters in their
pristine state, just after boot, and again right after the storm.
It looks like none of the counters were incremented, though, so I most
probably wrote bad code.
Another thing I noticed: every interrupt (or rather, every group of 32) was
printed 4 times. printf() also only sent the result to my serial console, even
when I ran /sbin/sysctl over ssh, so that still deserves a little attention.
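A kernel printf() writes to the console and message buffer, not to the process that issued the sysctl request, which would explain why nothing shows up over ssh. One way around that is to format the counters into a buffer that is handed back to the caller. The sketch below is plain user-space C and only illustrates the idea; format_counters() is a made-up helper, not part of if_arge.c. Inside the kernel the same role would more likely be played by SYSCTL_OUT() or the sbuf(9) helpers, which is an assumption about how one would port it rather than something the patch does.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format n counters into buf, one
 * "interrupt <i> count <value>" line per counter.
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if buf is too small to hold all lines.
 */
static int
format_counters(char *buf, size_t len, const uint32_t *counts, int n)
{
	size_t used = 0;

	if (len > 0)
		buf[0] = '\0';	/* keep buf a valid string even when n == 0 */

	for (int i = 0; i < n; i++) {
		int w = snprintf(buf + used, len - used,
		    "interrupt %d count %u\n", i, (unsigned)counts[i]);

		/* snprintf reports the length it wanted; treat truncation as failure. */
		if (w < 0 || (size_t)w >= len - used)
			return (-1);
		used += (size_t)w;
	}
	return ((int)used);
}
```

The caller owns the buffer, so the text ends up wherever the caller sends it (a terminal, an ssh session, a file) instead of always landing on the serial console.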
About vmstat: yes, int2 is arge0:
# vmstat -ia
interrupt                         total       rate
sint0:                                0          0
sint1:                                0          0
int0 ath0                       2938765         80
int1:                                 0          0
int2 arge0                       335493          9
int3 arge1                            0          0
int4 apb0                          1936          0
int5 clock0                     1129353         30
apb intr3: uart0                   1936          0
apb intr2: gpio0                      0          0
apb irq5: pmc                         0          0
Total                           4407483        120
Regards,
Harm
On 22-07-14 21:26, Adrian Chadd wrote:
> Hi!
>
> So, ignore the ath0 stuff for now. int2 should be arge0, right?
>
> what's vmstat -ia say?
>
> Assuming it's actually arge0, we need to add some debugging counters
> to the interrupt path to count how many of each interrupt are
> occurring. The stuff I stuck behind ARGEDEBUG() is useful for debugging
> some silly bugs but not at the rate that you're getting interrupts.
>
> So I'd add something like this to the arge softc struct:
>
> uint32_t intr_status[32];
>
> .. then in the interrupt routine, something like this:
>
> temp_status = status;
> for (i = 0; i < 32; i++) {
>         if (temp_status & 1) {
>                 intr_status[i]++;
>         }
>         temp_status = temp_status >> 1;
> }
>
> That'll count the number of interrupts that are firing for each
> interrupt status bit.
>
> Then, you'll want to write a sysctl for it. Have a look at
> if_ath_sysctl.c for the SYSCTL_PROC() entries. Just write one that
> when called will just printf() the intr_status array:
>
> for (i = 0; i < 32; i++) {
>         printf("interrupt %d count %u\n", i, intr_status[i]);
> }
>
> Just make sure you do a complete kernel recompile as changing the
> headers doesn't always force the source files to recompile.
>
>
> -a
>
>
> On 22 July 2014 12:08, Harm Weites <harm@weites.com> wrote:
>> Hi,
>>
>> My 1043nd is complaining about interrupt storms, presumably only when
>> wifi is being used. When this occurs, networking is gone.
>>
>> The exact message that's flooding me:
>> interrupt storm detected on "int2"; throttling interrupt source
>>
>> The device associated with int2 is arge0.
>>
>> Some possibly related logs, though these messages start at boot:
>>
>> # /sbin/dmesg | tail
>> ath0: stuck beacon; resetting (bmiss count 4)
>> ar5416StopDmaReceive: dma failed to stop in 10ms
>> AR_CR=0x00000024
>> AR_DIAG_SW=0x42000020
>> MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>> ath0: stuck beacon; resetting (bmiss count 4)
>> ar5416StopDmaReceive: dma failed to stop in 10ms
>> AR_CR=0x00000024
>> AR_DIAG_SW=0x42000020
>> MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>
>> This unit is configured with (arge0) port0 bound to device vlan1, port4
>> to vlan2 and ports 1,2,3 make up vlan3. There is wlan0, bound to ath0
>> and a bridge device that connects wlan0 to vlan3. There is a dhcp server
>> running in vlan3 to answer to wifi clients, internet is routed through
>> vlan1. This initially works but after a little while the storm begins
>> and the wifi client is left to die.
>>
>> Adrian@ suggested starting by reading which interrupt(s) occur(s), but
>> that is perhaps a little too hard for me to code :) Looking at if_arge.c,
>> it seems there is some debug code already in place (ARGEDEBUG()), though
>> I'm not sure how to use that. Reading from the AR71XX_DMA_INTR
>> register and mapping its content to AR71XX_DMA_INTR_STATUS would be
>> something I'd like to do with a separate program (instead of boldly
>> taking a deep dive into if_arge.c and recompiling/flashing until
>> something works).
>>
>> One of my other units is configured with just a vlan device per switch
>> port, no wifi and no bridge. A third unit is configured with a wlan0,
>> vlan1 (port0) and vlan2 (ports 1,2,3,4). Neither has shown any issues in
>> the past months. The only difference would be that this problem unit has a
>> bridge.
>>
>> Any thoughts on how to approach or 'just' fix this?
>>
>> Regards,
>> Harm
>> _______________________________________________
>> freebsd-mips@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-mips
>> To unsubscribe, send any mail to "freebsd-mips-unsubscribe@freebsd.org"
[-- Attachment #2 --]
# sysctl -a|grep interrupt
interrupt 0 count 2154400768
interrupt 1 count 2153943040
interrupt 2 count 0
interrupt 3 count 0
interrupt 4 count 2153893728
interrupt 5 count 2153893728
interrupt 6 count 2151296140
interrupt 7 count 2151296104
interrupt 8 count 16
interrupt 9 count 1048576
interrupt 10 count 0
interrupt 11 count 2432062003
interrupt 12 count 1366556672
interrupt 13 count 2154442496
interrupt 14 count 0
interrupt 15 count 2154442368
interrupt 16 count 2154441280
interrupt 17 count 0
interrupt 18 count 0
interrupt 19 count 0
interrupt 20 count 0
interrupt 21 count 0
interrupt 22 count 0
interrupt 23 count 0
interrupt 24 count 0
interrupt 25 count 2153896656
interrupt 26 count 16973824
interrupt 27 count 0
interrupt 28 count 0
interrupt 29 count 4
interrupt 30 count 0
interrupt 31 count 0
interrupt 0 count 2154400768
interrupt 1 count 2153943040
interrupt 2 count 0
interrupt 3 count 0
interrupt 4 count 2153893728
interrupt 5 count 2153893728
interrupt 6 count 2151296140
interrupt 7 count 2151296104
interrupt 8 count 16
interrupt 9 count 1048576
interrupt 10 count 0
interrupt 11 count 2432062003
interrupt 12 count 1366556672
interrupt 13 count 2154442496
interrupt 14 count 0
interrupt 15 count 2154442368
interrupt 16 count 2154441280
interrupt 17 count 0
interrupt 18 count 0
interrupt 19 count 0
interrupt 20 count 0
interrupt 21 count 0
interrupt 22 count 0
interrupt 23 count 0
interrupt 24 count 0
interrupt 25 count 2153896656
interrupt 26 count 16973824
interrupt 27 count 0
interrupt 28 count 0
interrupt 29 count 4
interrupt 30 count 0
interrupt 31 count 0
interrupt 0 count 2154424320
interrupt 1 count 2153942784
interrupt 2 count 0
interrupt 3 count 0
interrupt 4 count 2153893344
interrupt 5 count 2153893344
interrupt 6 count 2151296140
interrupt 7 count 2151296104
interrupt 8 count 6
interrupt 9 count 1048576
interrupt 10 count 0
interrupt 11 count 2432062003
interrupt 12 count 1366622208
interrupt 13 count 2154440320
interrupt 14 count 0
interrupt 15 count 2154442304
interrupt 16 count 2154439936
interrupt 17 count 0
interrupt 18 count 0
interrupt 19 count 0
interrupt 20 count 0
interrupt 21 count 0
interrupt 22 count 0
interrupt 23 count 0
interrupt 24 count 0
interrupt 25 count 2153896608
interrupt 26 count 16973824
interrupt 27 count 0
interrupt 28 count 0
interrupt 29 count 4
interrupt 30 count 0
interrupt 31 count 0
interrupt 0 count 2154424320
interrupt 1 count 2153942784
interrupt 2 count 0
interrupt 3 count 0
interrupt 4 count 2153893344
interrupt 5 count 2153893344
interrupt 6 count 2151296140
interrupt 7 count 2151296104
interrupt 8 count 6
interrupt 9 count 1048576
interrupt 10 count 0
interrupt 11 count 2432062003
interrupt 12 count 1366622208
interrupt 13 count 2154440320
interrupt 14 count 0
interrupt 15 count 2154442304
interrupt 16 count 2154439936
interrupt 17 count 0
interrupt 18 count 0
interrupt 19 count 0
interrupt 20 count 0
interrupt 21 count 0
interrupt 22 count 0
interrupt 23 count 0
interrupt 24 count 0
interrupt 25 count 2153896608
interrupt 26 count 16973824
interrupt 27 count 0
interrupt 28 count 0
interrupt 29 count 4
interrupt 30 count 0
interrupt 31 count 0
### after 9hrs, storm on int2 hits
interrupt storm detected on "int2"; throttling interrupt source
interrupt storm detected on "int2"; throttling interrupt source
interrupt storm detected on "int2"; throttling interrupt source
interrupt storm detected on "int2"; throttling interrupt source
# sysctl -a|grep interrupt
interrupt 0 count 2154400768
interrupt 1 count 2153943040
interrupt 2 count 0
interrupt 3 count 0
interrupt 4 count 2153893728
interrupt 5 count 2153893728
interrupt 6 count 2151296140
interrupt 7 count 2151296104
interrupt 8 count 16
interrupt 9 count 1048576
interrupt 10 count 0
interrupt 11 count 2432062003
interrupt 12 count 1366556672
interrupt 13 count 2154442496
interrupt 14 count 0
interrupt 15 count 2154442368
interrupt 16 count 2154441280
interrupt 17 count 0
interrupt 18 count 0
interrupt 19 count 0
interrupt 20 count 0
interrupt 21 count 0
interrupt 22 count 0
interrupt 23 count 0
interrupt 24 count 0
interrupt 25 count 2153896656
interrupt 26 count 16973824
interrupt 27 count 0
interrupt 28 count 0
interrupt 29 count 4
interrupt 30 count 0
interrupt 31 count 0
interrupt 0 count 2154400768
interrupt 1 count 2153943040
interrupt 2 count 0
interrupt 3 count 0
interrupt 4 count 2153893728
interrupt 5 count 2153893728
interrupt 6 count 2151296140
interrupt 7 count 2151296104
interrupt 8 count 16
interrupt 9 count 1048576
interrupt 10 count 0
interrupt 11 count 2432062003
interrupt 12 count 1366556672
interrupt 13 count 2154442496
interrupt 14 count 0
interrupt 15 count 2154442368
interrupt 16 count 2154441280
interrupt 17 count 0
interrupt 18 count 0
interrupt 19 count 0
interrupt 20 count 0
interrupt 21 count 0
interrupt 22 count 0
interrupt 23 count 0
interrupt 24 count 0
interrupt 25 count 2153896656
interrupt 26 count 16973824
interrupt 27 count 0
interrupt 28 count 0
interrupt 29 count 4
interrupt 30 count 0
interrupt 31 count 0
interrupt 0 count 2154424320
interrupt 1 count 2153942784
interrupt 2 count 0
interrupt 3 count 0
interrupt 4 count 2153893344
interrupt 5 count 2153893344
interrupt 6 count 2151296140
interrupt 7 count 2151296104
interrupt 8 count 6
interrupt 9 count 1048576
interrupt 10 count 0
interrupt 11 count 2432062003
interrupt 12 count 1366622208
interrupt 13 count 2154440320
interrupt 14 count 0
interrupt 15 count 2154442304
interrupt 16 count 2154439936
interrupt 17 count 0
interrupt 18 count 0
interrupt 19 count 0
interrupt 20 count 0
interrupt 21 count 0
interrupt 22 count 0
interrupt 23 count 0
interrupt 24 count 0
interrupt 25 count 2153896608
interrupt 26 count 16973824
interrupt 27 count 0
interrupt 28 count 0
interrupt 29 count 4
interrupt 30 count 0
interrupt 31 count 0
interrupt 0 count 2154424320
interrupt 1 count 2153942784
interrupt 2 count 0
interrupt 3 count 0
interrupt 4 count 2153893344
interrupt 5 count 2153893344
interrupt 6 count 2151296140
interrupt 7 count 2151296104
interrupt 8 count 6
interrupt 9 count 1048576
interrupt 10 count 0
interrupt 11 count 2432062003
interrupt 12 count 1366622208
interrupt 13 count 2154440320
interrupt 14 count 0
interrupt 15 count 2154442304
interrupt 16 count 2154439936
interrupt 17 count 0
interrupt 18 count 0
interrupt 19 count 0
interrupt 20 count 0
interrupt 21 count 0
interrupt 22 count 0
interrupt 23 count 0
interrupt 24 count 0
interrupt 25 count 2153896608
interrupt 26 count 16973824
interrupt 27 count 0
interrupt 28 count 0
interrupt 29 count 4
interrupt 30 count 0
interrupt 31 count 0
interrupt storm detected on "int2"; throttling interrupt source
interrupt storm detected on "int2"; throttling interrupt source
interrupt storm detected on "int2"; throttling interrupt source
interrupt storm detected on "int2"; throttling interrupt source
# ifconfig arge0 down
# vmstat -ia
interrupt                         total       rate
sint0:                                0          0
sint1:                                0          0
int0 ath0                       2938765         80
int1:                                 0          0
int2 arge0                       335493          9
int3 arge1                            0          0
int4 apb0                          1936          0
int5 clock0                     1129353         30
apb intr3: uart0                   1936          0
apb intr2: gpio0                      0          0
apb irq5: pmc                         0          0
Total                           4407483        120
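Comparing the post-boot and post-storm dumps by eye is error-prone with 32 large numbers per block. A small helper can compute the per-counter difference between two snapshots instead. This is a minimal user-space sketch; diff_counters() is a hypothetical name, and it assumes both snapshots have already been parsed into arrays.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute after[i] - before[i] for n counters.
 * Unsigned 32-bit subtraction wraps, so a counter that overflowed
 * once between snapshots still yields the right delta.
 * Returns how many counters changed.
 */
static size_t
diff_counters(const uint32_t *before, const uint32_t *after,
    uint32_t *delta, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		delta[i] = after[i] - before[i];
		if (delta[i] != 0)
			changed++;
	}
	return (changed);
}
```

Fed the first post-boot block and the first post-storm block from the log above, every delta comes out as zero, which matches the observation that none of the counters moved.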
[-- Attachment #3 --]
Index: sys/mips/atheros/if_arge.c
===================================================================
--- sys/mips/atheros/if_arge.c (revision 268881)
+++ sys/mips/atheros/if_arge.c (working copy)
@@ -265,6 +265,21 @@
 	return (BUS_PROBE_NOWILDCARD);
 }
+/*
+ * Print a list of all interrupts with their associated count.
+ */
+static int
+sysctl_interrupt_status(SYSCTL_HANDLER_ARGS)
+{
+	uint32_t *intr_status = arg1;
+	int i;
+
+	for (i = 0; i < 32; i++) {
+		printf("interrupt %d count %u\n", i, intr_status[i]);
+	}
+	return (0);
+}
+
 static void
 arge_attach_sysctl(device_t dev)
 {
@@ -293,6 +308,8 @@
 	    CTLFLAG_RW, &sc->arge_cdata.arge_tx_cons, 0, "");
 	SYSCTL_ADD_UINT(ctx, SYSCTL_CHILDREN(tree), OID_AUTO, "tx_cnt",
 	    CTLFLAG_RW, &sc->arge_cdata.arge_tx_cnt, 0, "");
+	SYSCTL_ADD_PROC(ctx, SYSCTL_CHILDREN(tree), OID_AUTO, "interrupt_status",
+	    CTLTYPE_INT | CTLFLAG_RW, sc, 0, sysctl_interrupt_status, "I", "Detailed interrupt counters");
 #endif
 }
@@ -2272,6 +2289,8 @@
 	struct arge_softc *sc = arg;
 	uint32_t status;
 	struct ifnet *ifp = sc->arge_ifp;
+	uint32_t temp_status;
+	int i;
 	status = ARGE_READ(sc, AR71XX_DMA_INTR_STATUS);
 	status |= sc->arge_intr_status;
@@ -2286,6 +2305,17 @@
 	if (status == 0)
 		return;
+	/*
+	 * Count interrupts.
+	 */
+	temp_status = status;
+	for (i = 0; i < 32; i++) {
+		if (temp_status & 1) {
+			sc->intr_status[i]++;
+		}
+		temp_status = temp_status >> 1;
+	}
+
 	if (status & DMA_INTR_RX_BUS_ERROR) {
 		ARGE_WRITE(sc, AR71XX_DMA_RX_STATUS, DMA_RX_STATUS_BUS_ERROR);
 		device_printf(sc->arge_dev, "RX bus error");
Index: sys/mips/atheros/if_argevar.h
===================================================================
--- sys/mips/atheros/if_argevar.h (revision 268114)
+++ sys/mips/atheros/if_argevar.h (working copy)
@@ -176,6 +176,7 @@
 		uint32_t	rx_overflow;
 		uint32_t	tx_underflow;
 	} stats;
+	uint32_t		intr_status[32];
 };
 #endif /* __IF_ARGEVAR_H__ */
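One possible reason the counters never appear to change: the handler in the patch treats arg1 as a pointer to the counter array, but SYSCTL_ADD_PROC() is given sc, not sc->intr_status. If the array is not the first member of the softc, the handler would print the first 32 words of the softc instead, which would fit values that stay constant and resemble kernel addresses. The sketch below is user-space C that only illustrates this mismatch together with the counting loop; struct fake_softc, count_status_bits() and handler_read() are made-up stand-ins, and the real struct arge_softc layout is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct arge_softc; only the layout idea matters. */
struct fake_softc {
	uint32_t other_state[8];	/* placeholder for fields preceding the counters */
	uint32_t intr_status[32];	/* per-bit interrupt counters, as in the patch */
};

/* Same loop as the patch: bump one counter for every set status bit. */
static void
count_status_bits(struct fake_softc *sc, uint32_t status)
{
	for (int i = 0; i < 32; i++) {
		if (status & 1)
			sc->intr_status[i]++;
		status >>= 1;
	}
}

/*
 * Mimics the sysctl handler: it trusts arg1 to point at the first of
 * 32 counters, so it only returns real counts when the caller passes
 * sc->intr_status rather than sc itself.
 */
static uint32_t
handler_read(const void *arg1, int i)
{
	const uint32_t *intr_status = arg1;

	return (intr_status[i]);
}
```

With a status word of 0x11 followed by 0x01, handler_read(sc.intr_status, 0) returns 2 and handler_read(sc.intr_status, 4) returns 1, while offsetof(struct fake_softc, intr_status) is non-zero, so a handler given the softc pointer itself would be looking at other_state instead. If that is what happens in the driver, registering the sysctl with sc->intr_status as arg1 (or casting arg1 to struct arge_softc * inside the handler) would be the change to try.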
