Date:      Sat, 26 Jul 2014 14:38:30 -0700
From:      Adrian Chadd <adrian@freebsd.org>
To:        Harm Weites <harm@weites.com>
Cc:        "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject:   Re: interrupt storm arge0, tplink 1043nd
Message-ID:  <CAJ-Vmo=B53Zogg92w7-jiecn7sX=rtmOQPC3h5sDCh5T3ogoVw@mail.gmail.com>
In-Reply-To: <53D40F9E.6020409@weites.com>
References:  <53CEB6B1.9050301@weites.com> <CAJ-VmomG7ZfJMdnU8DM5qiodR-BtPbjCXtVp2jXo9K6aAKzuPg@mail.gmail.com> <53D40F9E.6020409@weites.com>

So those interrupts are:

ar71xxreg.h:#define AR71XX_DMA_INTR 0x198
ar71xxreg.h:#define AR71XX_DMA_INTR_STATUS 0x19C
ar71xxreg.h:#define DMA_INTR_ALL ((1 << 8) - 1)
ar71xxreg.h:#define DMA_INTR_RX_BUS_ERROR (1 << 7)
ar71xxreg.h:#define DMA_INTR_RX_OVERFLOW (1 << 6)
ar71xxreg.h:#define DMA_INTR_RX_PKT_RCVD (1 << 4)
ar71xxreg.h:#define DMA_INTR_TX_BUS_ERROR (1 << 3)
ar71xxreg.h:#define DMA_INTR_TX_UNDERRUN (1 << 1)
ar71xxreg.h:#define DMA_INTR_TX_PKT_SENT (1 << 0)

.. so interrupt bit 4 is packet received.
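
As a rough illustration of that mapping, here is a minimal userland sketch in C. The defines are the ones quoted above from ar71xxreg.h; dma_intr_bit_name() and dma_intr_count_bits() are hypothetical helpers made up for this example and are not functions in if_arge.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit definitions as quoted from ar71xxreg.h. */
#define DMA_INTR_RX_BUS_ERROR (1 << 7)
#define DMA_INTR_RX_OVERFLOW  (1 << 6)
#define DMA_INTR_RX_PKT_RCVD  (1 << 4)
#define DMA_INTR_TX_BUS_ERROR (1 << 3)
#define DMA_INTR_TX_UNDERRUN  (1 << 1)
#define DMA_INTR_TX_PKT_SENT  (1 << 0)

/* Hypothetical helper: map a status bit number to a short name. */
const char *
dma_intr_bit_name(unsigned int bit)
{
    if (bit >= 32)
        return "unknown";

    switch (1u << bit) {
    case DMA_INTR_RX_BUS_ERROR: return "RX_BUS_ERROR";
    case DMA_INTR_RX_OVERFLOW:  return "RX_OVERFLOW";
    case DMA_INTR_RX_PKT_RCVD:  return "RX_PKT_RCVD";
    case DMA_INTR_TX_BUS_ERROR: return "TX_BUS_ERROR";
    case DMA_INTR_TX_UNDERRUN:  return "TX_UNDERRUN";
    case DMA_INTR_TX_PKT_SENT:  return "TX_PKT_SENT";
    default:                    return "unknown";
    }
}

/* Hypothetical helper: bump one counter per bit set in a status word. */
void
dma_intr_count_bits(uint32_t status, uint32_t counts[32])
{
    assert(counts != NULL);

    for (unsigned int i = 0; i < 32; i++) {
        if (status & 1)
            counts[i]++;
        status >>= 1;
    }
}
```

Read against these defines, the three counters in the quoted output are bit 1 = TX_UNDERRUN, bit 4 = RX_PKT_RCVD and bit 6 = RX_OVERFLOW.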

So yeah, that counter going up is quite expected. But is it what's
triggering the storm? I'm not sure.

So the next thing is figuring out whether this is causing the storm
logic to fire or not.
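
One way to look at the numbers in the meantime is the growth between the three sysctl runs rather than the raw totals. The sketch below is only an illustration: the sample values are the arge0 counts copied from the quoted output (taking 15843 for the second run of bit 4, where the two lines differ by one), and counter_deltas() is a hypothetical helper. The time between runs isn't given, so these are differences, not rates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSAMPLES 3

/* arge0 counts from the three sysctl runs: after boot, later, after the storm. */
const uint32_t bit1_samples[NSAMPLES] = { 135, 4738, 5041 };   /* TX_UNDERRUN */
const uint32_t bit4_samples[NSAMPLES] = { 108, 15843, 35311 }; /* RX_PKT_RCVD */
const uint32_t bit6_samples[NSAMPLES] = { 0, 4, 11 };          /* RX_OVERFLOW */

/*
 * Hypothetical helper: store the growth between consecutive samples.
 * deltas must have room for n - 1 entries.
 */
void
counter_deltas(const uint32_t *samples, size_t n, uint32_t *deltas)
{
    assert(samples != NULL && deltas != NULL);

    for (size_t i = 1; i < n; i++)
        deltas[i - 1] = samples[i] - samples[i - 1];
}
```

For these samples, bit 4 grows by 15735 and then 19468, bit 1 by 4603 and then 303, and bit 6 by 4 and then 7. So packet-received is the only counter still climbing fast going into the storm, which by itself doesn't say whether it is the cause.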


I'll go digging. Thanks!


-a


On 26 July 2014 13:29, Harm Weites <harm@weites.com> wrote:
> Oops, of course it didn't work... After passing the correct argument
> (&sc->intr_status, instead of sc) I got answers.
>
> These are the results of running the sysctl three times, producing 4
> lines per run (presumably 2 lines for arge0 and 2 lines for the dumb
> arge1). The first run took place after boot, the second a while after
> that and the third just after the storm.
>
> interrupt 1 count 135
> interrupt 1 count 135
> interrupt 1 count 0
> interrupt 1 count 0
> interrupt 1 count 4738
> interrupt 1 count 4738
> interrupt 1 count 0
> interrupt 1 count 0
> interrupt 1 count 5041
> interrupt 1 count 5041
> interrupt 1 count 0
> interrupt 1 count 0
>
> interrupt 4 count 108
> interrupt 4 count 108
> interrupt 4 count 0
> interrupt 4 count 0
> interrupt 4 count 15843
> interrupt 4 count 15844
> interrupt 4 count 0
> interrupt 4 count 0
> interrupt 4 count 35311
> interrupt 4 count 35311
> interrupt 4 count 0
> interrupt 4 count 0
>
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 4
> interrupt 6 count 4
> interrupt 6 count 0
> interrupt 6 count 0
> interrupt 6 count 11
> interrupt 6 count 11
> interrupt 6 count 0
> interrupt 6 count 0
>
> Interrupt 4 went up rather quickly, so that is likely the bad guy. Right?
>
> Regards,
> Harm
>
> On 22-07-14 21:26, Adrian Chadd wrote:
>> Hi!
>>
>> So, ignore the ath0 stuff for now. int2 should be arge0, right?
>>
>> What does vmstat -ia say?
>>
>> Assuming it's actually arge0, we need to add some debugging counters
>> to the interrupt path to count how many of each interrupt are
>> occurring. The stuff I stuck behind ARGEDEBUG() is useful for debugging
>> some silly bugs but not at the rate that you're getting interrupts.
>>
>> So I'd add something like this to the arge softc struct:
>>
>> uint32_t intr_status[32];
>>
>> .. then in the interrupt routine, something like this:
>>
>> temp_status = status;
>> for (i = 0; i < 32; i++) {
>>     /* bit i is set in the status word: bump its counter in the softc */
>>     if (temp_status & 1) {
>>         sc->intr_status[i]++;
>>     }
>>     temp_status = temp_status >> 1;
>> }
>>
>> That'll count the number of interrupts that are firing for each
>> interrupt status bit.
>>
>> Then, you'll want to write a sysctl for it. Have a look at
>> if_ath_sysctl.c for the SYSCTL_PROC() entries. Just write one that
>> when called will just printf() the intr_status array:
>>
>> for (i = 0; i < 32; i++) {
>>     printf("interrupt %d count %u\n", i, intr_status[i]);
>> }
>>
>> Just make sure you do a complete kernel recompile as changing the
>> headers doesn't always force the source files to recompile.
>>
>>
>> -a
>>
>>
>> On 22 July 2014 12:08, Harm Weites <harm@weites.com> wrote:
>>> Hi,
>>>
>>> My 1043nd is complaining about interrupt storms, presumably only when
>>> wifi is being used. When this occurs, networking is gone.
>>>
>>> The exact message that's flooding me:
>>>     interrupt storm detected on "int2"; throttling interrupt source
>>>
>>> The device associated with int2 is arge0.
>>>
>>> Some possibly related logs, though these messages start at boot:
>>>
>>>     # /sbin/dmesg | tail
>>>     ath0: stuck beacon; resetting (bmiss count 4)
>>>     ar5416StopDmaReceive: dma failed to stop in 10ms
>>>     AR_CR=0x00000024
>>>     AR_DIAG_SW=0x42000020
>>>     MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>>     ath0: stuck beacon; resetting (bmiss count 4)
>>>     ar5416StopDmaReceive: dma failed to stop in 10ms
>>>     AR_CR=0x00000024
>>>     AR_DIAG_SW=0x42000020
>>>     MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>>
>>> This unit is configured with (arge0) port0 bound to device vlan1, port4
>>> to vlan2 and ports 1,2,3 make up vlan3. There is wlan0, bound to ath0
>>> and a bridge device that connects wlan0 to vlan3. There is a dhcp server
>>> running in vlan3 to answer to wifi clients, internet is routed through
>>> vlan1. This initially works but after a little while the storm begins
>>> and the wifi client is left to die.
>>>
>>> Adrian@ suggested to start with reading which interrupt(s) occur(s), but
>>> that is perhaps a little too hard for me to code :) Looking at if_arge.c,
>>> it seems there is some debug code already in place (ARGEDEBUG()) though
>>> I'm not sure how to use that. Reading from the AR71XX_DMA_INTR
>>> register and mapping its content to AR71XX_DMA_INTR_STATUS would be
>>> something I'd like to do with a separate program (instead of boldly
>>> taking a deep dive into if_arge.c and recompiling/flashing until
>>> something works).
>>>
>>> One of my other units is configured with just a vlan device per switch
>>> port, no wifi and no bridge. A third unit is configured with a wlan0,
>>> vlan1 (port0) and vlan2 (ports 1,2,3,4). Neither has shown any issues
>>> in the past months. The only difference would be that this problem unit
>>> has a bridge.
>>>
>>> Any thoughts on how to approach or 'just' fix this?
>>>
>>> Regards,
>>> Harm
>>> _______________________________________________
>>> freebsd-mips@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-mips
>>> To unsubscribe, send any mail to "freebsd-mips-unsubscribe@freebsd.org"
>


