Date:      Sat, 26 Jul 2014 22:29:18 +0200
From:      Harm Weites <harm@weites.com>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject:   Re: interrupt storm arge0, tplink 1043nd
Message-ID:  <53D40F9E.6020409@weites.com>
In-Reply-To: <CAJ-VmomG7ZfJMdnU8DM5qiodR-BtPbjCXtVp2jXo9K6aAKzuPg@mail.gmail.com>
References:  <53CEB6B1.9050301@weites.com> <CAJ-VmomG7ZfJMdnU8DM5qiodR-BtPbjCXtVp2jXo9K6aAKzuPg@mail.gmail.com>

Oops, of course it didn't work... After passing the correct argument
(&sc->intr_status, instead of sc) I got answers.

Below are the results of three sysctl runs, grouped per interrupt bit.
Each run produces 4 lines per interrupt (presumably 2 lines for arge0
and 2 lines for the dumb arge1). The first run took place right after
boot, the second a while after that and the third just after the storm.

interrupt 1 count 135
interrupt 1 count 135
interrupt 1 count 0
interrupt 1 count 0
interrupt 1 count 4738
interrupt 1 count 4738
interrupt 1 count 0
interrupt 1 count 0
interrupt 1 count 5041
interrupt 1 count 5041
interrupt 1 count 0
interrupt 1 count 0

interrupt 4 count 108
interrupt 4 count 108
interrupt 4 count 0
interrupt 4 count 0
interrupt 4 count 15843
interrupt 4 count 15844
interrupt 4 count 0
interrupt 4 count 0
interrupt 4 count 35311
interrupt 4 count 35311
interrupt 4 count 0
interrupt 4 count 0

interrupt 6 count 0
interrupt 6 count 0
interrupt 6 count 0
interrupt 6 count 0
interrupt 6 count 4
interrupt 6 count 4
interrupt 6 count 0
interrupt 6 count 0
interrupt 6 count 11
interrupt 6 count 11
interrupt 6 count 0
interrupt 6 count 0

Interrupt 4 went up rather quickly, so that is likely the bad guy. Right?

Regards,
Harm

On 22-07-14 21:26, Adrian Chadd wrote:
> Hi!
>
> So, ignore the ath0 stuff for now. int2 should be arge0, right?
>
> What does vmstat -ia say?
>
> Assuming it's actually arge0, we need to add some debugging counters
> to the interrupt path to count how many of each interrupt are
> occurring. The stuff I stuck behind ARGEDEBUG() is useful for debugging
> some silly bugs but not at the rate that you're getting interrupts.
>
> So I'd add something like this to the arge softc struct:
>
> uint32_t intr_status[32];
>
> .. then in the interrupt routine, something like this:
>
> /* Walk the status word bit by bit; bump one counter per set bit. */
> temp_status = status;
> for (i = 0; i < 32; i++) {
>     if (temp_status & 1) {
>         sc->intr_status[i]++;
>     }
>     temp_status = temp_status >> 1;
> }
>
> That'll count the number of interrupts that are firing for each
> interrupt status bit.
>
> Then, you'll want to write a sysctl for it. Have a look at
> if_ath_sysctl.c for the SYSCTL_PROC() entries. Write one that, when
> called, simply printf()s the intr_status array:
>
> for (i = 0; i < 32; i++) {
>     printf("interrupt %d count %u\n", i, intr_status[i]);
> }
>
> Just make sure you do a complete kernel recompile as changing the
> headers doesn't always force the source files to recompile.
>
>
> -a
>
>
> On 22 July 2014 12:08, Harm Weites <harm@weites.com> wrote:
>> Hi,
>>
>> My 1043nd is complaining about interrupt storms, presumably only when
>> wifi is being used. When this occurs, networking is gone.
>>
>> The exact message that's flooding me:
>>     interrupt storm detected on "int2"; throttling interrupt source
>>
>> The device associated with int2 is arge0.
>>
>> Some possibly related logs, though these messages start at boot:
>>
>>     # /sbin/dmesg | tail
>>     ath0: stuck beacon; resetting (bmiss count 4)
>>     ar5416StopDmaReceive: dma failed to stop in 10ms
>>     AR_CR=0x00000024
>>     AR_DIAG_SW=0x42000020
>>     MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>     ath0: stuck beacon; resetting (bmiss count 4)
>>     ar5416StopDmaReceive: dma failed to stop in 10ms
>>     AR_CR=0x00000024
>>     AR_DIAG_SW=0x42000020
>>     MBSSID Set bit 22 of AR_STA_ID 0xb8c16866
>>
>> This unit is configured with (arge0) port0 bound to device vlan1, port4
>> to vlan2 and ports 1,2,3 make up vlan3. There is wlan0, bound to ath0
>> and a bridge device that connects wlan0 to vlan3. There is a dhcp server
>> running in vlan3 to answer to wifi clients, internet is routed through
>> vlan1. This initially works but after a little while the storm begins
>> and the wifi client is left to die.
>>
>> Adrian@ suggested to start with reading which interrupt(s) occur(s), but
>> that is perhaps a little too hard for me to code :) Looking at if_arge.c,
>> it seems there is some debug code already in place (ARGEDEBUG()) though
>> I'm not sure how to use that. Reading from the AR71XX_DMA_INTR
>> register and mapping its content to AR71XX_DMA_INTR_STATUS would be
>> something I'd like to do with a separate program (instead of boldly
>> taking a deep dive into if_arge.c and recompiling/flashing until
>> something works).
>>
>> One of my other units is configured with just a vlan device per switch
>> port, no wifi and no bridge. A third unit is configured with a wlan0,
>> vlan1 (port0) and vlan2 (ports 1,2,3,4). Neither has shown any issues
>> in the past months. The only difference is that this problem unit has
>> a bridge.
>>
>> Any thoughts on how to approach or 'just' fix this?
>>
>> Regards,
>> Harm
>> _______________________________________________
>> freebsd-mips@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-mips
>> To unsubscribe, send any mail to "freebsd-mips-unsubscribe@freebsd.org"



