Date: Thu, 6 Sep 2018 09:35:47 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: arm@FreeBSD.org
Subject: Allwinner awg TX hanging issue
Message-ID: <20180906163547.GC75530@funkthat.com>
Since I upgraded to a recent -current to fix the timer issue on my
A64-LTS board, I've been having an issue where the ethernet interface
will freeze.

This is with:
FreeBSD gate2.funkthat.com 12.0-ALPHA4 FreeBSD 12.0-ALPHA4 #4 r338426M: Wed Sep 5 09:55:12 PDT 2018 root@gate2.funkthat.com:/usr/src/sys/arm64/compile/GENERIC arm64

The only modifications add some dtrace probe points to debug this
issue. I also dropped the check for _OACTIVE from _start_locked.

The probes print the flags at the beginning of _start_locked, when
_OACTIVE gets set, and at the end of _txeof if progress was made. They
also print the progress, if any, at the end of _txeof, and the
interrupt status value in _intr.

I noticed that when it was hung, the OACTIVE flag was set, but this
just means that we ran out of transmit descriptors, and was a symptom
of the problem.

I don't have a good test to trigger this problem. It happens somewhat
regularly, every 4-12 hours on my router, but my test board, which is
lightly loaded and does not run pf, doesn't have this issue.

With the added dtrace probe points, I finally hit this:
  3  10115  none:intr intr 40000024
  3  10115  none:intr intr 40000100
  3  10115  none:intr intr 40000100
  3  10115  none:intr intr 40000100
  3  10115  none:intr intr 40000100
  3  10114  none:flags flag 40
  3  10115  none:intr intr 40000024
  3  10115  none:intr intr 40000100
  3  10114  none:flags flag 40
  3  10115  none:intr intr 40000024
  3  10115  none:intr intr 40000100
  3  10114  none:flags flag 40
  3  10115  none:intr intr 40000024
  3  10115  none:intr intr 40000100
  3  10114  none:flags flag 40
  3  10115  none:intr intr 40000100
  3  10115  none:intr intr 40000024
  3  10115  none:intr intr 4000010a
  3  10114  none:flags flag 40
  3  10115  none:intr intr 40000100
  3  10114  none:flags flag 40
  3  10115  none:intr intr 40000100
  3  10114  none:flags flag 40
[...]
  3  10115  none:intr intr 40000100
  3  10114  none:flags flag 40
  3  10114  none:flags flag 440
  3  10115  none:intr intr 40000100
  3  10115  none:intr intr 40000100
  3  10115  none:intr intr 40000100

The intr 24 line is a normal interrupt, and will run txeof to free up
descriptors.

The intr 100 line says the RGMII link status changed. We don't enable
that interrupt, so I'm not sure why we are getting it (it seems like
the enable bit is ignored). These are normal, and we see these lines
for a long while.

The flag 440 line is when we set OACTIVE. After it we see no more
flag 40 lines, which means that _start_locked doesn't get called and
that _txeof doesn't make forward progress.

The problem point is the intr 10a line. Once we hit that line, we
never get another intr 24 line.

The low digit, a, is the important part of the interrupt status, as
it is the combination of:
0x8 TX_TIMEOUT_INT
	When this bit is asserted, the transmitter had been
	excessively active.
and:
0x2 TX_DMA_STOPPED_INT
	When this bit is asserted, the TX DMA FSM is stopped.

We do not have code in the awg driver to recover from this problem.

Does anyone have any ideas?

Thanks.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
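For reference, the bit test described above can be written out as a small sketch. It uses only the two bit values quoted in the message (0x2 and 0x8); the helper name tx_stall_bits_set is hypothetical and is not part of the awg driver, and what a recovery path should do once the condition is seen is left open, as in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as quoted in the message. */
#define TX_DMA_STOPPED_INT	0x2u	/* TX DMA FSM is stopped */
#define TX_TIMEOUT_INT		0x8u	/* transmitter excessively active */

/*
 * Hypothetical helper, not taken from the awg driver: report whether
 * an interrupt status value carries either of the two TX error bits.
 */
static bool
tx_stall_bits_set(uint32_t intr_status)
{
	return ((intr_status & (TX_DMA_STOPPED_INT | TX_TIMEOUT_INT)) != 0);
}
```

Applied to the values in the trace, the check is false for 40000024 and 40000100 and true for 4000010a, which matches the point after which no further intr 24 lines appear.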