Date:      Sun, 6 May 2018 15:43:37 +0900
From:      YongHyeon PYUN <pyunyh@gmail.com>
To:        Dieter BSD <dieterbsd@gmail.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: AX88179 USB-to-Ethernet is slow and silently corrupts data
Message-ID:  <20180506064337.GA1792@michelle.fasterthan.co.kr>
In-Reply-To: <CAA3ZYrAtvVxx1Ub22_mKBhV0T0YJJNLcBjNq0L7JfXNFOb3d5g@mail.gmail.com>
References:  <CAA3ZYrAtvVxx1Ub22_mKBhV0T0YJJNLcBjNq0L7JfXNFOb3d5g@mail.gmail.com>

On Thu, May 03, 2018 at 02:11:24PM -0700, Dieter BSD wrote:
> > 10.3-RELEASE

[...]

> pyunyh>  Which phy driver is used for axge(4)?
> pyunyh>  You can see the phy driver name below axge(4) attachment in dmesg
> pyunyh>  output.
> 
> axge0: <NetworkInterface> on usbus2
> axge1: <NetworkInterface> on usbus0
> miibus4: <MII bus> on axge0
> miibus5: <MII bus> on axge1
> 

That's not the PHY driver name.  The PHY driver should have been shown
right after the miibus(4) output.  Probably the PHY driver name would
be rgephy(4).

> - Do you use manual media configuration instead of auto-negotiation?
> They auto configure at 1000. I usually set them to 100 which seems to
> eliminate the silent data corruption.
> 

Good data point.

> pyunyh> At which media speed does the issue happen (10Mbps, 100Mbps or
>   1000Mbps)?
> 
> The silent data corruption happens at 1000.  100 seems to eliminate
> the data corruption but 100 isn't always fast enough.  I haven't tried
> setting the AX88179 to 10 Mbps mode, although I tested it by sending
> data to it from another machine which was running at 10 Mbps, with
> a Netgear switch converting the 10 Mbps to 1000 Mbps.  Using usb2 instead
> of usb3 also seems to eliminate the data corruption.  The AX88179
> doesn't seem to care about what Ethernet speed it is running at, or
> what usb speed it is running at.  The silent data corruption happens
> if it receives too many packets per second from the Ethernet.  Reducing
> the Ethernet speed or the usb speed is a simple way to reduce how many
> packets per second it handles.
> 

This seems to be another data point.  If you use ehci(4) (i.e. USB2),
the issue does not happen even on a 1000baseT link, right?
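
Dieter's observation that the corruption tracks packets per second, rather than link speed as such, can be put in rough numbers. The sketch below computes an upper bound on full-size frames per second at each link speed, assuming a 1500-byte payload, no VLAN tag, and the standard Ethernet per-frame overhead (header, FCS, preamble/SFD, inter-frame gap). It ignores the USB side's own per-transfer overhead, so treat it as a ceiling, not a measurement.

```python
# Rough upper bound on Ethernet frames per second at a given link speed.
# Assumptions: no VLAN tag, standard IEEE 802.3 framing overhead.
HEADER = 14        # destination MAC, source MAC, EtherType
FCS = 4            # frame check sequence (CRC-32)
PREAMBLE_SFD = 8   # preamble plus start-of-frame delimiter
IFG = 12           # minimum inter-frame gap, in byte times


def max_frames_per_second(link_bps: int, payload: int = 1500) -> float:
    """Frames/s if every frame carries `payload` bytes back to back."""
    bits_on_wire = (payload + HEADER + FCS + PREAMBLE_SFD + IFG) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for name, bps in (("10baseT", 10**7), ("100baseTX", 10**8), ("1000baseT", 10**9)):
        print(f"{name:>10}: {max_frames_per_second(bps):8.0f} frames/s")
```

Under these assumptions it gives about 813, 8,127 and 81,274 full-size frames/s, so forcing 100baseTX cuts the worst-case receive rate by a factor of ten. Minimum-size frames (`payload=46`) push 1000baseT to about 1,488,095 frames/s.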

> pyunyh> Which direction of packet flow is broken(TX or RX or both)?
> 
> The silent data corruption happens if it receives too many packets
> per second from the Ethernet.
> I have not observed any data corruption when the AX88179 transmits
> data to the Ethernet.  Tested with rcp(1).
> 

OK, let's focus on the RX side.

> It seems interesting that it is the receive direction that gets data
> corruption and the receive direction that fails completely when the
> rxcsum is turned off.  Perhaps related?
> 

If the software checksum is used, you wouldn't receive corrupted
packets: the stack drops them, so your transfer is aborted in the
middle and you already know that the operation failed.  Silent data
corruption means you think your transfer was successful but the actual
content was corrupted, so you can only find it after verifying the md5
or sha256 checksum of the content.  Are you seeing silent data
corruption with TCP transfers?  (You should not use nc(1) with UDP to
verify this.)
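
The distinction above rests on what a transport checksum can and cannot catch. TCP and UDP use the 16-bit ones' complement Internet checksum (RFC 1071); with software checksumming the kernel recomputes it and drops segments that fail, whereas with RXCSUM enabled it trusts the chip's verdict. The sketch below is a plain-Python illustration of that checksum, not FreeBSD's kernel implementation, and it omits the pseudo-header that real TCP/UDP checksums also cover. It also shows one weakness: swapping two 16-bit words leaves the sum unchanged, which is one reason an end-to-end md5/sha256 comparison is the real test for silent corruption.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones' complement of the ones' complement sum of
    16-bit big-endian words (simplified: no TCP/UDP pseudo-header)."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"                           # pad odd length with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add the next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def verifies(segment: bytes) -> bool:
    """A segment that includes a correct checksum field sums to zero."""
    return internet_checksum(segment) == 0


if __name__ == "__main__":
    payload = b"hello, axge!"                     # even length keeps words aligned
    packet = payload + internet_checksum(payload).to_bytes(2, "big")
    flipped = bytes([packet[0] ^ 0x01]) + packet[1:]   # one bit flipped
    swapped = packet[2:4] + packet[0:2] + packet[4:]   # two words swapped
    print("intact :", verifies(packet))    # True
    print("flipped:", verifies(flipped))   # False: detected
    print("swapped:", verifies(swapped))   # True: corruption NOT detected
```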

> ue0 is now connected to the chipset (AMD 990FX SB950) USB controller, usb2.
> 
> ifconfig ue0 -rxcsum
> 
> ue0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=8000a<TXCSUM,VLAN_MTU,LINKSTATE>
>         media: Ethernet 100baseTX <full-duplex>
> 
> Sent ue0 a bunch of udp packets from another FreeBSD box.
> dd if=/dev/zero bs=1k count=50 | nc -4u 10.0.210.66 55555
> 
>             input            ue0           output
>    packets  errs idrops      bytes    packets  errs      bytes colls drops
>          0     0     0          0          0     0          0     0     0
>          0     0     0          0          0     0          0     0     0
>         50     0     0      53000          0     0          0     0     0
>          0     0     0          0          0     0          0     0     0
> 
> Receiving process got none of them.
> nc -4nul 55555 > /var/tmp/file_via_udp_ue0
> (file is zero bytes)
> 
> netstat -s -p udp
> udp:
>         0 datagrams received
>         0 with incomplete header
>         0 with bad data length field
>         0 with bad checksum
>         0 with no checksum
>         0 dropped due to no socket
>         0 broadcast/multicast datagrams undelivered
>         0 dropped due to full socket buffers
>         0 not for hashed pcb
>         0 delivered
>         0 datagrams output
>         0 times multicast source filter matched
> 
> So netstat sees packets coming in, but does not see any datagrams.
> Where is the data disappearing?  In the hardware?  In the device driver?

As I said in another mail, the raw packet counters shown by netstat(1)
are maintained in the driver.  The driver may have passed the packets
to the upper layer, but it seems they were discarded there for some
other reason.
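
That comparison between layers can be made mechanical: the interface counter (50 packets in) comes from the driver, while "datagrams received" comes from UDP. The helpers below, `parse_netstat_s` and `unaccounted`, are hypothetical; they assume the indented "count description" layout of the `netstat -s -p udp` output quoted above, and assume the test traffic was UDP only (any other traffic on ue0 would inflate the driver count). The same parser should also work on `netstat -s -p ip` output, which may show whether packets were already dropped at the IP layer.

```python
import re

# One counter per line, e.g. "        0 datagrams received" (layout assumed
# from the `netstat -s -p udp` output quoted in this thread).
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def parse_netstat_s(text: str) -> dict[str, int]:
    """Map each counter description to its value; header lines are skipped."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER_LINE.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters


def unaccounted(interface_in: int, udp: dict[str, int]) -> int:
    """Packets the driver counted that UDP never saw (UDP-only traffic assumed)."""
    return interface_in - udp.get("datagrams received", 0)


SAMPLE = """udp:
        0 datagrams received
        0 with bad checksum
        0 dropped due to full socket buffers
"""

if __name__ == "__main__":
    udp = parse_netstat_s(SAMPLE)
    # 50 is the "input packets" figure from the ue0 counters quoted above.
    print(unaccounted(50, udp), "packets vanished between the driver and UDP")
```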

> Is "ifconfig -rxcsum" really doing the correct thing to the chip?
> 

If it is correctly implemented, yes.

> Is there some way to have RXCSUM,TXCSUM turned on, but also have the cpu
> verify the checksum?  I realize the whole point of RXCSUM,TXCSUM

You have to choose only one (either hardware checksum offloading or
software checksum); you can't have both.

> is to reduce the load on the cpu, but data corruption sucks.
> 
> To see if a different usb controller made any difference, I ran the same
> test using ue1 = Tek Republic TUN-300 which has the same AX88179 as the Siig,
> connected to the onboard VIA VL805 USB 3.0 controller, and it acts exactly the
> same as the Siig.  So no difference seen between the two usb controllers.

How about other USB host controllers? Does it also happen on
non-VIA USB 3.0 controllers?

> Both ue0 & ue1 are running at 100baseTX <full-duplex> which appears
> to eliminate the silent data corruption seen with rcp(1) when they
> are running at 1000.
> 
> I also tried the same test with tcp instead of udp.  Same results,
> zero length file and no checksum errors reported by netstat -s.
> So the protocol doesn't matter. "ifconfig -rxcsum" appears
> to stop the flow of all incoming packets: ICMP, UDP, and TCP.

I understand your frustration, but your data points are mixed with
other results, so it's not easy to narrow down the issue.  I think it
would be better to do some very basic tests first instead of mixing
everything else together.
- Disable TX/RX checksum offloading and never change it again.
- Stick to auto-negotiation; don't perform manual media
  configuration during the test.
- ssh to the box with the ue device.  Does it work?
- If ssh works, try sending a large file to the box with scp.  Are
  you able to transfer the file?
- Check the TCP statistics of netstat(1) after ssh/scp.  What do the
  TCP statistics show?
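
For the scp step, silent corruption only shows up when the contents are compared. A minimal sketch, assuming the received file can be copied back next to the original over a path known to be good (for example a different NIC); alternatively, run sha256(1) on each host and compare the digests by eye. The script name `verify.py` and the helpers `sha256_of` and `first_mismatch` are hypothetical. Reporting the offset of the first differing byte may help spot a pattern in where the corruption lands.

```python
import hashlib
import sys

CHUNK = 1 << 20  # read 1 MiB at a time so large files need not fit in memory


def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def first_mismatch(path_a: str, path_b: str) -> int:
    """Byte offset of the first difference, or -1 if the files are identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(CHUNK), fb.read(CHUNK)
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                return offset + min(len(a), len(b))  # one file is shorter
            if not a:
                return -1                            # both hit EOF together
            offset += len(a)


if __name__ == "__main__":
    # Usage: python3 verify.py ORIGINAL RECEIVED
    original, received = sys.argv[1], sys.argv[2]
    if sha256_of(original) == sha256_of(received):
        print("OK: contents match")
    else:
        print(f"CORRUPTED: first difference at byte {first_mismatch(original, received)}")
```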


