Date: Wed, 24 Mar 2021 13:57:47 +0900
From: hiroshi matsuo <matsuo.hiroshi.39@gmail.com>
To: freebsd-infiniband@freebsd.org
Subject: Data corruption via IPoIB in connected mode between Linux and FreeBSD
Message-ID: <CAGmx_cYqxYBb0XKcdWKHxZmRjKUt0n2CNqkrqwBxYFM0e6A_hQ@mail.gmail.com>

Dear all,

I'm trying IPoIB between Linux and FreeBSD with Mellanox ConnectX-2 cards, and I have run into a strange problem that is beyond my knowledge.

CentOS-7:
  IP address = 1.0.1.2/24 (attached to the ipoib device)
  transport mode = connected
  MTU = 65520 (following the Red Hat documentation)

FreeBSD-12.2:
  IP address = 1.0.1.1/24 (attached to the ipoib device)
  transport mode = connected (I built with the IPOIB_CM option)
  MTU = 4092 (default? I want to set this to 65520 to match CentOS, but I cannot. dmesg shows "mlx4_core0: 65520 is invalid IBTA mtu". Why?)

The FreeBSD box has a 2TB ZFS pool containing about 4,000,000 files. A few days ago I copied all the files from FreeBSD to CentOS with rsync, like this:

  centos$ rsync -av -e ssh matsuo@10.0.1.1:/tank/data/ ~/data

Later I found a corrupted file by accident, even though rsync had finished with no error message. I then checked all the files, comparing the copies with the originals, and found that:

1. There are 24 corrupted files (the MD5 value differs from the original), i.e. 0.0006% failure, 99.9994% success.
2. Every corrupted file has exactly one byte that differs from the original, and the position of the bad byte seems random, so it is not a burst error.

I doubt whether CM is actually established, but I don't know how to inspect it more deeply. Please point out to me:

  what the root cause is
  what is wrong with my setup
  which documents are worth reading first
  and so on.

In addition, I ran iperf tests:

  16Gbps (CentOS-CentOS case)
  4Gbps (CentOS-FreeBSD case)

So I think my FreeBSD server is not working properly and something is wrong.

Thank you.

Hiroshi Matsuo
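
The per-file check described in the message (compare MD5 values, then locate the byte that differs) could be scripted roughly as follows. This is only a sketch, not the procedure actually used for the report: the function names md5_of and differing_bytes and the 1 MiB chunk size are assumptions made for illustration, and it assumes both the original and the copy are readable as local paths.

```python
import hashlib

CHUNK = 1 << 20  # assumed chunk size: read 1 MiB at a time to bound memory use


def md5_of(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def differing_bytes(original, copy):
    """Return a list of (offset, original_byte, copy_byte) tuples.

    Offsets are 0-based. Only the common prefix of the two files is
    compared, so a size mismatch has to be checked separately.
    """
    diffs = []
    offset = 0
    with open(original, "rb") as fa, open(copy, "rb") as fb:
        while True:
            a = fa.read(CHUNK)
            b = fb.read(CHUNK)
            n = min(len(a), len(b))
            if n == 0:
                break
            # Skip the per-byte loop when the whole chunk matches.
            if a[:n] != b[:n]:
                for i in range(n):
                    if a[i] != b[i]:
                        diffs.append((offset + i, a[i], b[i]))
            offset += n
            if len(a) != len(b):
                break  # one file ended early; stop at the shorter length
    return diffs
```

For a file whose MD5 values disagree, differing_bytes(original, copy) lists each bad offset together with both byte values. XOR-ing the two values (original_byte ^ copy_byte) shows which bits changed, which may help tell a single-bit flip from a whole-byte replacement.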