Date:      Wed, 24 Mar 2021 08:53:23 +0100
From:      Hans Petter Selasky <hps@selasky.org>
To:        hiroshi matsuo <matsuo.hiroshi.39@gmail.com>, freebsd-infiniband@freebsd.org
Subject:   Re: Data corruption via IPoIB in connected mode between Linux and FreeBSD
Message-ID:  <68a1d236-45d9-5343-5ad6-9fee877f5ff1@selasky.org>
In-Reply-To: <CAGmx_cYqxYBb0XKcdWKHxZmRjKUt0n2CNqkrqwBxYFM0e6A_hQ@mail.gmail.com>
References:  <CAGmx_cYqxYBb0XKcdWKHxZmRjKUt0n2CNqkrqwBxYFM0e6A_hQ@mail.gmail.com>

Hi Hiroshi,

On 3/24/21 5:57 AM, hiroshi matsuo wrote:
> Dear,
> 
> I'm trying IPoIB between Linux and FreeBSD with Mellanox ConnectX-2 cards.
> Now I have a strange problem that is beyond my knowledge.
> 
> CentOS-7:
> IP address=1.0.1.2/24 (attached to ipoib device),
> transport mode=connected
> MTU=65520 (following RedHat document)
> 
> FreeBSD-12.2:
> IP address=1.0.1.1/24 (attached to ipoib device),
> transport mode=connected (I built with IPOIB_CM options)
> MTU=4092  (default?  I want to set this to 65520 to match CentOS, but I
> cannot. dmesg shows:
>   mlx4_core0: 65520 is invalid IBTA mtu
> Why?)

Did you apply my patch to FreeBSD 12.2 for connected mode?

Try subtracting the size of the InfiniBand address, 20 bytes, from the MTU.

Try setting the MTU to 9000 instead. Using such a large MTU with FreeBSD 
doesn't make sense.
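
As a worked example of the subtraction suggested above, here is a minimal
Python sketch. The function and constant names are illustrative only; the
numbers are the 65520 MTU from the CentOS side and the 20-byte InfiniBand
address size. Whether the result is accepted by the driver is not confirmed
here.

```python
# Assumption: 65520 is the connected-mode MTU used on the CentOS side and
# 20 bytes is the InfiniBand address size mentioned in this message.
CONNECTED_MODE_MTU = 65520
INFINIBAND_ADDR_LEN = 20


def mtu_without_address(mtu: int, addr_len: int = INFINIBAND_ADDR_LEN) -> int:
    """Return the MTU reduced by the link-layer address size."""
    if mtu <= addr_len:
        raise ValueError("MTU must be larger than the address size")
    return mtu - addr_len


if __name__ == "__main__":
    # 65520 - 20 = 65500
    print(mtu_without_address(CONNECTED_MODE_MTU))
```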

Is there an InfiniBand router between the two hosts? If yes, has it been
configured to handle such a large MTU?

> 
> FreeBSD box has a 2TB ZFS pool and there are about 4,000,000 files in it.
> A few days ago I copied all files from FreeBSD to CentOS by rsync
> like this:
>    centos$ rsync -av -e ssh matsuo@10.0.1.1:/tank/data/  ~/data
> 
> At one time I found a corrupted file accidentally, however rsync finished
> with no error message.
> I have looked into all files and compared the copies with the originals.
> At last I understand that:
>    1. There are 24 corrupted files (MD5 value is different from original)
>           i.e.  0.0006% failure, 99.9994%  success
>    2. Every corrupted file has just one byte which is different from original
>       and the position of the error byte seems random. So not a burst error.
> 
> I doubt  whether CM is established but I don't know the way to inspect it
> deeply.

If you have IPOIB_CM set, you should be good.

> 
> Please point out to me
>   what is the root cause
>   what is wrong about my setup
>   document worth reading first
> and so on.

As I said, there is a bug in IPoIB CM mode that is not fixed unless you
apply the patch I sent, which Mellanox will upstream later.
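
To check whether the single-byte differences described above still occur
after a change, the copies can be compared against the originals again. The
following is a minimal Python sketch, not something from the patch: the
helper names md5_of and differing_offsets are made up for illustration, and
it assumes both files are reachable as local paths on one machine.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_offsets(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offsets at which two files differ.

    If one file is shorter, the offset where it ends is reported once
    and the comparison stops there.
    """
    offsets = []
    base = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        offsets.append(base + i)
                if len(a) != len(b):
                    offsets.append(base + min(len(a), len(b)))
                    break
            base += len(a)
    return offsets
```

A file whose MD5 differs from the original and for which differing_offsets
returns exactly one offset matches the pattern reported above.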

> 
> In addition, I ran iperf tests.
>     16Gbps   (in CentOS-CentOS case)
>      4Gbps   (in CentOS-FreeBSD case)
> So I think my FreeBSD server is not working properly and something is wrong.

--HPS


