Date: Sat, 13 Apr 2019 13:41:12 -0500
From: Jason Bacon <bacon4000@gmail.com>
To: Justin Clift <justin@postgresql.org>
Cc: Hans Petter Selasky <hps@selasky.org>, freebsd-infiniband@freebsd.org
Subject: Re: Kernel modules
Message-ID: <236a3839-e880-ab17-146a-4521d1894813@gmail.com>
In-Reply-To: <b6e6f8931f59fb2ecf985478ea4d77b7@postgresql.org>
References: <0eba9ec9-692f-7677-2b10-4e67a232821c@gmail.com>
 <f3f94452-155f-79f4-72d8-bf65760ae5b0@selasky.org>
 <598a58f0-89b8-d00d-5ed7-74dd7005950f@gmail.com>
 <73ce0738-4d63-2f25-2ff6-00f0092de136@selasky.org>
 <2090dd24-db43-b689-4289-f50bd70090ea@gmail.com>
 <6673df26-8bba-ebd3-b2c5-d7e9c97db557@gmail.com>
 <d82f3a60-6ad4-dba8-a15b-355a536a9a83@gmail.com>
 <bd42597e-2981-4667-468e-b008b9be290b@selasky.org>
 <2f4d9a14-4ff6-0d34-06f0-bbb4ac76c6bd@gmail.com>
 <5166ec29-876b-0bd3-8a84-8a222647e87a@gmail.com>
 <b6e6f8931f59fb2ecf985478ea4d77b7@postgresql.org>
On 2019-04-13 13:29, Justin Clift wrote:
> On 2019-04-13 23:52, Jason Bacon wrote:
> <snip>
>> Stability will take a long time to test properly. I'm going to start
>> by rerunning some of our most I/O-intensive jobs on it - jobs that
>> actually broke our CentOS RAID servers until I switched them to NFS
>> over RDMA.
>
> That's got to be the first time anyone's ever mentioned "NFS over
> RDMA" as increasing a system's stability. :)
>
> + Justin

Believe it or not... ;-)

After my upgrade from CentOS 6 to CentOS 7, NFS over TCP started
falling apart under heavy load: servers and compute nodes became
unresponsive and required a reboot to restore stability.

If it's due to problems in the CentOS TCP stack, NFS over RDMA would
help by eliminating the TCP stack from the pathway.

On one cluster (old QLogic HCAs), setting net.core.netdev_budget=2000
seems to have solved the issue. On the other (newer Mellanox FDR
HCAs), it did not seem to help, so I tried RDMA and it's been stable
ever since.

Downside is we can no longer monitor traffic with iftop...

-- 
Earth is a beta site.
