Subject: Re: Weird NFS client lock up with Mellanox cards :/
From: Hans Petter Selasky
To: Justin Clift, freebsd-infiniband@freebsd.org
Date: Tue, 19 Jul 2016 17:40:32 +0200

On 07/19/16 15:28, Justin Clift wrote:
> Hi all,
>
> Brian Krusic (CC'd) has been kind enough to put time into some
> performance testing of Mellanox ConnectX-3 Pro cards in 40GbE mode,
> with FreeNAS 9.10-STABLE. (That uses FreeBSD 10-STABLE as its
> base OS.)
>
> Weirdly, his NFS clients are locking up when using Mellanox cards,
> but not with SolarFlare ones.
>
> https://bugs.freenas.org/issues/7659#note-40
>
> Comparing the OFED code in FreeNAS 9.10-STABLE to FreeBSD 10-STABLE,
> there's one patch difference. It's a recent one from 3 days ago:
>
> MFC r301877
> Add a missing error check for a malloc() call in idr_get().
> https://github.com/freebsd/freebsd/commit/03f3328da077d2def40be7dea8d13c74c2ccd447
>
> Does anyone know if this missing patch could result in a slowdown
> of NFS clients (but not Samba/SMB)? Maybe memory leak style, leading
> to a lack of resources or something?
>
> Hoping it's really this simple. But if not... does anyone have
> suggestions on what to try for figuring this out?
>
> Regards and best wishes,
>

Hi,

This might be a timing issue unrelated to the Mellanox cards themselves.

What link speed is being used?

What exactly happens when the lockup occurs?

Is RSS (Receive Side Scaling) being used?

--HPS