From owner-freebsd-infiniband@freebsd.org Tue Jul 19 13:28:21 2016
From: Justin Clift
Subject: Weird NFS client lock up with Mellanox cards :/
Date: Tue, 19 Jul 2016 14:28:12 +0100
Message-Id: <288AE8D3-9F16-453D-BD73-00672C4E2D94@postgresql.org>
To: freebsd-infiniband@freebsd.org
List-Id: Infiniband on FreeBSD

Hi all,

Brian Krusic (CC'd) has been kind enough to put time into some performance testing of Mellanox ConnectX-3 Pro cards in 40GbE mode, with FreeNAS 9.10-STABLE. (That uses FreeBSD 10-STABLE as its base OS.)

Weirdly, his NFS clients are locking up when using Mellanox cards, but not with SolarFlare ones.

https://bugs.freenas.org/issues/7659#note-40

Comparing the OFED code in FreeNAS 9.10-STABLE to FreeBSD 10-STABLE, there's one patch difference. It's a recent one from 3 days ago:

  MFC r301877
  Add a missing error check for a malloc() call in idr_get().
  https://github.com/freebsd/freebsd/commit/03f3328da077d2def40be7dea8d13c74c2ccd447

Does anyone know if this missing patch could result in a slowdown of NFS clients (but not Samba/SMB)? Maybe memory leak style, leading to a lack of resources or something?

Hoping it's really this simple. But if not... does anyone have suggestions on what to try for figuring this out?

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi

From owner-freebsd-infiniband@freebsd.org Tue Jul 19 15:36:36 2016
Subject: Re: Weird NFS client lock up with Mellanox cards :/
To: Justin Clift, freebsd-infiniband@freebsd.org
References: <288AE8D3-9F16-453D-BD73-00672C4E2D94@postgresql.org>
From: Hans Petter Selasky
Message-ID: <6a5b530e-521c-47f7-5012-7512b2fa050c@selasky.org>
Date: Tue, 19 Jul 2016 17:40:32 +0200
In-Reply-To: <288AE8D3-9F16-453D-BD73-00672C4E2D94@postgresql.org>

On 07/19/16 15:28, Justin Clift wrote:
> Hi all,
>
> Brian Krusic (CC'd) has been kind enough to put time into some
> performance testing of Mellanox ConnectX-3 Pro cards in 40GbE mode,
> with FreeNAS 9.10-STABLE. (That uses FreeBSD 10-STABLE as its
> base OS.)
>
> Weirdly, his NFS clients are locking up when using Mellanox cards,
> but not with SolarFlare ones.
>
> https://bugs.freenas.org/issues/7659#note-40
>
> Comparing the OFED code in FreeNAS 9.10-STABLE to FreeBSD 10-STABLE,
> there's one patch difference. It's a recent one from 3 days ago:
>
> MFC r301877
> Add a missing error check for a malloc() call in idr_get().
> https://github.com/freebsd/freebsd/commit/03f3328da077d2def40be7dea8d13c74c2ccd447
>
> Does anyone know if this missing patch could result in a slowdown
> of NFS clients (but not Samba/SMB)? Maybe memory leak style, leading
> to a lack of resources or something?
>
> Hoping it's really this simple. But if not... does anyone have
> suggestions on what to try for figuring this out?
>
> Regards and best wishes,
>

Hi,

This might be a timing issue not related to the Mellanox cards. What link speed is being used? What happens when the lockup occurs? Is RSS being used?

--HPS

From owner-freebsd-infiniband@freebsd.org Tue Jul 19 17:26:21 2016
Subject: Re: Weird NFS client lock up with Mellanox cards :/
From: Justin Clift
Date: Tue, 19 Jul 2016 18:26:09 +0100
Cc: Hans Petter Selasky, freebsd-infiniband@freebsd.org
References: <288AE8D3-9F16-453D-BD73-00672C4E2D94@postgresql.org> <6a5b530e-521c-47f7-5012-7512b2fa050c@selasky.org>
To: Brian Krusic

On 19 Jul 2016, at 17:50, Brian Krusic wrote:

> When mounting via CIFS, no lockups occur with the Mellanox. CIFS is not an option as we are mainly a Linux/OSX house with only NFS.
> We've a few Windows machines but they are using NFS.

Just to point out... OSX prefers SMB compared to NFS. It used to be the other way around, but OSX Finder has had issues with NFS for years, and Apple decided to go "all-in" with their SMB support. The result these days is that although both function "ok", SMB functions better. :)

That being said... I've not pushed it hard personally. ;)

I just mount my FreeNAS server (which is sharing via CIFS) in OSX using:

  cifs://servername/sharename

In OSX 10.9.5 it's a bit slow to mount (30-40 seconds). For OSX 10.10, it's very fast. Around 5 seconds normally.

No idea if that's a helpful thought or not though. ;)

Regards and best wishes,

Justin Clift