From owner-freebsd-infiniband@freebsd.org Mon Oct 29 03:14:09 2018
From: Andrew Vylegzhanin
Date: Mon, 29 Oct 2018 06:13:55 +0300
Subject: NFS + Infiniband problem
To: freebsd-infiniband@freebsd.org, freebsd-fs@freebsd.org

Hello everyone,

I have several FreeBSD machines connected via an Infiniband network (FDR
switch Mellanox SW3036 + ConnectX-3 VPI cards).
One of them is a NAS server with multiple ZFS pools.

All kernels (11.2-RELEASE on clients and 12.0-BETA1 (11.2 also tried) on
server) are built with Infiniband connected mode (option IPOIB_CM, option SDM),
and world is built with OFED stack support (WITH_OFED='yes').

File transfers via FTP or SSH between server and clients work almost
flawlessly (~12 Gbit/s).

But when I try to copy a significant amount of data in or out via an NFS
share mounted on the clients, NFS I/O either hangs completely or becomes
extremely slow (a couple of kB/s) after an unpredictable amount of data has
been copied. For example, on one node I can copy a 1 GB file, and afterwards
NFS hangs on a 30 kB file.

Some details:

[root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
[root@node4 ~]# dd if=/dev/zero of=/mnt/N1 bs=1m count=1024

Ctrl-T for "hang" dd
load: 0.01 cmd: dd 1061 [bo_wwait] 70.95r 0.00u 0.00s 0% 2112k
load: 0.01 cmd: dd 1061 [bo_wwait] 72.89r 0.00u 0.00s 0% 2112k

for "slow" dd
load: 0.00 cmd: dd 2254 [nfsaio] 224.18r 0.00u 0.13s 0% 3132k
load: 0.00 cmd: dd 2254 [nfsaio] 225.94r 0.00u 0.13s 0% 3132k

I've tried mounting with different wsize options, with the same result.

Any help would be greatly appreciated.

--
Andrew

From: Rick Macklem
To: Andrew Vylegzhanin, "freebsd-infiniband@freebsd.org", "freebsd-fs@freebsd.org"
Subject: Re: NFS + Infiniband problem
Date: Mon, 29 Oct 2018 05:16:13 +0000

[stuff related to slow performance snipped]

Try disabling TSO; it is the most common cause of NFS RPC transport issues.
# sysctl net.inet.tcp.tso=0
will do it, if you can't do it for the interface.
(Also, try disabling LSO, LRO if the interface will let you do so.)

You can also try mounting with "rsize=8192,wsize=8192" and if that works
fairly well, then just keep doubling it until it no longer works well.

I know nothing about InfiniBand, so if the above doesn't help, hopefully
someone familiar with InfiniBand can help.

Good luck with it, rick

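Rick's suggestion of starting at rsize=8192,wsize=8192 and doubling could be scripted roughly as follows. This is only a minimal sketch: it prints the mount, dd and umount commands for each candidate size so they can be reviewed and run by hand, and it executes nothing itself. The function names and the upper bound of 65536 are assumptions made for illustration; the server path, mount point and dd invocation are taken from the original report.

```shell
#!/bin/sh
# Print candidate rsize/wsize values: start at $1 and keep doubling up to $2.
# The defaults (8192 .. 65536) are illustrative assumptions.
candidate_sizes() {
    size=${1:-8192}
    max=${2:-65536}
    while [ "$size" -le "$max" ]; do
        echo "$size"
        size=$((size * 2))
    done
}

# Print (do not execute) the commands for one test round at a given size.
print_round() {
    echo "mount_nfs -o rsize=$1,wsize=$1 -o proto=tcp 10.0.2.1:/zdata2 /mnt"
    echo "dd if=/dev/zero of=/mnt/N1 bs=1m count=1024"
    echo "umount /mnt"
}

for s in $(candidate_sizes); do
    print_round "$s"
done
```

Stopping at the first size where dd stalls gives the largest value that still works, which is the data point Rick asks for.
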
From: "Rodney W. Grimes"
Message-Id: <201810291506.w9TF6YAP057202@pdx.rh.CN85.dnsmgr.net>
Subject: Re: NFS + Infiniband problem
To: Andrew Vylegzhanin
Date: Mon, 29 Oct 2018 08:06:34 -0700 (PDT)
CC: freebsd-infiniband@freebsd.org, freebsd-fs@freebsd.org

> Hello everyone,
>
> I have a several FreeBSD machines connected via Infiniband netwok ( FDR
> switch Mellanox SW3036 + ConnectX-3 VPI cards ).
> One of them is a NAS-server with multiply ZFS pools.
>
> All kernels (11.2-RELEASE on clients and 12.0-BETA1 (11.2 also tried) on
> server) are with infiniband connected mode (option IPOIB_CM, option SDM)
> and world with OFED stack support. (WITH_OFED='yes').
>
> File transfers via FTP or SSH between server and clients works almost
> flawless ( ~ 12 Gbit/s ).
>
> But when I try to copy in/out some significant data via NFS share mounted
> on clients, NFS i/o hangs at all or got extremely slow (couple kB/s)
> transfer speed after uncertain amount of copied data. For example, on the
> one node I can copy 1GB file, and after NFS hang on file with size 30 kb.
>
> Some details:
> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
                               ^^^^^^^^^^^^
I am not sure what the interaction between page sizes, TSO needs,
buffer needs and all that is, but I always use a power of 2 for wsize
and rsize. You might try that. And as Rick suggested, turn off
TSO, if you can. Is Infiniband using RDMA to do this? If so, then
the page size stuff is probably very important; use multiples of
4096 only.

> [root@node4 ~]# dd if=/dev/zero of=/mnt/N1 bs=1m count=1024
>
> Ctrl-T for "hang" dd
> load: 0.01 cmd: dd 1061 [bo_wwait] 70.95r 0.00u 0.00s 0% 2112k
> load: 0.01 cmd: dd 1061 [bo_wwait] 72.89r 0.00u 0.00s 0% 2112k
>
> for "slow" dd
> load: 0.00 cmd: dd 2254 [nfsaio] 224.18r 0.00u 0.13s 0% 3132k
> load: 0.00 cmd: dd 2254 [nfsaio] 225.94r 0.00u 0.13s 0% 3132k
>
> I've tried mount with different wsize option with same result.
>
> Any help would be greatly appreciated.
>
> --
> Andrew
> _______________________________________________
> freebsd-infiniband@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-infiniband
> To unsubscribe, send any mail to "freebsd-infiniband-unsubscribe@freebsd.org"
>

--
Rod Grimes                                                 rgrimes@freebsd.org

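The two rules of thumb in Rod's reply (a power of 2 for rsize/wsize, and multiples of 4096 if page alignment matters) can be checked before mounting. The helper names below are hypothetical, and the sketch only tests the numbers; it says nothing about what the NFS client actually negotiates.

```shell
#!/bin/sh
# Succeed (exit status 0) if $1 is a positive power of two.
is_power_of_two() {
    n=$1
    [ "$n" -gt 0 ] || return 1
    while [ $((n % 2)) -eq 0 ]; do
        n=$((n / 2))
    done
    [ "$n" -eq 1 ]
}

# Succeed if $1 is a positive multiple of 4096 bytes.
is_page_multiple() {
    [ "$1" -gt 0 ] && [ $(( $1 % 4096 )) -eq 0 ]
}

# 30000 is the wsize from the original report; the others are common choices.
for size in 30000 8192 16384; do
    if is_power_of_two "$size"; then p2=yes; else p2=no; fi
    if is_page_multiple "$size"; then pg=yes; else pg=no; fi
    echo "$size: power of two=$p2, multiple of 4096=$pg"
done
```

The wsize of 30000 from the original mount command fails both checks, while 8192 and 16384 pass both.
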
From: Rick Macklem
To: "Rodney W. Grimes", Andrew Vylegzhanin
CC: "freebsd-fs@freebsd.org", "freebsd-infiniband@freebsd.org"
Subject: Re: NFS + Infiniband problem
Date: Mon, 29 Oct 2018 15:25:07 +0000
In-Reply-To: <201810291506.w9TF6YAP057202@pdx.rh.CN85.dnsmgr.net>

Rodney W. Grimes wrote:
Andrew Vylegzhanin wrote:
>> Hello everyone,
>>
>> I have a several FreeBSD machines connected via Infiniband netwok ( FDR
>> switch Mellanox SW3036 + ConnectX-3 VPI cards ).
>> One of them is a NAS-server with multiply ZFS pools.
>>
>> All kernels (11.2-RELEASE on clients and 12.0-BETA1 (11.2 also tried) on
>> server) are with infiniband connected mode (option IPOIB_CM, option SDM)
>> and world with OFED stack support. (WITH_OFED='yes').
>>
>> File transfers via FTP or SSH between server and clients works almost
>> flawless ( ~ 12 Gbit/s ).
>>
>> But when I try to copy in/out some significant data via NFS share mounted
>> on clients, NFS i/o hangs at all or got extremely slow (couple kB/s)
>> transfer speed after uncertain amount of copied data. For example, on the
>> one node I can copy 1GB file, and after NFS hang on file with size 30 kb.
>>
>> Some details:
>> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
>                               ^^^^^^^^^^^^
>I am not sure what the interaction between page sizes, TSO needs,
>buffer needs and all that are but I always use a power of 2 wsize
>and rsize.
They should always be a power of 2. I think the code clips the value, but it
might only clip to a multiple of 512. If it didn't clip this down to 16384,
then that would definitely be a problem. Also, normally the same size is used
for rsize and wsize. If you don't do that, you end up with weird sized blocks
in the buffer cache. I think it still works when this is done, but it could
cause performance hits. Probably doesn't matter for a simple performance test.
(You can find out what options it is actually using by typing "nfsstat -m"
after doing the mount.)

> You might try that.  And as Rick suggested, turn of
>TSO, if you can.  Is infiniband using RDMA to do this, if so then
>the page size stuff is probably very important, use multiples of
>4096 only.
RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
on a separate connection with NFSv4.1 or later, but I've never written code
for that. (Not practical to try to implement without access to hardware that
does it.)

rick
[performance stuff snipped]

From: "Rodney W. Grimes"
Message-Id: <201810291548.w9TFmjaD057417@pdx.rh.CN85.dnsmgr.net>
Subject: Re: NFS + Infiniband problem
To: Rick Macklem
Date: Mon, 29 Oct 2018 08:48:45 -0700 (PDT)
CC: Andrew Vylegzhanin, "freebsd-fs@freebsd.org", "freebsd-infiniband@freebsd.org"

> Rodney W. Grimes wrote:
> Andrew Vylegzhanin wrote:
> >> Hello everyone,
> >>
> >> I have a several FreeBSD machines connected via Infiniband netwok ( FDR
> >> switch Mellanox SW3036 + ConnectX-3 VPI cards ).
> >> One of them is a NAS-server with multiply ZFS pools.
> >>
> >> All kernels (11.2-RELEASE on clients and 12.0-BETA1 (11.2 also tried) on
> >> server) are with infiniband connected mode (option IPOIB_CM, option SDM)
> >> and world with OFED stack support. (WITH_OFED='yes').
> >>
> >> File transfers via FTP or SSH between server and clients works almost
> >> flawless ( ~ 12 Gbit/s ).
> >>
> >> But when I try to copy in/out some significant data via NFS share mounted
> >> on clients, NFS i/o hangs at all or got extremely slow (couple kB/s)
> >> transfer speed after uncertain amount of copied data. For example, on the
> >> one node I can copy 1GB file, and after NFS hang on file with size 30 kb.
> >>
> >> Some details:
> >> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
> >                               ^^^^^^^^^^^^
> >I am not sure what the interaction between page sizes, TSO needs,
> >buffer needs and all that are but I always use a power of 2 wsize
> >and rsize.
> They should always be a power of 2. I think the code clips the value, but it might
> only clip to a multiple of 512. If it didn't clip this down to 16384, then that
> would definitely be a problem. Also, normally the same size for rsize and wsize
> is used. If you don't do that, you end up with weird sided blocks in the buffer cache.
> I think it still works when this is done, but could cause performance hits.
> Probably doesn't matter for a simple performance test.
> (You can find out what options it is actually using by typing "nfsstat -m" after doing the mount.)
>
> > You might try that.  And as Rick suggested, turn of
> >TSO, if you can.  Is infiniband using RDMA to do this, if so then
> >the page size stuff is probably very important, use multiples of
> >4096 only.
> RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
> on a separate connection with NFSv4.1 or later, but I've never written code
> for that. (Not practical to try to implement without access to hardware that
> does it.)

It would be very easy to arrange for a pair of PCIe 10G IB cards and a cable
to go between them, if that would be of use to you for doing some of this
work some day, or for that matter even for playing with NFS over IB.

> rick

--
Rod Grimes                                                 rgrimes@freebsd.org

From: Andrew Vylegzhanin
Date: Tue, 30 Oct 2018 07:57:21 +0300
Subject: Re: NFS + Infiniband problem
To: rmacklem@uoguelph.ca
Cc: "Rodney W. Grimes", freebsd-fs@freebsd.org, freebsd-infiniband@freebsd.org

> >> Some details:
> >> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
> >                               ^^^^^^^^^^^^
> >I am not sure what the interaction between page sizes, TSO needs,
> >buffer needs and all that are but I always use a power of 2 wsize
> >and rsize.

Again, after some tests:
I've tried 4096, 8192 and 16384 for wsize/rsize. Only 4096 gives a
measurable result, and it is extremely slow: ~10-16 MB/s for writing
(depending on the number of threads) and 10-12 MB/s for reading. With the
other values NFS hangs (or almost hangs: a couple of kB/s on average).

Changing sysctl net.inet.tcp.tso=0 (w/o reboot) on both sides had no
effect.

AFAIK, the Infiniband interface has no options for TSO or LRO:
ib0: flags=8043 metric 0 mtu 65520
        options=80018
        lladdr 80.0.2.8.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.50.df.51
        inet 10.0.2.1 netmask 0xffffff00 broadcast 10.0.2.255
        nd6 options=29

BTW, here is my sysctl.conf file with optimisations for congestion control
and TCP buffers on 10/40 Gbit/s links (the server also has a 40 Gbit/s Intel
ixl Ethernet interface):

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.cc.algorithm=htcp

> > You might try that.  And as Rick suggested, turn of
> >TSO, if you can.  Is infiniband using RDMA to do this, if so then
> >the page size stuff is probably very important, use multiples of
> >4096 only.
> RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
> on a separate connection with NFSv4.1 or later, but I've never written code
> for that. (Not practical to try to implement without access to hardware that
> does it.)
>

I hope I can help with testing.

--
Andrew

> rick
> [performance stuff snipped]

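Whether an interface advertises TSO or LRO can be read off the options line of its ifconfig output. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the two sample lines are made up for the example rather than captured from the hosts in this thread (the ib0 output above has lost its flag names, so it cannot be used as input).

```shell
#!/bin/sh
# Read ifconfig-style text on stdin and print which of the offload
# capability names TSO4, TSO6 and LRO appear in it, or "none".
offload_flags() {
    found=$( { grep -o -w -E 'TSO4|TSO6|LRO' || true; } | sort -u | tr '\n' ' ')
    if [ -n "$found" ]; then
        echo "${found% }"
    else
        echo "none"
    fi
}

# Hypothetical sample lines, not output from the machines discussed here.
printf 'options=8<VLAN_MTU>\n' | offload_flags
printf 'options=503<RXCSUM,TXCSUM,TSO4,LRO>\n' | offload_flags
```

On a live system the input would come from something like `ifconfig ib0 | offload_flags`; a result of "none" would be consistent with Andrew's observation that the IPoIB interface offers nothing to disable.
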
EhykhZpKj+6eqhJVFfZcRpBiU7gZOmYaqksUMlPp67UX3SY8K2NFVFrd9gkbLpPpuokk NywKKr/Nw6i+ydAVbaC844yWtbtovSI2a4ukmI9ov5/37NCPblsQSTVNTazaMHcazAZv uCSmcI5jgQcu2EIzO5v30vdVgVgjMrfEmCxA4t+0e1/7fVtsoyD87xyOCPz5Coy+neRA Q9Iw== X-Gm-Message-State: AGRZ1gLHc6pOg85w/kfg8itavQACpLJRxOqruUvIIjsgg62Wa/sgdagM OctrdKVi3b77aRsgoX1/K0umoNwF X-Google-Smtp-Source: AJdET5cvMCX0KCgUvrAt3gpYArSKYWFm8rQ6aYfSJKr9ISDWeNWdXE8gFDJ6N2ajhT3r9trmhDI1NA== X-Received: by 2002:a02:1e5c:: with SMTP id m89-v6mr13160903jad.124.1540923430281; Tue, 30 Oct 2018 11:17:10 -0700 (PDT) Received: from cray.acadix.biz (cpe-174-102-163-140.wi.res.rr.com. [174.102.163.140]) by smtp.gmail.com with ESMTPSA id y15-v6sm8525569iof.58.2018.10.30.11.17.08 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Oct 2018 11:17:08 -0700 (PDT) Subject: Re: NFS + Infiniband problem To: freebsd-infiniband@freebsd.org References: <201810291506.w9TF6YAP057202@pdx.rh.CN85.dnsmgr.net> From: Jason Bacon Message-ID: <7448666e-6028-12e7-aa74-4fd162e13dc7@gmail.com> Date: Tue, 30 Oct 2018 13:17:07 -0500 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-BeenThere: freebsd-infiniband@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Infiniband on FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Oct 2018 18:17:16 -0000 On 10/29/18 11:57 PM, Andrew Vylegzhanin wrote: >>>> Some details: >>>> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 > /mnt >>> ^^^^^^^^^^^^ >>> I am not sure what the interaction between page sizes, TSO needs, >>> buffer needs and all that are but I always use a power of 2 wsize >>> and rsize. > Again after some tests. > I've tried 4096,8192,16384 wsize/rsize. 
> Only the 4096 value gives a measurable result, and it is extremely slow:
> ~10-16 MB/s for writing (depending on the number of threads) and
> 10-12 MB/s for reading. With other values NFS hangs (or almost hangs:
> a couple of kB/s on average).
>
> Changing sysctl net.inet.tcp.tso=0 (w/o reboot) on both sides had no
> effect.
>
> AFAIK, the InfiniBand interface has no options for TSO or LRO:
> ib0: flags=8043 metric 0 mtu 65520
>         options=80018
>         lladdr 80.0.2.8.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.50.df.51
>         inet 10.0.2.1 netmask 0xffffff00 broadcast 10.0.2.255
>         nd6 options=29
>
> BTW, here is my sysctl.conf file with tuning for congestion control
> and TCP buffers on 10/40 Gbit/s links (the server had 40 Gbit/s Intel ixl
> Ethernet also):
>
> kern.ipc.maxsockbuf=16777216
> net.inet.tcp.sendbuf_max=16777216
> net.inet.tcp.recvbuf_max=16777216
> net.inet.tcp.sendbuf_auto=1
> net.inet.tcp.recvbuf_auto=1
> net.inet.tcp.sendbuf_inc=16384
> net.inet.tcp.recvbuf_inc=524288
> net.inet.tcp.cc.algorithm=htcp
>
>>> You might try that. And as Rick suggested, turn off
>>> TSO, if you can. Is InfiniBand using RDMA to do this? If so, then
>>> the page size stuff is probably very important; use multiples of
>>> 4096 only.
>> RDMA is not supported by the FreeBSD NFS client. There is a way to use RDMA
>> on a separate connection with NFSv4.1 or later, but I've never written code
>> for that. (Not practical to try to implement without access to hardware that
>> does it.)
>>
> I hope I can help with testing.
>
> --
> Andrew
>
>> rick
>> [performance stuff snipped]

Did you try NFS over a standard gigabit interface just to rule out any
issues that are independent of IB? It's odd that other protocols seem to
work well.

    Jason

--
Earth is a beta site.
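The transfer-size tests described in the quoted message (4096, 8192, 16384) can be scripted so that each run uses an identical command line. What follows is only a minimal sketch for a POSIX shell: the function name nfs_mount_cmds is hypothetical, and the rsize, wsize and proto=tcp options simply mirror the mount_nfs command quoted in this thread. The function prints the commands instead of running them, so they can be reviewed first.

```sh
#!/bin/sh
# Hypothetical helper: print one mount_nfs command per power-of-two
# transfer size between a lower and an upper bound (inclusive).
# Arguments: <export> <mountpoint> <smallest size> <largest size>
nfs_mount_cmds() {
    nmc_size=$3
    while [ "$nmc_size" -le "$4" ]; do
        # Same option style as the mount_nfs command quoted in the thread.
        echo "mount_nfs -o rsize=$nmc_size,wsize=$nmc_size -o proto=tcp $1 $2"
        nmc_size=$((nmc_size * 2))
    done
}

# Example (prints three commands, for 4096, 8192 and 16384):
#   nfs_mount_cmds 10.0.2.1:/zdata2 /mnt 4096 16384
```

Each printed command would be followed by the same copy test and an umount before moving on to the next size, so that only the transfer size changes between runs.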
From owner-freebsd-infiniband@freebsd.org Wed Oct 31 02:53:51 2018
From: Rick Macklem
To: Andrew Vylegzhanin
CC: "Rodney W. Grimes", "freebsd-fs@freebsd.org", "freebsd-infiniband@freebsd.org"
Subject: Re: NFS + Infiniband problem
Date: Wed, 31 Oct 2018 02:53:49 +0000

Andrew Vylegzhanin wrote:
>> >> Some details:
>> >> [root@node4 ~]# mount_nfs -o wsize=30000 -o proto=tcp 10.0.2.1:/zdata2 /mnt
>> >                               ^^^^^^^^^^^^
>
>Again after some tests.
>I've tried 4096, 8192 and 16384 for wsize/rsize. Only the 4096 value gives a
>measurable result, and it is extremely slow: ~10-16 MB/s for writing
>(depending on the number of threads) and 10-12 MB/s for reading. With other
>values NFS hangs (or almost hangs: a couple of kB/s on average).
>
>Changing sysctl net.inet.tcp.tso=0 (w/o reboot) on both sides had no effect.
>
>AFAIK, the InfiniBand interface has no options for TSO or LRO:
>ib0: flags=8043 metric 0 mtu 65520
>        options=80018
>        lladdr 80.0.2.8.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.50.df.51
>        inet 10.0.2.1 netmask 0xffffff00 broadcast 10.0.2.255
>        nd6 options=29
>
>BTW, here is my sysctl.conf file with tuning for congestion control and TCP
>buffers on 10/40 Gbit/s links (the server had 40 Gbit/s Intel ixl Ethernet
>also):
>
>kern.ipc.maxsockbuf=16777216
>net.inet.tcp.sendbuf_max=16777216
>net.inet.tcp.recvbuf_max=16777216
>net.inet.tcp.sendbuf_auto=1
>net.inet.tcp.recvbuf_auto=1
>net.inet.tcp.sendbuf_inc=16384
>net.inet.tcp.recvbuf_inc=524288
>net.inet.tcp.cc.algorithm=htcp

Well, I'm not familiar with the current TCP stack (and, as noted before, I
know nothing about InfiniBand). All I can suggest is testing with the
default congestion control algorithm.
(I always test with the default, which appears to be newreno.)
NFS traffic looks very different from a typical use of TCP: lots of small
TCP segments in both directions, interspersed with some larger ones (the
write requests or read replies).

rick

From owner-freebsd-infiniband@freebsd.org Wed Oct 31 20:02:39 2018
From: Andrew Vylegzhanin
Date: Wed, 31 Oct 2018 23:02:25 +0300
Subject: Re: NFS + Infiniband problem
To: rmacklem@uoguelph.ca
Cc: "Rodney W. Grimes", freebsd-fs@freebsd.org, freebsd-infiniband@freebsd.org

Wed, 31 Oct 2018 at 5:53, Rick Macklem:
>
> >net.inet.tcp.cc.algorithm=htcp
> Well, I'm not familiar with the current TCP stack (and, as noted before, I know
> nothing about InfiniBand). All I can suggest is testing with the default congestion
> control algorithm. (I always test with the default, which appears to be newreno.)
> NFS traffic looks very different from a typical use of TCP: lots of small TCP
> segments in both directions, interspersed with some larger ones (the write
> requests or read replies).

With these TCP settings, the same server serves NFS requests over 40G
Ethernet to multiple clients, with speeds via 1G Ethernet of ~105/110 MB/s
write/read.
Of course I'll try changing the congestion control algorithm, but I don't
think that will help.
I also need to test the setup with InfiniBand switched from connected mode
to datagram mode.

--
Andrew

>
> rick

From owner-freebsd-infiniband@freebsd.org Thu Nov 1 00:27:19 2018
From: Rick Macklem
To: Andrew Vylegzhanin
CC: "Rodney W. Grimes", "freebsd-fs@freebsd.org", "freebsd-infiniband@freebsd.org"
Subject: Re: NFS + Infiniband problem
Date: Thu, 1 Nov 2018 00:27:16 +0000

Andrew Vylegzhanin wrote:
[stuff snipped]
>With these TCP settings, the same server serves NFS requests over 40G Ethernet
>to multiple clients, with speeds via 1G Ethernet of ~105/110 MB/s write/read.
>Of course I'll try changing the congestion control algorithm, but I don't
>think that will help.
Yes, I doubt changing the congestion algorithm will make much difference.

>I also need to test the setup with InfiniBand switched from connected mode
>to datagram mode.
Are you using IPv6 by any chance?
The reason I ask is that there was a problem with IPv6 fragmentation
re-assembly. If InfiniBand is using a larger MTU than the Ethernet, the
switch would probably fragment the IP datagram. If any fragment is lost (or
fragmentation re-assembly is broken), it isn't going to work well.
        netstat -s | fgrep "fragments dropped after"
will show you the count of fragments dropped after timeout. If this value is
increasing, then fragmentation re-assembly is an issue. (Check on the
receiving end, which is the server for writes.)

You might want to post on freebsd-net@, since someone there might know more
about InfiniBand (and someone over there will definitely know about the IPv6
fragmentation problem).

rick

From owner-freebsd-infiniband@freebsd.org Thu Nov 1 01:46:42 2018
From: Andrew Vylegzhanin
Date: Thu, 1 Nov 2018 04:46:28 +0300
Subject: Re: NFS + Infiniband problem
To: rmacklem@uoguelph.ca
Cc: "Rodney W. Grimes", freebsd-fs@freebsd.org, freebsd-infiniband@freebsd.org

Thu, 1 Nov 2018 at 3:27, Rick Macklem:
> >Also need to test setup with InfiniBand set from connected mode to datagram mode.
> Are you using IPv6 by any chance?

No, I'm not.

> Why I ask is that there was a problem with IPv6 fragmentation re-assembly. If
> InfiniBand is using a larger MTU than the ethernet, the switch would probably
> fragment the IP datagram. If any fragment is lost (or fragmentation re-assembly
> is broken), it isn't going to work well.
> netstat -s | fgrep "fragments dropped after"

[root@nas1 ~]# netstat -s | fgrep "fragments dropped after"
        0 fragments dropped after timeout
        0 fragments dropped after timeout

> will show you the count of fragments dropped after timeout. If this value is
> increasing, then fragmentation re-assembly is an issue. (Check on the receiving
> end. The server for writes.)
>
> You might want to post on freebsd-net@, since someone there might know more
> about InfiniBand (and someone over there will definitely know about the IPv6
> fragmentation problem.

I have also cc'ed freebsd-infiniband@, but so far there has been only silence.

>
> rick

--
Andrew
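
Rick's check only matters if the counter grows while NFS traffic is running, so it helps to compare the value before and after a test copy. The following is a rough sketch for a POSIX shell with awk: the function name dropped_fragments is hypothetical, and it assumes that netstat -s prints lines of the form "N fragments dropped after timeout", as in the output shown above. It reads text on standard input and sums every matching counter (netstat prints one per protocol).

```sh
#!/bin/sh
# Hypothetical helper: sum all "fragments dropped after timeout" counters
# found on standard input. awk skips the leading whitespace, so $1 is the
# number at the start of each matching line. Prints 0 if nothing matches.
dropped_fragments() {
    awk '/fragments dropped after timeout/ { sum += $1 } END { print sum + 0 }'
}

# Usage on a live system (not executed here):
#   before=$(netstat -s | dropped_fragments)
#   ... run the NFS copy that hangs ...
#   after=$(netstat -s | dropped_fragments)
#   echo "fragments dropped during test: $((after - before))"
```

A difference of zero on the receiving side, as in the output posted above, would suggest that fragment re-assembly is not the cause of the stall.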