From: Adam Stylinski
Date: Tue, 6 Jul 2021 09:48:21 -0400
Subject: Issues with NFS RPC
To: freebsd-fs@freebsd.org
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Hello,

This may be somewhat specific to my configuration, but it's starting to smell like a bug somewhere in NFS's RPC handling (either the Linux client or the FreeBSD rpcbind).

I have two machines connected via a 40 Gbps direct-attached link, with static IPs. They use jumbo frames (9000-byte MTU). The storage is backed by a healthy zpool. I can reliably reproduce this issue, but it takes a long time: I had captured 40 GB of packets before I gave up, and only then did the issue finally reappear. It seems that after a long enough time on an NFSv3 export, VirtualBox hangs my VM whose disks are backed by that share. The rsize and wsize are 128k to match the maximum stripe size of the pool, and I'm just using plain old sec=sys, no Kerberos involved.

The error I get from rpcdebug on the Linux client looks as follows:

https://pastebin.com/rCv2ZTri

I looked up error 110, and it is a generic timeout (ETIMEDOUT on Linux). During this time, when the server seems to be going deaf to these xids, I can still ping the server over the same interface the NFS connection uses. Traffic flows fine, and the NICs are basically idle. There are no visible errors on any of the interfaces. The NICs are ConnectX-3s, running in en (Ethernet) mode.

I tried switching to NFSv4 and eventually hit the same problem, with the added bonus that it never seems to retransmit successfully and hangs in perpetuity (NFSv3 eventually recovers, after what is likely the 600-second timeout). These seem to be fairly reliable NICs, and I don't see anything on the server or client to indicate a network hardware issue.

Is there anything I can do to diagnose this on the FreeBSD server end? The Linux kernel's rpcdebug facilities seem to produce mostly noise. I did manage to run Wireshark on the client during this stall period, and I noticed some TCP packets that were classified as duplicate ACKs when the NFS traffic finally resumed.
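
For reference, a client mount matching the options described above (NFSv3, 128k rsize and wsize, sec=sys, over TCP) could be sketched roughly as follows. The server address, export path, and mount point are hypothetical placeholders, the helper name build_mount_command is made up for illustration, and the option string on the real client may differ.

```python
import shlex

# 128k expressed in bytes, the rsize/wsize value mentioned in the post.
IO_SIZE = 128 * 1024


def build_mount_command(server, export, mountpoint, vers=3):
    """Return an argv list for a Linux NFS mount using the described options.

    The server, export and mountpoint arguments are placeholders supplied
    by the caller; nothing here is taken from the real machines.
    """
    options = ",".join([
        f"vers={vers}",          # NFSv3 by default; pass vers=4 for the NFSv4 test
        f"rsize={IO_SIZE}",
        f"wsize={IO_SIZE}",
        "sec=sys",               # plain AUTH_SYS, no Kerberos
        "proto=tcp",
    ])
    return ["mount", "-t", "nfs", "-o", options, f"{server}:{export}", mountpoint]


if __name__ == "__main__":
    # Hypothetical values; substitute the real server, export and mount point.
    cmd = build_mount_command("192.0.2.1", "/tank/vms", "/mnt/vms")
    print(shlex.join(cmd))
```

Building the command as a list and only joining it for display keeps the option string easy to compare between the NFSv3 and NFSv4 attempts.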