From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 260011] Unresponsive NFS mount on AWS EFS
Date: Wed, 24 Nov 2021 09:08:04 +0000
List-Archive: https://lists.freebsd.org/archives/freebsd-bugs

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260011

            Bug ID: 260011
           Summary: Unresponsive NFS mount on AWS EFS
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: ale@FreeBSD.org

I'm experiencing annoying issues with an AWS EFS mount point on FreeBSD 13 EC2 instances. The filesystem is mounted by 3 instances (2 with the same access patterns, 1 with a different one).

Initially I had the /etc/fstab entry configured with:
`rw,nosuid,noatime,bg,nfsv4,minorversion=1,rsize=1048576,wsize=1048576,timeo=600,oneopenown`
and after a few days this left my Java application with all threads blocked on `stat64` kernel calls that never returned, without even the ability to kill -9 the process.

After digging into it, this seems to be the normal behavior for hard mount points, even if I fail to understand why one would prefer to have the system completely frozen when the NFS mount point is not responding.

So I later changed the configuration to:
`rw,nosuid,noatime,bg,nfsv4,minorversion=1,intr,soft,retrans=2,rsize=1048576,wsize=1048576,timeo=600,oneopenown`
by adding `intr,soft,retrans=2`.

Btw, I think there is a typo in mount_nfs(8): it says to set `retrycnt` instead of `retrans` for the `soft` option. Can you confirm?

After the change `nfsstat -m` reports:
`nfsv4,minorversion=1,oneopenown,tcp,resvport,soft,intr,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2`

I wonder why the timeo, rsize and wsize settings seem to have been ignored, but this is irrelevant to the issue.

After a few days the application on the two similar EC2 instances stopped working again, though.
Any command accessing the mounted EFS filesystem (ls, df, umount, etc.) didn't complete in a reasonable time, but I could kill the processes. The only way to recover was to reboot the instances.

On one of them I saw the following kernel messages, but they were generated only when I tried to debug the issue hours later, and only on one EC2 instance, so I'm not sure whether they are relevant or helpful:
```
kernel: newnfs: server 'fs-xxx.efs.us-east-1.amazonaws.com' error: fileid changed. fsid 0:0: expected fileid 0x4d2369b89a58a920, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
kernel: nfs server fs-xxx.efs.us-east-1.amazonaws.com:/: not responding
```

The third EC2 instance survived and was still able to access the filesystem, but I think it wasn't accessing the filesystem when the network/NFS issue that affected the other two occurred.
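
For anyone trying to reproduce the option mismatch mentioned earlier, here is a minimal sketch that compares the second /etc/fstab option string with the string `nfsstat -m` reported. The helper names `parse_opts` and `diff_opts` are made up for this illustration, and the comparison is purely textual: it makes no assumption about how the kernel negotiates these values, nor that `timeo` and the reported `timeout` are the same field or use the same units.

```python
# Option strings copied verbatim from the report above.
REQUESTED = (
    "rw,nosuid,noatime,bg,nfsv4,minorversion=1,intr,soft,retrans=2,"
    "rsize=1048576,wsize=1048576,timeo=600,oneopenown"
)
REPORTED = (
    "nfsv4,minorversion=1,oneopenown,tcp,resvport,soft,intr,cto,sec=sys,"
    "acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,"
    "negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,"
    "readahead=1,wcommitsize=16777216,timeout=120,retrans=2"
)


def parse_opts(text):
    """Split a comma-separated option string into a dict.

    Flags without a value (e.g. 'soft') map to True; values stay strings.
    """
    opts = {}
    for item in text.split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


def diff_opts(requested, reported):
    """Return (changed, missing) for the requested options.

    changed: options present in both strings with different values.
    missing: requested option names that do not appear in the report
             under the same name (this is a name match only).
    """
    changed = {
        key: (value, reported[key])
        for key, value in requested.items()
        if key in reported and reported[key] != value
    }
    missing = sorted(key for key in requested if key not in reported)
    return changed, missing


changed, missing = diff_opts(parse_opts(REQUESTED), parse_opts(REPORTED))
print("changed:", changed)
print("missing:", missing)
```

With these two strings, `rsize` and `wsize` show up as changed, while `timeo` only shows up as missing by name, since the report lists a `timeout` field instead.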