From owner-freebsd-bugs@freebsd.org Thu Sep 24 23:40:50 2020
Return-Path:
Delivered-To: freebsd-bugs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
    by mailman.nyi.freebsd.org (Postfix) with ESMTP id A9C1B3E4538
    for ; Thu, 24 Sep 2020 23:40:50 +0000 (UTC)
    (envelope-from deskhelp@FreeBSD.org)
Received: from slot0.unliusa.com (simon6.cheong.pserver.ru [185.118.164.230])
    by mx1.freebsd.org (Postfix) with ESMTP id 4ByBPZ3YfTz4PNm
    for ; Thu, 24 Sep 2020 23:40:50 +0000 (UTC)
    (envelope-from deskhelp@FreeBSD.org)
From: Support Department
To: freebsd-bugs@FreeBSD.org
Subject: =?UTF-8?B?RS1tYWlsIFN1cHBvcnQgVGVhbeKEog==?=
Date: 24 Sep 2020 16:20:34 -0700
Message-ID: <20200924162034.BAE4CB097A20B0EC@FreeBSD.org>
MIME-Version: 1.0
X-Rspamd-Queue-Id: 4ByBPZ3YfTz4PNm
X-Spamd-Bar: /
Authentication-Results: mx1.freebsd.org; none
X-Spamd-Result: default: False [0.00 / 15.00];
    ASN(0.00)[asn:44493, ipnet:185.118.164.0/22, country:RU];
    local_wl_from(0.00)[FreeBSD.org]
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.33
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.33
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 24 Sep 2020 23:40:50 -0000

Dear, freebsd-bugs

From owner-freebsd-bugs@freebsd.org Fri Sep 25 02:51:45 2020
Return-Path:
Delivered-To: freebsd-bugs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
    by mailman.nyi.freebsd.org (Postfix) with ESMTP id 070A93E8455
    for ; Fri, 25 Sep 2020 02:51:45 +0000 (UTC)
    (envelope-from bugzilla-noreply@freebsd.org)
Received: from mailman.nyi.freebsd.org (mailman.nyi.freebsd.org [IPv6:2610:1c1:1:606c::50:13])
    by mx1.freebsd.org (Postfix) with ESMTP id 4ByGdr6S8Cz4Xg6
    for ; Fri, 25 Sep 2020 02:51:44 +0000 (UTC)
    (envelope-from bugzilla-noreply@freebsd.org)
Received: by mailman.nyi.freebsd.org (Postfix)
    id DD7DB3E8453; Fri, 25 Sep 2020 02:51:44 +0000 (UTC)
Delivered-To: bugs@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
    by mailman.nyi.freebsd.org (Postfix) with ESMTP id DD45A3E81CC
    for ; Fri, 25 Sep 2020 02:51:44 +0000 (UTC)
    (envelope-from bugzilla-noreply@freebsd.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
    client-signature RSA-PSS (4096 bits) client-digest SHA256)
    (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK))
    by mx1.freebsd.org (Postfix) with ESMTPS id 4ByGdr5Zjfz4XTg
    for ; Fri, 25 Sep 2020 02:51:44 +0000 (UTC)
    (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2610:1c1:1:606c::50:1d])
    (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
    key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
    (Client did not present a certificate)
    by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id A3A1CC696
    for ; Fri, 25 Sep 2020 02:51:44 +0000 (UTC)
    (envelope-from bugzilla-noreply@freebsd.org)
Received: from kenobi.freebsd.org ([127.0.1.5])
    by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id 08P2piJZ016843
    for ; Fri, 25 Sep 2020 02:51:44 GMT
    (envelope-from bugzilla-noreply@freebsd.org)
Received: (from www@localhost)
    by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id 08P2piME016842
    for bugs@FreeBSD.org; Fri, 25 Sep 2020 02:51:44 GMT
    (envelope-from bugzilla-noreply@freebsd.org)
X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 249871] NFSv4 faulty directory listings under heavy load
Date: Fri, 25 Sep 2020 02:51:44 +0000
X-Bugzilla-Reason:
 AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 12.1-RELEASE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: jwb@freebsd.org
X-Bugzilla-Status: New
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: bugs@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
    op_sys bug_status bug_severity priority component assigned_to reporter
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.33
Precedence: list
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 25 Sep 2020 02:51:45 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249871

            Bug ID: 249871
           Summary: NFSv4 faulty directory listings under heavy load
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: jwb@freebsd.org

I think I've discovered a peculiar bug in NFSv4.  When the server is under
heavy load, directory listings sometimes show duplicate filenames and other
times omit filenames.

This was discovered when running parallel jobs on a small HPC cluster, each
running xzcat on an NFS-served file, dumping the uncompressed output to a
local disk on the client, followed by some brief heavy computation and
writing several small output files to the NFS server.  As shown below, there
are 11,031 files processed.  Parallel jobs were capped at between 50 and 150
at a time, with the problem occurring with any cap.

All files list-*.txt shown below were produced by

    ls | grep 'combined.*-ad\.vcf\.xz'

or

    find . -maxdepth 1 'combined.*-ad.vcf.xz'

The file list-1.txt contains the correct directory listing.  list-100.txt,
however, contains duplicate filenames, and list-1000.txt has both duplicate
and missing filenames.

# sort list-1.txt | uniq -d
# sort list-100.txt | uniq -d
combined.NWD297242-ad.vcf.xz
combined.NWD745320-ad.vcf.xz
combined.NWD787696-ad.vcf.xz

# wc -l list-1.txt list-100.txt list-1000.txt
   11031 list-1.txt
   11034 list-100.txt
   11027 list-1000.txt
   33092 total

# diff list-1.txt list-100.txt
2404a2405
> combined.NWD297242-ad.vcf.xz
7856a7858
> combined.NWD745320-ad.vcf.xz
8391a8394
> combined.NWD787696-ad.vcf.xz

# diff list-1.txt list-1000.txt
153a154
> combined.NWD111306-ad.vcf.xz
170d170
< combined.NWD113182-ad.vcf.xz
512d511
[snip]

If I revert the mounts to NFSv3, the problem goes away (but performance
suffers).

There are no apparent problems delivering file content, just directory
listings.  Using this fact, I can work around the problem by writing the
directory listing to a file beforehand, when the server is not under load:

    ls | grep 'combined.*-ad\.vcf\.xz' > VCF-list.txt

Reading this file under heavy load does not pose any problems.  The problem
appears only if I do a new directory listing with "ls" or "find".

The problem is consistently reproducible under heavy load and does not occur
under light load.

/etc/exports:

V4: /

/etc/zfs/exports:

# !!! DO NOT EDIT THIS FILE MANUALLY !!!
/pxeserver/images  -alldirs -ro -network 192.168.0.0 -mask 255.255.128.0
/raid-00           -maproot=root -network 192.168.0.0 -mask 255.255.128.0
/sharedapps        -maproot=root -network 192.168.0.0 -mask 255.255.128.0
/usr/home          -maproot=root -network 192.168.0.0 -mask 255.255.128.0
/var/cache/pkg     -maproot=root -network 192.168.0.0 -mask 255.255.128.0

/etc/fstab on the clients:

login:/usr/home        /usr/home        nfs  rw,bg,intr,noatime  0  0
login:/raid-00         /raid-00         nfs  rw,bg,intr,noatime  0  0
login:/sharedapps      /sharedapps      nfs  rw,bg,intr,noatime  0  0
login:/var/cache/pkg   /var/cache/pkg   nfs  rw,bg,intr,noatime  0  0

-- 
You are receiving this mail because:
You are the assignee for the bug.
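
The checks used in the report (sort | uniq -d for duplicates, diff against a
known-good listing for missing names) could be wrapped in a small helper for
repeated runs under load.  This is only a sketch, not something from the
original report: the function name check_listing and the temporary file
names are hypothetical, and it assumes a reference listing was captured while
the server was idle.

```sh
#!/bin/sh
# Hypothetical helper (not part of the original report).
# usage: check_listing REFERENCE CANDIDATE
# Prints counts of duplicated, missing and unexpected names in CANDIDATE
# relative to REFERENCE; returns 0 only if all three counts are zero.
check_listing() {
    ref=$1
    cand=$2
    tmp=$(mktemp -d "${TMPDIR:-/tmp}/chk.XXXXXX") || return 2

    # comm(1) needs both inputs sorted the same way, so pin the locale.
    LC_ALL=C sort -u "$ref"  > "$tmp/ref"
    LC_ALL=C sort -u "$cand" > "$tmp/cand"

    # Names listed more than once in the candidate listing.
    dups=$(LC_ALL=C sort "$cand" | uniq -d | wc -l | tr -d ' ')
    # Names present in the reference but absent from the candidate.
    missing=$(LC_ALL=C comm -23 "$tmp/ref" "$tmp/cand" | wc -l | tr -d ' ')
    # Names present in the candidate but not in the reference.
    extra=$(LC_ALL=C comm -13 "$tmp/ref" "$tmp/cand" | wc -l | tr -d ' ')

    rm -rf "$tmp"
    echo "duplicates=$dups missing=$missing extra=$extra"
    [ "$dups" -eq 0 ] && [ "$missing" -eq 0 ] && [ "$extra" -eq 0 ]
}
```

With the files from the report, "check_listing list-1.txt list-100.txt"
should flag the three duplicated names, since the diff above shows no other
differences between those two listings.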