Date:      24 Sep 2020 16:20:34 -0700
From:      Support Department <deskhelp@FreeBSD.org>
To:        freebsd-bugs@FreeBSD.org
Subject:   E-mail Support Team™
Message-ID:  <20200924162034.BAE4CB097A20B0EC@FreeBSD.org>

From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 249871] NFSv4 faulty directory listings under heavy load
Date: Fri, 25 Sep 2020 02:51:44 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 12.1-RELEASE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: jwb@freebsd.org
X-Bugzilla-Status: New
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: bugs@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
 op_sys bug_status bug_severity priority component assigned_to reporter
Message-ID: <bug-249871-227@https.bugs.freebsd.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.33
Precedence: list
List-Id: Bug reports <freebsd-bugs.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-bugs>,
 <mailto:freebsd-bugs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-bugs/>;
List-Post: <mailto:freebsd-bugs@freebsd.org>
List-Help: <mailto:freebsd-bugs-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-bugs>,
 <mailto:freebsd-bugs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 25 Sep 2020 02:51:45 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249871

            Bug ID: 249871
           Summary: NFSv4 faulty directory listings under heavy load
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: jwb@freebsd.org

I think I've discovered a peculiar bug in NFSv4.  When the server is under
heavy load, directory listings sometimes show duplicate filenames and other
times omit filenames.

This was discovered when running parallel jobs on a small HPC cluster.  Each
job runs xzcat on an NFS-served file, dumps the uncompressed output to a local
disk on the client, performs some brief heavy computation, and writes several
small output files to the NFS server.  As shown below, 11,031 files are
processed.  The number of parallel jobs was capped at between 50 and 150 at a
time, and the problem occurred with every cap.

All files list-*.txt shown below were produced by

    ls | grep 'combined.*-ad\.vcf\.xz'

or

    find . -maxdepth 1 -name 'combined.*-ad.vcf.xz'

The file list-1.txt contains the correct directory listing.

list-100.txt, however, contains duplicate filenames, and list-1000.txt has both
duplicate and missing filenames.

# sort list-1.txt | uniq -d

# sort list-100.txt | uniq -d
combined.NWD297242-ad.vcf.xz
combined.NWD745320-ad.vcf.xz
combined.NWD787696-ad.vcf.xz

# wc -l list-1.txt list-100.txt list-1000.txt
   11031 list-1.txt
   11034 list-100.txt
   11027 list-1000.txt
   33092 total

# diff list-1.txt list-100.txt
2404a2405
> combined.NWD297242-ad.vcf.xz
7856a7858
> combined.NWD745320-ad.vcf.xz
8391a8394
> combined.NWD787696-ad.vcf.xz

# diff list-1.txt list-1000.txt
153a154
> combined.NWD111306-ad.vcf.xz
170d170
< combined.NWD113182-ad.vcf.xz
512d511
[snip]
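
The checks above can be wrapped in two small helpers.  This is only a sketch:
the function names duplicates and missing are hypothetical, and it assumes
each listing file holds one filename per line, as produced by the ls | grep
pipeline.

```shell
# Sketch only: helper names are illustrative, not part of any existing tool.

# Print every name that appears more than once in listing file $1.
duplicates() {
    sort "$1" | uniq -d
}

# Print every name present in reference listing $1 but absent from listing $2.
missing() {
    ref_sorted=$(mktemp "${TMPDIR:-/tmp}/ref.XXXXXX") || return 1
    cur_sorted=$(mktemp "${TMPDIR:-/tmp}/cur.XXXXXX") || return 1
    # comm needs sorted input; -u also collapses any duplicated names.
    sort -u "$1" > "$ref_sorted"
    sort -u "$2" > "$cur_sorted"
    # -23 suppresses lines unique to the second file and lines common to both.
    comm -23 "$ref_sorted" "$cur_sorted"
    rm -f "$ref_sorted" "$cur_sorted"
}
```

With list-1.txt as the reference, usage would look like:

    duplicates list-1000.txt
    missing list-1.txt list-1000.txt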

If I revert the mounts to NFSv3, the problem goes away (but performance
suffers).

There are no apparent problems delivering file content, just directory
listings.  Using this fact, I can work around the problem by writing the
directory listing to a file beforehand, when the server is not under load:

    ls | grep 'combined.*-ad\.vcf\.xz' > VCF-list.txt

Reading this file under heavy load does not pose any problems.  The problem
appears only when I generate a new directory listing with "ls" or "find".
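
A possible refinement of this workaround is to sanity-check the saved listing
before relying on it.  The sketch below is hypothetical: the function name
snapshot_listing and its arguments are assumptions, and it presumes the
expected number of files (11031 in the run above) is known in advance.

```shell
# Sketch only: snapshot_listing DIR OUTFILE EXPECTED_COUNT
# Saves the filtered listing of DIR to OUTFILE and returns success only if
# it holds exactly EXPECTED_COUNT names and none of them is duplicated.
snapshot_listing() {
    dir=$1
    out=$2
    expected=$3
    ls "$dir" | grep 'combined.*-ad\.vcf\.xz' > "$out"
    # tr strips the padding that some wc implementations add.
    count=$(wc -l < "$out" | tr -d ' ')
    dups=$(sort "$out" | uniq -d | wc -l | tr -d ' ')
    [ "$count" -eq "$expected" ] && [ "$dups" -eq 0 ]
}
```

For example:

    snapshot_listing . VCF-list.txt 11031 || echo "listing looks faulty"

A matching count with no duplicates does not prove the listing is correct,
but it would catch both symptoms described in this report.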

The problem is consistently reproducible under heavy load and does not occur 
under light load.

/etc/exports:

V4: /

/etc/zfs/exports:

# !!! DO NOT EDIT THIS FILE MANUALLY !!!

/pxeserver/images       -alldirs -ro -network 192.168.0.0 -mask 255.255.128.0 
/raid-00        -maproot=root -network 192.168.0.0 -mask 255.255.128.0 
/sharedapps     -maproot=root -network 192.168.0.0 -mask 255.255.128.0 
/usr/home       -maproot=root -network 192.168.0.0 -mask 255.255.128.0 
/var/cache/pkg  -maproot=root -network 192.168.0.0 -mask 255.255.128.0 

/etc/fstab on the clients:

login:/usr/home         /usr/home       nfs     rw,bg,intr,noatime 0       0
login:/raid-00          /raid-00        nfs     rw,bg,intr,noatime 0       0
login:/sharedapps       /sharedapps     nfs     rw,bg,intr,noatime 0       0
login:/var/cache/pkg    /var/cache/pkg  nfs     rw,bg,intr,noatime 0       0

-- 
You are receiving this mail because:
You are the assignee for the bug.
