Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 14 Dec 2020 00:24:14 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        J David <j.david.lists@gmail.com>, Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: Major issues with nfsv4
Message-ID:  <YQXPR0101MB096834DB16439960ADAFCE29DDC70@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com>
References:  <CABXB=RRB2nUk0pPDisBQPdicUA3ooHpg8QvBwjG_nFU4cHvCYw@mail.gmail.com> <YQXPR0101MB096849ADF24051F7479E565CDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSyN%2Bo2yXcpmYw8sCSUUDhN-w28Vu9v_cCWa-2=pLZmHg@mail.gmail.com> <YQXPR0101MB09680D155B6D685442B5E25EDDCA0@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <CABXB=RSSE=yOwgOXsnbEYPqiWk5K5NfzLY=D%2BN9mXdVn%2B--qLQ@mail.gmail.com> <YQXPR0101MB0968B17010B3B36C8C41FDE1DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9Q9GAhNHbXGbKy7@kib.kiev.ua> <YQXPR0101MB0968C7629D57CA21319E50C2DDC90@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM> <X9UDArKjUqJVS035@kib.kiev.ua>, <CABXB=RRNnW9nNhFCJS1evNUTEX9LNnzyf2gOmZHHGkzAoQxbPw@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
--_002_YQXPR0101MB096834DB16439960ADAFCE29DDC70YQXPR0101MB0968_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

J David wrote:=0A=
[stuff snipped]=0A=
>The most common trace (91 times, or over 1/4 of all observed) is:=0A=
>=0A=
>__mtx_lock_sleep+0xf8 nfscl_nodeleg+0x207 nfs_lookup+0x314=0A=
>VOP_LOOKUP_APV+0x75 null_lookup+0x98 VOP_LOOKUP_APV+0x75 >lookup+0x451=0A=
>namei+0x414 kern_statat+0x72 freebsd11_stat+0x30 amd64_syscall+0x387=0A=
>fast_syscall_common+0xf8=0A=
>=0A=
>This trace appears roughly uniformly (28/38/34) in 3 of the 4 processes.=
=0A=
nfscl_nodeleg() plus two others acquire the mutex to check for a delegation=
=0A=
for the file. Servers should only issue delegations when the callback=0A=
path is working and some never issue delegations.=0A=
=0A=
So, if you are not running the nfscbd(8) daemon (there is no need=0A=
unless you are using delegations or pNFS) a trivial check for the=0A=
callback path being enabled should determine that no delegation=0A=
exists if it is not enabled. This avoids acquiring the mutex for this=0A=
common case.=0A=
--> You have no "Delegs" according to your "nfsstat -c -E".=0A=
=0A=
So, if you set up a test system with the attached patch applied to the=0A=
kernel and you do not run nfscbd(8) (ie no nfscbd_enable=3D"YES" in=0A=
your /etc/rc.conf) this patch might help.=0A=
--> It avoids acquiring the mutex for lookups and getattrs (stat calls).=0A=
=0A=
The only reason I haven't done this before is a concern w.r.t. a broken=0A=
NFSv4 server that erroneously issues delegations when the callback=0A=
path is not working. (I don't know of such a server and, since you do not=
=0A=
have delegations, it should not be an issue for your case.)=0A=
=0A=
Might be enough to alleviate your problem.=0A=
=0A=
I will work on changing the single open linked list to a hash table of=0A=
linked lists, but that is probably weeks away from a test patch.=0A=
=0A=
Let me know if you can try the patch and if it helps, rick=0A=
=0A=
The full set of traces (sorted by uniq -c) is here:=0A=
=0A=
https://pastebin.com/HUqkeMri=0A=
=0A=
(This message is already long enough!)=0A=
=0A=
The characters stripped off the front of each line are consistently:=0A=
=0A=
(pid) (uid) python3 -=0A=
=0A=
It happens these were all Python-based jobs.  Python seems predisposed=0A=
to trigger this, but non-Python jobs trigger it as well.  Heavy use of=0A=
stat() does seem to be a common element regardless of job type.=0A=
=0A=
Here's the output of "nfsstat -E -c" 60 seconds after running it with -z:=
=0A=
=0A=
Client Info:=0A=
RPC Counts:=0A=
      Getattr      Setattr       Lookup     Readlink         Read        Wr=
ite=0A=
         1667          111         6376            0           42          =
153=0A=
       Create       Remove       Rename         Link      Symlink        Mk=
dir=0A=
          111            7           14            0            0          =
  0=0A=
        Rmdir      Readdir     RdirPlus       Access        Mknod       Fss=
tat=0A=
            0            0            0         2620            0          =
160=0A=
       FSinfo     pathConf       Commit      SetClId    SetClIdCf         L=
ock=0A=
            0            0          113            0            0          =
 48=0A=
        LockT        LockU         Open      OpenCfr=0A=
            0           48          320            0=0A=
   OpenDownGr        Close=0A=
            0          402=0A=
    RelLckOwn  FreeStateID    PutRootFH     DelegRet       GetAcl       Set=
Acl=0A=
            0            3            0            0            0          =
  0=0A=
   ExchangeId   CreateSess  DestroySess  DestroyClId    LayoutGet   GetDevI=
nfo=0A=
            0            0            0            0            0          =
  0=0A=
 LayoutCommit LayoutReturn ReclaimCompl    ReadDataS   WriteDataS  CommitDa=
taS=0A=
            0            0            0            0            0          =
  0=0A=
   OpenLayout CreateLayout=0A=
            0            0=0A=
    OpenOwner        Opens    LockOwner        Locks       Delegs     Local=
Own=0A=
        21175       130439           30            6            0          =
  0=0A=
    LocalOpen    LocalLown    LocalLock=0A=
            0            0            0=0A=
Rpc Info:=0A=
     TimedOut      Invalid    X Replies      Retries     Requests=0A=
            0            0            0            0        12247=0A=
Cache Info:=0A=
    Attr Hits  Attr Misses    Lkup Hits  Lkup Misses=0A=
      1110054          858      1002829         6361=0A=
    BioR Hits  BioR Misses    BioW Hits  BioW Misses=0A=
         2000           54          292          153=0A=
   BioRL Hits BioRL Misses    BioD Hits  BioD Misses=0A=
         6911            0          208            0=0A=
    DirE Hits  DirE Misses=0A=
          104            0=0A=
=0A=
This does reflect the whole machine, not just those four processes.=0A=
=0A=
Finally, when I attempted to kill those four processes with ktrace=0A=
running on all of them, the system panicked:=0A=
=0A=
Fatal trap 12: page fault while in kernel mode=0A=
cpuid =3D 2; apic id =3D 02=0A=
fault virtual address =3D 0x10=0A=
fault code =3D supervisor read data, page not present=0A=
instruction pointer =3D 0x20:0xffffffff8272d28f=0A=
stack pointer         =3D 0x28:0xfffffe008a5f24c0=0A=
frame pointer         =3D 0x28:0xfffffe008a5f25f0=0A=
code segment =3D base 0x0, limit 0xfffff, type 0x1b=0A=
=3D DPL 0, pres 1, long 1, def32 0, gran 1=0A=
processor eflags =3D interrupt enabled, resume, IOPL =3D 0=0A=
current process =3D 5627 (node)=0A=
trap number =3D 12=0A=
panic: page fault=0A=
cpuid =3D 2=0A=
time =3D 1607845622=0A=
KDB: stack backtrace:=0A=
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008a5f2=
180=0A=
vpanic() at vpanic+0x17b/frame 0xfffffe008a5f21d0=0A=
panic() at panic+0x43/frame 0xfffffe008a5f2230=0A=
trap_fatal() at trap_fatal+0x391/frame 0xfffffe008a5f2290=0A=
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008a5f22e0=0A=
trap() at trap+0x286/frame 0xfffffe008a5f23f0=0A=
calltrap() at calltrap+0x8/frame 0xfffffe008a5f23f0=0A=
--- trap 0xc, rip =3D 0xffffffff8272d28f, rsp =3D 0xfffffe008a5f24c0, rbp=
=0A=
=3D 0xfffffe008a5f25f0 ---=0A=
null_bypass() at null_bypass+0xaf/frame 0xfffffe008a5f25f0=0A=
VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x80/frame 0xfffffe008a5f2620=0A=
closef() at closef+0x8f/frame 0xfffffe008a5f26b0=0A=
fdescfree_fds() at fdescfree_fds+0x3c/frame 0xfffffe008a5f2700=0A=
fdescfree() at fdescfree+0x466/frame 0xfffffe008a5f27c0=0A=
exit1() at exit1+0x488/frame 0xfffffe008a5f2820=0A=
sigexit() at sigexit+0x159/frame 0xfffffe008a5f2b00=0A=
postsig() at postsig+0x2fd/frame 0xfffffe008a5f2bc0=0A=
ast() at ast+0x317/frame 0xfffffe008a5f2bf0=0A=
doreti_ast() at doreti_ast+0x1f/frame 0x7fffffff67a0=0A=
=0A=
The fsck ate the ktrace.out file on reboot.  This system does not have=0A=
kernel debug symbols.  If there's a way to try to figure out what=0A=
null_bypass+0xaf corresponds to using the machine that built this=0A=
kernel, which has the kernel, objects, debug symbols, and source,=0A=
please let me know.  It's built from releng/12.2 r368515.=0A=
=0A=
Thanks!=0A=

--_002_YQXPR0101MB096834DB16439960ADAFCE29DDC70YQXPR0101MB0968_
Content-Type: application/octet-stream; name="clstate.patch"
Content-Description: clstate.patch
Content-Disposition: attachment; filename="clstate.patch"; size=912;
	creation-date="Mon, 14 Dec 2020 00:24:08 GMT";
	modification-date="Mon, 14 Dec 2020 00:24:08 GMT"
Content-Transfer-Encoding: base64

LS0tIHN5cy9mcy9uZnNjbGllbnQvbmZzX2Nsc3RhdGUuYy5zYXYJMjAyMC0xMi0xMyAwNjoxNzoz
MC42NDI3NTIwMDAgLTA4MDAKKysrIHN5cy9mcy9uZnNjbGllbnQvbmZzX2Nsc3RhdGUuYwkyMDIw
LTEyLTEzIDA2OjM4OjEwLjgxOTkwOTAwMCAtMDgwMApAQCAtNDM3NCw3ICs0Mzc0LDcgQEAgbmZz
Y2xfbm9kZWxlZyh2bm9kZV90IHZwLCBpbnQgd3JpdGVkZWxlZykKIAogCW5wID0gVlRPTkZTKHZw
KTsKIAlubXAgPSBWRlNUT05GUyh2cC0+dl9tb3VudCk7Ci0JaWYgKCFORlNIQVNORlNWNChubXAp
KQorCWlmICghTkZTSEFTTkZTVjQobm1wKSB8fCBuZnNjbF9lbmFibGVjYWxsYiA9PSAwKQogCQly
ZXR1cm4gKDEpOwogCU5GU0xPQ0tDTFNUQVRFKCk7CiAJY2xwID0gbmZzY2xfZmluZGNsKG5tcCk7
CkBAIC00NzMwLDcgKzQ3MzAsNyBAQCBuZnNjbF9kZWxlZ21vZHRpbWUodm5vZGVfdCB2cCkKIAlz
dHJ1Y3QgbmZzbW91bnQgKm5tcDsKIAogCW5tcCA9IFZGU1RPTkZTKHZwLT52X21vdW50KTsKLQlp
ZiAoIU5GU0hBU05GU1Y0KG5tcCkpCisJaWYgKCFORlNIQVNORlNWNChubXApIHx8IG5mc2NsX2Vu
YWJsZWNhbGxiID09IDApCiAJCXJldHVybjsKIAlORlNMT0NLQ0xTVEFURSgpOwogCWNscCA9IG5m
c2NsX2ZpbmRjbChubXApOwpAQCAtNDc1OSw3ICs0NzU5LDcgQEAgbmZzY2xfZGVsZWdnZXRtb2R0
aW1lKHZub2RlX3QgdnAsIHN0cnVjdCB0aW1lc3BlYyAqbXRpCiAJc3RydWN0IG5mc21vdW50ICpu
bXA7CiAKIAlubXAgPSBWRlNUT05GUyh2cC0+dl9tb3VudCk7Ci0JaWYgKCFORlNIQVNORlNWNChu
bXApKQorCWlmICghTkZTSEFTTkZTVjQobm1wKSB8fCBuZnNjbF9lbmFibGVjYWxsYiA9PSAwKQog
CQlyZXR1cm47CiAJTkZTTE9DS0NMU1RBVEUoKTsKIAljbHAgPSBuZnNjbF9maW5kY2wobm1wKTsK

--_002_YQXPR0101MB096834DB16439960ADAFCE29DDC70YQXPR0101MB0968_--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?YQXPR0101MB096834DB16439960ADAFCE29DDC70>