Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 14 Oct 2021 01:28:37 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 259163] 12.2-p9 & -p10 NFSv3 server times out on CentOS 8 client after ~13-14 minute for file copying
Message-ID:  <bug-259163-227@https.bugs.freebsd.org/bugzilla/>

next in thread | raw e-mail | index | archive | help
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D259163

            Bug ID: 259163
           Summary: 12.2-p9 & -p10 NFSv3 server times out on CentOS 8
                    client after ~13-14 minute for file copying
           Product: Base System
           Version: 12.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: parv.0zero9+freebsd@gmail.com

Our ZFS-based storage system tracks FreeBSD 12.2 release. A directory from =
ZFS
RAID-Z2 dataset is exported, with use of NFS v3 server. (NFS v4 server is n=
ot
being used due to the need to implement idmap(d).)

FreeBSD server has this in /etc/exports ...

  /real/dir  -alldirs,-maproot=3Droot  -network IP-address/netmask


... CentOS client mounts the exported directory via autofs as ...

  SERVER-addr  -soft,retrans=3D1,retry=3D1,vers=3D3  SERVER-addr:/real/dir=
=20


After updating 12.2 to -p9 from -p8, which brought changes to libcasper, Ce=
ntOS
8 NFS v3 client started seeing NFS connection timing out after ~13-14 minut=
es
of a file copy job. Going to -p10 did not help the situation any.

Moving back to -p8 does not cause any time out issues on CentOS side.

Around time of time-out, tcpdump of NFS connection on both CentOS client &
FreeBSD server only showed that FreeBSD server was indeed slower to respond=
 in
the expected time.

Output on FreeBSD server due to "vfs.nfsd.debuglevel=3D2" MIB did not enlig=
hten
me;  produced a flood of seemingly irrelevant output so it's entirely possi=
ble
I might have missed something.

There had/have not been any messages on FreeBSD related to or around the ti=
me
of NFS connection time out.


When CentOS client experiences NFS server time out, it puts out "CPU stuck"
messages about multiple CPUs on the tty; dmesg gets ...

...
2021-08-23T21:20:08,041420 watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
[kworker/u24:3:856093]
2021-08-23T21:20:08,043420 watchdog: BUG: soft lockup - CPU#1 stuck for 23s!
[kworker/u24:7:856767]
2021-08-23T21:20:08,044420 watchdog: BUG: soft lockup - CPU#2 stuck for 23s!
[kworker/u24:5:856593]
2021-08-23T21:20:08,044420 Modules linked in: ib_core nfsv3 nfs_acl
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
xt_CHECKSUM ipt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat
nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
nf_tables libcrc32c nfnetlink tun bridge stp llc sunrpc vfat fat intel_rapl=
_msr
intel_rapl_common snd_hda_codec_hdmi snd_sof_pci snd_sof_intel_byt
snd_sof_intel_ipc snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_xtensa_=
dsp
snd_sof_intel_hda snd_hda_codec_realtek snd_sof snd_hda_codec_generic
ledtrig_audio snd_soc_skl snd_soc_sst_ipc snd_soc_sst_dsp x86_pkg_temp_ther=
mal
intel_powerclamp snd_hda_ext_core coretemp snd_soc_acpi_intel_match
snd_soc_acpi kvm_intel snd_hda_intel snd_intel_dspcfg soundwire_intel
soundwire_generic_allocation iTCO_wdt mei_wdt iTCO_vendor_support
intel_wmi_thunderbolt wmi_bmof snd_soc_core kvm snd_compress soundwire_cade=
nce
soundwire_bus irqbypass snd_hda_codec snd_hda_core crct10dif_pclmul
2021-08-23T21:20:08,044438  crc32_pclmul snd_hwdep snd_seq ghash_clmulni_in=
tel
rapl snd_seq_device intel_cstate snd_pcm intel_uncore snd_timer snd pcspkr
soundcore i2c_i801 mei_me mei intel_pch_thermal ie31200_edac wmi acpi_pad
intel_pmc_core ip_tables ext4 mbcache jbd2 sd_mod t10_pi sg i915 cec intel_=
gtt
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm igb ahci
e1000e libahci libata dca i2c_algo_bit crc32c_intel video pinctrl_cannonlake
fuse
2021-08-23T21:20:08,044451 CPU: 2 PID: 856593 Comm: kworker/u24:5 Kdump: lo=
aded
Tainted: G        W    L   --------- -  - 4.18.0-305.10.2.el8_4.x86_64 #1
2021-08-23T21:20:08,044451 Hardware name: Puget Systems C246M-WU4/C246M-WU4=
-CF,
BIOS F4e 09/16/2020
2021-08-23T21:20:08,044466 Workqueue: rpciod rpc_async_schedule [sunrpc]
2021-08-23T21:20:08,044469 RIP:
0010:native_queued_spin_lock_slowpath+0x18f/0x1d0
2021-08-23T21:20:08,044469 Code: c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 05 48 =
63
c9 48 05 00 ac 02 00 48 03 04 cd a0 b8 b7 a7 48 89 10 8b 42 08 85 c0 75 09 =
f3
90 <8b> 42 08 85 c0 74 f7 48 8b 02 48 85 c0 0f 84 6e ff ff ff 48 89 c1
2021-08-23T21:20:08,044470 RSP: 0018:ffff9d9bc9ea7de8 EFLAGS: 00000246
ORIG_RAX: ffffffffffffff13
2021-08-23T21:20:08,044471 RAX: 0000000000000000 RBX: ffff8a8ac1679000 RCX:
0000000000000001
2021-08-23T21:20:08,044471 RDX: ffff8a8b7c2aac00 RSI: 00000000000c0000 RDI:
ffff8a8ac1679670
2021-08-23T21:20:08,044472 RBP: ffff8a8ac1679670 R08: 0000000000200000 R09:
0000000000000000
2021-08-23T21:20:08,044472 R10: ffff8a7c88217200 R11: 0000000000000001 R12:
ffff8a80d33fc7d0
2021-08-23T21:20:08,044472 R13: 0000000000000000 R14: ffffffffc0c52b20 R15:
ffffffffc0c44cf0
2021-08-23T21:20:08,044473 FS:  0000000000000000(0000)
GS:ffff8a8b7c280000(0000) knlGS:0000000000000000
2021-08-23T21:20:08,044473 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2021-08-23T21:20:08,044474 CR2: 000055a105562088 CR3: 0000000eb0610004 CR4:
00000000003706e0
2021-08-23T21:20:08,044474 DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
2021-08-23T21:20:08,044475 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
2021-08-23T21:20:08,044475 Call Trace:
2021-08-23T21:20:08,044478  _raw_spin_lock+0x1c/0x20
2021-08-23T21:20:08,044487  xprt_prepare_transmit+0x4a/0xb0 [sunrpc]
2021-08-23T21:20:08,044495  call_transmit+0x2e/0x80 [sunrpc]
2021-08-23T21:20:08,044503  __rpc_execute+0x85/0x3c0 [sunrpc]
2021-08-23T21:20:08,044506  ? try_to_wake_up+0x1cd/0x540
2021-08-23T21:20:08,044514  rpc_async_schedule+0x29/0x40 [sunrpc]
2021-08-23T21:20:08,044515  process_one_work+0x1a7/0x360
2021-08-23T21:20:08,044516  worker_thread+0x30/0x390
2021-08-23T21:20:08,044517  ? create_worker+0x1a0/0x1a0
2021-08-23T21:20:08,044518  kthread+0x116/0x130
2021-08-23T21:20:08,044519  ? kthread_flush_work_fn+0x10/0x10
2021-08-23T21:20:08,044520  ret_from_fork+0x35/0x40
...
021-08-23T21:20:41,880955 nfs: server SERVER not responding, timed out
2021-08-23T21:20:41,881036 nfs: server SERVER not responding, timed out
2021-08-23T21:20:41,881039 nfs: server SERVER not responding, timed out
...


Earlier, mostly one-sided, discussion elsewhere ...

- NFS v3 - FreeBSD 12 server connection getting timed out on CentOS 8 clien=
t,
https://lists.freebsd.org/pipermail/freebsd-questions/2021-August/294634.ht=
ml

- centos-freebsd-nfs-issue, https://github.com/parv/centos8-nfs3-issue
  (dmesg from CentOS; tcpdumps from CentOS & FreeBSD)


- https://twitter.com/_parv/status/1429889151155916800

--=20
You are receiving this mail because:
You are the assignee for the bug.=



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?bug-259163-227>