From owner-freebsd-fs@freebsd.org Sun Dec 13 08:09:24 2020 Return-Path: Delivered-To: freebsd-fs@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id EFE114B11C8 for ; Sun, 13 Dec 2020 08:09:24 +0000 (UTC) (envelope-from jdavidlists@gmail.com) Received: from mail-lf1-x134.google.com (mail-lf1-x134.google.com [IPv6:2a00:1450:4864:20::134]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4Ctxxv6g9Zz3JhD for ; Sun, 13 Dec 2020 08:09:23 +0000 (UTC) (envelope-from jdavidlists@gmail.com) Received: by mail-lf1-x134.google.com with SMTP id m12so22371343lfo.7 for ; Sun, 13 Dec 2020 00:09:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=e/2vazV1NbBZkCghA8bGmpU6064DFZhfQeICIS6fVoE=; b=baOdDAuJ3ArGzzQTvZS1j3UcXaN8FKBFT5BAqd9j8W3cIXNN8ezxEiRGdegTS88cqa i8OibbaaHoZbWBjWq7eADd9+MGSiDtTGQia2wrzEn6QooxxdPqlMAsSdiMh2khhViFtn gTbuh/gTfd8E00EBW7AgJBHOEeTRnV8S24iIhabDuQ6VSeoZW5+xv8vnbuIZwGf6V+ZU 9Ai7yBJU1RSt00DSPO2ZAcfbI8rsaFjRzSRyq1R4WpfxyrMPqmcmUMP7bK4CqYQc4oml 8y7+OxC/6Zd13Y+BJtkCKzM/rvTrGC1uO4YboFIpDfNJlqIcv95zgR1rPfKDCSexH7OU v8rg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=e/2vazV1NbBZkCghA8bGmpU6064DFZhfQeICIS6fVoE=; b=UfGkgDmsgqjafA9Pq4tJbPekfhxlaC6K8ox2juFwB2yyMY096tfiI7V6+vTn6Zzi8x Fr2c1715pdwA89FK1WhnBUUA/oDxtPCS3kqeoJoE2VSzzb12gfWXabYvehPhaPoGywcb 9SmGM9wZUpJ8oMns4NajkU3yd603zYnoKqjHsbCv6Cf4fuQpnUx5gRmj4W8TU9FaVXgA 7rqlNEkL5zwg6ks0ql9+LuuA2FphB/MMo1m4cRdV+M4FXwCbGzK+Bti4BYaYW3OfbLpH 9Ngf19MHBq45QLgXR225W1B4UhHT7v7yU02GhQCAp5hvtda2VNNs7ZQ1Sy8c3CF8NOWJ wE8w== X-Gm-Message-State: AOAM532BPVFaaPRhDRJ0YBh5P5tgk+rRbV5sBIh7dVn9GypWyZYAndBp Q4s4d2/4/66DGWU6ZdVNy61gQQGmLchD/azXsoI= X-Google-Smtp-Source: ABdhPJw6YJ/I8GJ/3gX5/ahm8KZ97aoqCG97x0hOHqJaG6bt7lcA5sUYxpCtNBxEmsyKO+VPt0fkUPFmGr94JA5H/KY= X-Received: by 2002:a05:6512:1095:: with SMTP id j21mr7650261lfg.309.1607846961463; Sun, 13 Dec 2020 00:09:21 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: J David Date: Sun, 13 Dec 2020 03:09:10 -0500 Message-ID: Subject: Re: Major issues with nfsv4 To: Konstantin Belousov Cc: Rick Macklem , "freebsd-fs@freebsd.org" Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 4Ctxxv6g9Zz3JhD X-Spamd-Bar: - Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20161025 header.b=baOdDAuJ; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of jdavidlists@gmail.com designates 2a00:1450:4864:20::134 as permitted sender) smtp.mailfrom=jdavidlists@gmail.com X-Spamd-Result: default: False [-2.00 / 15.00]; TO_DN_EQ_ADDR_SOME(0.00)[]; TO_DN_SOME(0.00)[]; FREEMAIL_FROM(0.00)[gmail.com]; R_SPF_ALLOW(-0.20)[+ip6:2a00:1450:4000::/36]; DKIM_TRACE(0.00)[gmail.com:+]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FREEMAIL_TO(0.00)[gmail.com]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RBL_DBL_DONT_QUERY_IPS(0.00)[2a00:1450:4864:20::134:from]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US]; TAGGED_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[gmail.com:dkim]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-fs@freebsd.org]; SPAMHAUS_ZRD(0.00)[2a00:1450:4864:20::134:from:127.0.2.255]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(1.00)[1.000]; RCVD_IN_DNSWL_NONE(0.00)[2a00:1450:4864:20::134:from]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[]; MAILMAN_DEST(0.00)[freebsd-fs] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Dec 2020 08:09:25 -0000 The output of "procstat -kk -a" is extremely large. So I found what looked like four processes that were having problems, ran procstat -kk in a loop for all four for a couple of minutes, appending the output to a file, then did sort | uniq on the file to find common traces. 353 samples were collected. 11 "" samples had no kernel call stack and were discarded. Of the 342 stack samples, here's a frequency analysis of the entries present in the trace: 451 VOP_LOOKUP_APV 310 kern_statat 309 fast_syscall_common 303 amd64_syscall 234 namei 233 lookup 226 null_lookup 225 freebsd11_stat 220 nfs_lookup 189 __mtx_lock_sleep 156 nfscl_nodeleg 155 mi_switch 136 vputx 134 VOP_INACTIVE_APV 134 vinactive 83 sys_fstatat 68 nfsrpc_close 68 ncl_inactive 67 turnstile_wait 66 vrecycle 66 VOP_RECLAIM_APV 66 vgonel 66 null_reclaim 66 null_inactive 55 VOP_GETATTR_APV 55 nfscl_request 55 newnfs_request 55 clnt_reconnect_call 54 clnt_vc_call 53 sleepq_timedwait 53 _sleep 41 nfsrpc_lookup 30 nfs_getattr 26 nfscl_mustflush 26 ncl_getattrcache 25 null_bypass 24 vn_stat 24 null_getattr 19 ast 14 VOP_LOCK1_APV 13 VOP_ACCESS_APV 13 vn_dir_check_exec 13 nfsrpc_accessrpc 13 nfs34_access_otw 13 nfs_access 13 critical_exit_preempt 12 Xipi_intr_bitmap_handler 12 ipi_bitmap_handler 12 doreti_ast 11 thread_lock_flags_ 9 sigdeferstop_impl 8 nfscl_doclose 6 _vn_lock 5 null_lock 3 sleepq_wait 3 sleeplk 3 nfscl_deleggetmodtime 3 nfs_lock 2 VOP_READLINK_APV 2 VOP_ISLOCKED_APV 2 sigallowstop_impl 2 nfscl_getcl 2 lockmgr_xlock_hard 1 Xtimerint 1 vtnet_txq_mq_start_locked 1 vtnet_txq_mq_start 1 vget 1 tcp_usr_send 1 tcp_output 1 sosend_generic 1 sosend 1 null_nodeget 1 nfsrpc_getattr 1 nfscl_clientrelease 1 ncl_bioread 1 lockmgr_slock_hard 1 ip_output 1 ether_output_frame 1 ether_output 1 cache_lookup (VOP_LOOKUP_APV appears twice on many lines.) The most common trace (91 times, or over 1/4 of all observed) is: __mtx_lock_sleep+0xf8 nfscl_nodeleg+0x207 nfs_lookup+0x314 VOP_LOOKUP_APV+0x75 null_lookup+0x98 VOP_LOOKUP_APV+0x75 lookup+0x451 namei+0x414 kern_statat+0x72 freebsd11_stat+0x30 amd64_syscall+0x387 fast_syscall_common+0xf8 This trace appears roughly uniformly (28/38/34) in 3 of the 4 processes. The full set of traces (sorted by uniq -c) is here: https://pastebin.com/HUqkeMri (This message is already long enough!) The characters stripped off the front of each line are consistently: (pid) (uid) python3 - It happens these were all Python-based jobs. Python seems predisposed to trigger this, but non-Python jobs trigger it as well. Heavy use of stat() does seem to be a common element regardless of job type. Here's the output of "nfsstat -E -c" 60 seconds after running it with -z: Client Info: RPC Counts: Getattr Setattr Lookup Readlink Read Write 1667 111 6376 0 42 153 Create Remove Rename Link Symlink Mkdir 111 7 14 0 0 0 Rmdir Readdir RdirPlus Access Mknod Fsstat 0 0 0 2620 0 160 FSinfo pathConf Commit SetClId SetClIdCf Lock 0 0 113 0 0 48 LockT LockU Open OpenCfr 0 48 320 0 OpenDownGr Close 0 402 RelLckOwn FreeStateID PutRootFH DelegRet GetAcl SetAcl 0 3 0 0 0 0 ExchangeId CreateSess DestroySess DestroyClId LayoutGet GetDevInfo 0 0 0 0 0 0 LayoutCommit LayoutReturn ReclaimCompl ReadDataS WriteDataS CommitDataS 0 0 0 0 0 0 OpenLayout CreateLayout 0 0 OpenOwner Opens LockOwner Locks Delegs LocalOwn 21175 130439 30 6 0 0 LocalOpen LocalLown LocalLock 0 0 0 Rpc Info: TimedOut Invalid X Replies Retries Requests 0 0 0 0 12247 Cache Info: Attr Hits Attr Misses Lkup Hits Lkup Misses 1110054 858 1002829 6361 BioR Hits BioR Misses BioW Hits BioW Misses 2000 54 292 153 BioRL Hits BioRL Misses BioD Hits BioD Misses 6911 0 208 0 DirE Hits DirE Misses 104 0 This does reflect the whole machine, not just those four processes. Finally, when I attempted to kill those four processes with ktrace running on all of them, the system panicked: Fatal trap 12: page fault while in kernel mode cpuid = 2; apic id = 02 fault virtual address = 0x10 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff8272d28f stack pointer = 0x28:0xfffffe008a5f24c0 frame pointer = 0x28:0xfffffe008a5f25f0 code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 5627 (node) trap number = 12 panic: page fault cpuid = 2 time = 1607845622 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe008a5f2180 vpanic() at vpanic+0x17b/frame 0xfffffe008a5f21d0 panic() at panic+0x43/frame 0xfffffe008a5f2230 trap_fatal() at trap_fatal+0x391/frame 0xfffffe008a5f2290 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe008a5f22e0 trap() at trap+0x286/frame 0xfffffe008a5f23f0 calltrap() at calltrap+0x8/frame 0xfffffe008a5f23f0 --- trap 0xc, rip = 0xffffffff8272d28f, rsp = 0xfffffe008a5f24c0, rbp = 0xfffffe008a5f25f0 --- null_bypass() at null_bypass+0xaf/frame 0xfffffe008a5f25f0 VOP_ADVLOCK_APV() at VOP_ADVLOCK_APV+0x80/frame 0xfffffe008a5f2620 closef() at closef+0x8f/frame 0xfffffe008a5f26b0 fdescfree_fds() at fdescfree_fds+0x3c/frame 0xfffffe008a5f2700 fdescfree() at fdescfree+0x466/frame 0xfffffe008a5f27c0 exit1() at exit1+0x488/frame 0xfffffe008a5f2820 sigexit() at sigexit+0x159/frame 0xfffffe008a5f2b00 postsig() at postsig+0x2fd/frame 0xfffffe008a5f2bc0 ast() at ast+0x317/frame 0xfffffe008a5f2bf0 doreti_ast() at doreti_ast+0x1f/frame 0x7fffffff67a0 The fsck ate the ktrace.out file on reboot. This system does not have kernel debug symbols. If there's a way to try to figure out what null_bypass+0xaf corresponds to using the machine that built this kernel, which has the kernel, objects, debug symbols, and source, please let me know. It's built from releng/12.2 r368515. Thanks!