From: Ronald Klop
To: Rick Macklem
Cc: Garrett Wollman, stable@freebsd.org, rmacklem@freebsd.org
Date: Fri, 1 Mar 2024 09:00:20 +0100 (CET)
Subject: Re: 13-stable NFS server hang
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

Interesting read.

Would it be possible to separate the locking for admin actions, like a client mounting an fs, from the traffic flowing for file operations?

For example, ongoing file operations could have a read-only view/copy of the mount table. Only new operations would have to wait.
But the mount never needs to wait for ongoing operations before locking the structure.
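Ronald's idea amounts to copy-on-write publication of a shared table, in the spirit of read-copy-update. The Python sketch below is a minimal illustration under stated assumptions. The "mount table" is treated as a plain key/value map, and `SnapshotTable`, `snapshot()` and `update()` are hypothetical names, not anything in the FreeBSD NFS server. In this sketch readers never wait at all, and only writers serialize. In a kernel, the hard part is reclaiming the old copy safely, which is the kind of job FreeBSD's reference counting or epoch(9) facilities exist for. Here, Python's garbage collector handles it.

```python
import threading
from types import MappingProxyType


class SnapshotTable:
    """Copy-on-write table: readers keep an immutable snapshot, writers publish a new one.

    Hypothetical illustration only; not the FreeBSD nfsd's data structure.
    """

    def __init__(self, initial=None):
        self._write_lock = threading.Lock()  # serializes writers only, never readers
        self._current = MappingProxyType(dict(initial or {}))

    def snapshot(self):
        # An operation grabs the current read-only view once and uses it until it
        # finishes. Reading one attribute is atomic in CPython, so no lock is taken.
        return self._current

    def update(self, key, value):
        # A writer (e.g. a new mount) copies the table, modifies the private copy and
        # publishes it with a single assignment. It never waits for operations still
        # using an older snapshot; the garbage collector frees the old copy once the
        # last of them drops its reference.
        with self._write_lock:
            new_table = dict(self._current)
            new_table[key] = value
            self._current = MappingProxyType(new_table)


table = SnapshotTable({"/export/a": "rw"})
view = table.snapshot()           # taken by an ongoing file operation
table.update("/export/b", "ro")   # an admin action publishes a new table
print(sorted(view))               # ['/export/a']
print(sorted(table.snapshot()))   # ['/export/a', '/export/b']
```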

Just a thought in the morning

Regards,
Ronald.

From: Rick Macklem <rick.macklem@gmail.com>
Date: 1 March 2024 00:31
To: Garrett Wollman <wollman@bimajority.org>
CC: stable@freebsd.org, rmacklem@freebsd.org
Subject: Re: 13-stable NFS server hang

On Wed, Feb 28, 2024 at 4:04PM Rick Macklem wrote:
>
> On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman wrote:
> >
> > Hi, all,
> >
> > We've had some complaints of NFS hanging at unpredictable intervals.
> > Our NFS servers are running a 13-stable from last December, and
> > tonight I sat in front of the monitor watching `nfsstat -dW`. I was
> > able to clearly see that there were periods when NFS activity would
> > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > for about 25 seconds before resuming exactly as it was before.
> >
> > I wrote a little awk script to watch for this happening and run
> > `procstat -k` on the nfsd process, and I saw that all but two of the
> > service threads were idle. The three nfsd threads that had non-idle
> > kstacks were:
> >
> >   PID    TID COMM     TDNAME          KSTACK
> >   997 108481 nfsd     nfsd: master    mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall fast_syscall_common
> >   997 960918 nfsd     nfsd: service   mi_switch sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> >   997 962232 nfsd     nfsd: service   mi_switch _cv_wait txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
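The awk script itself isn't included in the thread. Its detection step could be sketched as follows, assuming the per-interval op counts have already been extracted from the `nfsstat -dW` output. That parsing is not shown, and `find_stalls` is a hypothetical helper. When the script detected a stall, it ran `procstat -k` on the nfsd process to capture the kernel stacks shown above.

```python
def find_stalls(samples, min_len=1):
    """Return (start, length) for each run of at least `min_len` zero samples.

    `samples` is a hypothetical list of per-interval NFS op counts (e.g. one value
    per second); extracting them from `nfsstat -dW` output is not shown here.
    """
    stalls = []
    start = None
    for i, ops in enumerate(samples):
        if ops == 0:
            if start is None:
                start = i          # a stall begins
        elif start is not None:
            if i - start >= min_len:
                stalls.append((start, i - start))
            start = None           # traffic resumed
    if start is not None and len(samples) - start >= min_len:
        stalls.append((start, len(samples) - start))  # still stalled at the end
    return stalls


# ~30,000 ops/s, a 3-interval drop to zero, then recovery.
print(find_stalls([30000, 29500, 0, 0, 0, 31000], min_len=2))  # [(2, 3)]
```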
> >
> > I'm suspicious of two things: first, the copy_file_range RPC; second,
> > the "master" nfsd thread is actually servicing an RPC which requires
> > obtaining a lock. The "master" getting stuck while performing client
> > RPCs is, I believe, the reason NFS service grinds to a halt when a
> > client tries to write into a near-full filesystem, so this problem
> > would be more evidence that the dispatching function should not be
> > mixed with actual operations. I don't know what the clients are
> > doing, but is it possible that nfsrvd_copy_file_range is holding a
> > lock that is needed by one or both of the other two threads?
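One way a single slow operation could freeze all service is a lock convoy. This happens when the lock that the master and ExchangeID threads are sleeping on behaves like a write-preferring shared/exclusive lock, and a long-running operation holds it shared. The pending exclusive request must then wait for that holder, and every new request queues behind the exclusive one. Whether nfsv4_lock() actually works this way, and whether the Copy thread holds a reference to it, is exactly the open question. The Python model below only shows the arithmetic of such a convoy. `convoy_waits` is a hypothetical name, and the ~25 s holder was chosen to match the observed stall length purely for illustration.

```python
def convoy_waits(holder_done, excl_requested, excl_hold, arrivals):
    """Wait (in ms) for each new shared request under a write-preferring lock model.

    Model assumptions (not FreeBSD's nfsv4_lock() code):
      * holder_done: times at which the current shared holders release the lock;
      * an exclusive request made at excl_requested is granted only after every
        current shared holder has released, and is then held for excl_hold;
      * arrivals: times of new shared requests made after the exclusive request
        was queued; they all wait until the exclusive holder is done.
    """
    excl_granted = max([excl_requested, *holder_done])
    excl_released = excl_granted + excl_hold
    # A new shared request is blocked until the writer has come and gone.
    return [max(0, excl_released - t) for t in arrivals]


# Two quick RPCs, one ~25 s operation, and an exclusive request at t = 100 ms.
print(convoy_waits([200, 500, 25_000], 100, 10, [1_000, 10_000, 30_000]))
# [24010, 15010, 0]
```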
> >
> > Near-term I could change nfsrvd_copy_file_range to just
> > unconditionally return NFSERR_NOTSUP and force the clients to fall
> > back, but I figured I would ask if anyone else has seen this.
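The client-side fallback Garrett mentions can be modeled as follows. The client tries the server-side Copy first. If the server rejects it (for example with the NFSERR_NOTSUP reply Garrett proposes), the client copies by reading the data in and writing it back. The toy model below uses an in-memory stand-in for the server; `CopyNotSupported`, `FakeServer` and `client_copy` are hypothetical names, and the tiny chunk size is only for illustration. It counts the file-data bytes that cross the network, which shows the cost of the fallback: every byte travels twice.

```python
class CopyNotSupported(Exception):
    """Hypothetical stand-in for a Copy reply of NFSERR_NOTSUP."""


class FakeServer:
    """Toy in-memory 'server' holding a source file and a destination file."""

    def __init__(self, data, copy_enabled=True):
        self.src = bytes(data)
        self.dst = bytearray(len(self.src))
        self.copy_enabled = copy_enabled

    def copy(self):
        if not self.copy_enabled:
            raise CopyNotSupported()
        self.dst[:] = self.src          # server-side copy: data stays on the server

    def read(self, off, n):
        return self.src[off:off + n]

    def write(self, off, buf):
        self.dst[off:off + len(buf)] = buf


def client_copy(server, chunk=4):
    """Copy src to dst; return how many file-data bytes crossed the network."""
    try:
        server.copy()
        return 0
    except CopyNotSupported:
        wire = 0
        for off in range(0, len(server.src), chunk):
            buf = server.read(off, chunk)   # server -> client
            server.write(off, buf)          # client -> server
            wire += 2 * len(buf)
        return wire


print(client_copy(FakeServer(b"hello world!")))                      # 0
print(client_copy(FakeServer(b"hello world!", copy_enabled=False)))  # 24
```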
> I have attached a little patch that should limit the server's Copy size
> to vfs.nfsd.maxcopyrange (default of 10Mbytes).
> Hopefully this makes sure that the Copy does not take too long.
>
> You could try this instead of disabling Copy. It would be nice to know if
> this is sufficient. (If not, I'll probably add a sysctl to disable Copy.)
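The effect of the clip can be illustrated with a small model. It assumes that "10Mbytes" means 10 MiB, and that the client reissues Copy for whatever remains after a short reply. `clip_copy_len` and `plan_copy_rpcs` are hypothetical names, and the actual patch is not shown in this thread. Only the sysctl name vfs.nfsd.maxcopyrange comes from the message.

```python
MIB = 1024 * 1024


def clip_copy_len(requested, max_copy=10 * MIB):
    # Server side: do at most max_copy bytes in a single Copy RPC.
    return min(requested, max_copy)


def plan_copy_rpcs(total, max_copy=10 * MIB):
    # Client side: reissue Copy for whatever is left after each (possibly short) reply.
    if max_copy <= 0:
        raise ValueError("max_copy must be positive")
    rpcs = []
    offset = 0
    while offset < total:
        n = clip_copy_len(total - offset, max_copy)
        rpcs.append((offset, n))
        offset += n
    return rpcs


rpcs = plan_copy_rpcs(1024 * MIB)       # a 1 GiB file
print(len(rpcs), rpcs[-1][1] // MIB)    # 103 4
```

Under these assumptions, a 1 GiB file takes 102 full 10 MiB Copy RPCs plus one final 4 MiB RPC. This matches the "lot more Copy RPCs" Rick reports seeing with the patch.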
I did a quick test without and with this patch, where I copied a 1Gbyte file.

Without this patch, the Copy RPCs mostly replied in just under 1sec
(which is what the flag requests), but one of the Copy operations took
over 4sec. This implies that one Read/Write of 1Mbyte on the server
took over 3 seconds.
I noticed the first Copy did over 600Mbytes, but the rest did about 100Mbytes
each, and it was one of these 100Mbyte Copy operations that took over 4sec.

With the patch, there were a lot more Copy RPCs (as expected) of 10Mbytes
each, and they took a consistent 0.25-0.3sec to reply. (This is a test of a local
mount on an old laptop, so nowhere near a server hardware config.)
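Rick's inference that a single 1Mbyte Read/Write took over 3 seconds holds under one assumption: the server copies in 1Mbyte chunks and checks the ~1 second limit only between chunks. The elapsed time before the last chunk must then have been under 1 second, so a reply after more than 4 seconds means the final chunk alone took more than 3. The Python model below illustrates that assumption. The chunk size and check placement are assumed rather than taken from the FreeBSD source, and `timed_copy` is a hypothetical name.

```python
def timed_copy(chunk_secs, budget=1.0):
    """Model of a Copy that checks its time budget only between 1 MiB chunks.

    chunk_secs: hypothetical duration of each 1 MiB Read/Write on the server.
    Returns (chunks_copied, elapsed_seconds).
    """
    elapsed = 0.0
    copied = 0
    for secs in chunk_secs:
        elapsed += secs
        copied += 1
        if elapsed >= budget:  # checked after the chunk, so one slow chunk overshoots
            break
    return copied, elapsed


print(timed_copy([0.25] * 10))               # (4, 1.0): stops right at the budget
print(timed_copy([0.25, 0.25, 0.25, 3.5]))   # (4, 4.25): last chunk alone took 3.5 s
```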

So, the patch might be sufficient?

It would be nice to avoid disabling Copy, since it avoids reading the data
into the client and then writing it back to the server.

I will probably commit both patches (10Mbyte clip of Copy size and
disabling Copy) to main soon, since I cannot say if clipping the size
of the Copy will always be sufficient.

Please let us know how trying these patches goes, rick

>
> rick
>
> >
> > -GAWollman
> >
> >




