From: Alan Somers
Date: Wed, 17 Mar 2021 15:45:47 -0600
Subject: Re: NFS Mount Hangs
To: Rick Macklem
Cc: Jason Breitman, freebsd-net@freebsd.org

On Wed, Mar 17, 2021 at 3:37 PM Rick Macklem wrote:

> Jason Breitman wrote:
> >Please review the details below and let me know if
> >there is a setting that I should apply to my FreeBSD NFS Server or if
> >there is a bug fix that I can apply to resolve my issue.
> >
> >I shared this information with the linux-nfs mailing list and they
> >believe the issue is on the server side.
> I actually lurk there and saw your post. I'll admit I smiled when Trond
> argued that a hung Linux system is the result of a server failing to
> send a fin/ack for a closing TCP connection. But here are a few
> comments.
>
> >Issue
> >NFSv4 mounts periodically hang on the NFS Client.
> >
> >During this time, it is possible to manually mount from another NFS
> >Server on the NFS Client having issues.
> >Also, other NFS Clients are successfully mounting from the NFS Server
> >in question.
> >Rebooting the NFS Client appears to be the only solution.
> >
> >Environment
> >NFS Server
> >OS: FreeBSD 12.1-RELEASE-p5
> >
> >NFS Client
> >OS: Debian Buster 10.8
> >Kernel: 4.19.171-2
> >Protocol: NFSv4 with Kerberos Security
> >Mount Options: nfs-server.domain.com:/data /mnt/data nfs4 lookupcache=pos,noresvport,sec=krb5,hard,rsize=1048576,wsize=1048576 0 0
> The maximum I/O size supported by FreeBSD is 128K.
>

Is the 128K limit related to MAXPHYS? If so, it should be greater in 13.0.

> The client should acquire the attributes that indicate that and set
> rsize/wsize to that. "# nfsstat -m" on the client should show you what
> the client is actually using. If it is larger than 128K, set both rsize
> and wsize to 128K.
>
> >Output from the NFS Client when the issue occurs
> ># netstat -an | grep NFS.Server.IP.X
> >tcp 0 0 NFS.Client.IP.X:46896 NFS.Server.IP.X:2049 FIN_WAIT2
> I'm no TCP guy. Hopefully others might know why the client would be
> stuck in FIN_WAIT2 (I vaguely recall this means it is waiting for a
> fin/ack, but could be wrong?)
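For reference, the RFC 793 state diagram defines FIN_WAIT2 narrowly: the side that closed first has sent its FIN and already received the ACK for it, and is now waiting for the peer's FIN. In the normal case the peer (here, the server end of the connection) sits in CLOSE_WAIT until its application closes the socket. The Python sketch that follows models only the close-side transitions of that diagram. The event names are labels invented for the sketch, and it is not a model of the Linux or FreeBSD TCP implementations, which add their own timers and edge cases.

```python
from enum import Enum, auto


class TcpState(Enum):
    # Names follow the netstat spelling used in the quoted output.
    ESTABLISHED = auto()
    FIN_WAIT1 = auto()
    FIN_WAIT2 = auto()
    CLOSING = auto()
    TIME_WAIT = auto()
    CLOSE_WAIT = auto()
    LAST_ACK = auto()
    CLOSED = auto()


# Close-side transitions from the RFC 793 state diagram, keyed by
# (current state, event). The event names are hypothetical labels:
#   "close"   local application closes; we send our FIN
#   "ack"     the ACK for our FIN arrives
#   "fin"     the peer's FIN arrives; we ACK it
#   "fin_ack" the peer's FIN arrives together with the ACK for our FIN
#   "timeout" the 2*MSL timer expires in TIME_WAIT
TRANSITIONS = {
    (TcpState.ESTABLISHED, "close"): TcpState.FIN_WAIT1,
    (TcpState.ESTABLISHED, "fin"): TcpState.CLOSE_WAIT,
    (TcpState.FIN_WAIT1, "ack"): TcpState.FIN_WAIT2,
    (TcpState.FIN_WAIT1, "fin"): TcpState.CLOSING,
    (TcpState.FIN_WAIT1, "fin_ack"): TcpState.TIME_WAIT,
    (TcpState.FIN_WAIT2, "fin"): TcpState.TIME_WAIT,
    (TcpState.CLOSING, "ack"): TcpState.TIME_WAIT,
    (TcpState.TIME_WAIT, "timeout"): TcpState.CLOSED,
    (TcpState.CLOSE_WAIT, "close"): TcpState.LAST_ACK,
    (TcpState.LAST_ACK, "ack"): TcpState.CLOSED,
}


def run(events, state=TcpState.ESTABLISHED):
    """Apply events in order and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no close transition from {state.name} on {event!r}")
        state = TRANSITIONS[key]
    return state


# Client (active closer): sent its FIN and got the ACK, no FIN from the server yet.
client = run(["close", "ack"])
# Server (passive closer): received and ACKed the client's FIN, but its
# application has not closed the socket, so it has not sent its own FIN.
server = run(["fin"])
print(client.name, server.name)  # FIN_WAIT2 CLOSE_WAIT
```

In this model the client leaves FIN_WAIT2 only when the server's FIN arrives, and the server sends that FIN only once its end leaves CLOSE_WAIT: `run(["close"], TcpState.CLOSE_WAIT)` returns `LAST_ACK`, the state in which the server's FIN has gone out.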
>
> ># cat /sys/kernel/debug/sunrpc/rpc_xprt/*/info
> >netid: tcp
> >addr: NFS.Server.IP.X
> >port: 2049
> >state: 0x51
> >
> >syslog
> >Mar 4 10:29:27 hostname kernel: [437414.131978] -pid- flgs status -client- --rqstp- ->timeout ---ops--
> >Mar 4 10:29:27 hostname kernel: [437414.133158] 57419 40a1 0 9b723c73 143cfadf 30000 4ca953b5 nfsv4 OPEN_NOATTR a:call_connect_status [sunrpc] q:xprt_pending
> I don't know what OPEN_NOATTR means, but I assume it is some variant
> of the NFSv4 Open operation.
> [stuff snipped]
> >Mar 4 10:29:30 hostname kernel: [437417.110517] RPC: 57419 xprt_connect_status: connect attempt timed out
> >Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
> I have no idea what status -110 means?
> >Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
> >Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)
> >Mar 4 10:29:30 hostname kernel: [437417.115402] RPC: 57419 call_connect xprt 00000000e061831b is not connected
> >Mar 4 10:29:30 hostname kernel: [437417.116547] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
> >Mar 4 10:30:31 hostname kernel: [437478.551090] RPC: 57419 xprt_connect_status: connect attempt timed out
> >Mar 4 10:30:31 hostname kernel: [437478.552396] RPC: 57419 call_connect_status (status -110)
> >Mar 4 10:30:31 hostname kernel: [437478.553417] RPC: 57419 call_timeout (minor)
> >Mar 4 10:30:31 hostname kernel: [437478.554327] RPC: 57419 call_bind (status 0)
> >Mar 4 10:30:31 hostname kernel: [437478.555220] RPC: 57419 call_connect xprt 00000000e061831b is not connected
> >Mar 4 10:30:31 hostname kernel: [437478.556254] RPC: 57419 xprt_connect xprt 00000000e061831b is not connected
> Is it possible that the client is trying to (re)connect using the same
> client port#?
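On the `(status -110)` question: the Linux kernel reports errors as negated errno values, and errno 110 on Linux is `ETIMEDOUT`, which agrees with the "connect attempt timed out" line logged just before it. A minimal Python sketch for decoding such statuses from a saved client log follows, assuming the `(status N)` format shown in the quoted syslog. The errno table is a hand-picked subset of Linux values, hardcoded because numbering differs between systems (FreeBSD, for example, uses 60 for `ETIMEDOUT`), so decoding with the errno table of a non-Linux machine would give the wrong answer.

```python
import re

# Hand-picked subset of Linux errno numbers (asm-generic/errno.h) related to
# network connections. Hardcoded on purpose: other systems number them
# differently (e.g. ETIMEDOUT is 60 on FreeBSD), so os.strerror() is only a
# reliable decoder if the machine reading the log also runs Linux.
LINUX_ERRNO = {
    11: "EAGAIN",
    32: "EPIPE",
    101: "ENETUNREACH",
    104: "ECONNRESET",
    107: "ENOTCONN",
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}

# Matches the "(status N)" suffix in the sunrpc debug lines quoted above.
STATUS_RE = re.compile(r"\(status (-?\d+)\)")


def decode_status(status: int) -> str:
    """Map a sunrpc status (0, or a negated Linux errno) to a symbolic name."""
    if status == 0:
        return "OK"
    if status < 0:
        return LINUX_ERRNO.get(-status, f"errno {-status}")
    return f"status {status}"


def decode_line(line: str):
    """Return the decoded status for one log line, or None if it has none."""
    match = STATUS_RE.search(line)
    return decode_status(int(match.group(1))) if match else None


log = """\
Mar 4 10:29:30 hostname kernel: [437417.112172] RPC: 57419 call_connect_status (status -110)
Mar 4 10:29:30 hostname kernel: [437417.113337] RPC: 57419 call_timeout (major)
Mar 4 10:29:30 hostname kernel: [437417.114385] RPC: 57419 call_bind (status 0)"""

for line in log.splitlines():
    decoded = decode_line(line)
    if decoded is not None:
        print(decoded)  # ETIMEDOUT, then OK
```

Lines without a `(status N)` suffix, such as the `call_timeout (major)` line, are skipped rather than guessed at.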
> I would normally expect the client to create a new TCP connection using
> a different client port# and then retry the outstanding RPCs.
> --> Capturing packets when this happens would show us what is going on.
>
> If there is a problem on the FreeBSD end, it is most likely a broken
> network device driver.
> --> Try disabling TSO and LRO.
> --> Try a different driver for the net hardware on the server.
> --> Try a different net chip on the server.
> If you can capture packets when (not after) the hang
> occurs, then you can look at them in Wireshark and see
> what is actually happening. (Ideally on both client and
> server, to check that your network hasn't dropped anything.)
> --> I know, if the hangs aren't easily reproducible, this isn't
> easily done.
> --> Try a newer Linux kernel and see if the problem persists.
> The Linux folk will get more interested if you can reproduce
> the problem on 5.12. (Recent bakeathon testing of the 5.12
> kernel against the FreeBSD server did not find any issues.)
>
> Hopefully the network folk have some insight w.r.t. why
> the TCP connection is sitting in FIN_WAIT2.
>
> rick
>
> > Jason Breitman
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"