Date:      Tue, 28 Jan 2025 22:36:07 -0800
From:      Gleb Smirnoff <glebius@freebsd.org>
To:        Rick Macklem <rick.macklem@gmail.com>
Cc:        current@freebsd.org, rmacklem@freebsd.org
Subject:   Re: HEADS UP: NFS changes coming into CURRENT early February
Message-ID:  <Z5nMV1PFg-AFqfLr@cell.glebi.us>
In-Reply-To: <Z5m5FRtPYoggCzaO@cell.glebi.us>
References:  <Z5CP2WBdW_vbqzil@cell.glebi.us> <CAM5tNy6aeoMC2WjmHppvJiXqUOpn%2BzfvDBTf-_Vbcgcmy7DSTg@mail.gmail.com> <Z5m5FRtPYoggCzaO@cell.glebi.us>

On Tue, Jan 28, 2025 at 09:14:00PM -0800, Gleb Smirnoff wrote:
T> Second, with the patch the M_RPC leak count for me is 2. And I found that these
T> two items are basically a clnt_vc that belongs to a closed connection:
T> 
T> fffff80029614a80 tcp4       0      0 10.6.6.9.772       10.6.6.9.2049      CLOSED     
T> 
T> There is no peer connection, as the server received a timeout trying
T> to send. But rpc.tlsclntd doesn't try to send anything on the socket, it just
T> keeps it in its select(2) fd set and doesn't garbage collect it.
T> 
T> So it is a bigger resource leak than just two pieces of M_RPC. I don't think
T> this is related to my changes.

Here is what is going on:

- The TCP connection is torn down and tcp_close() calls soisdisconnected()
- soisdisconnected() calls clnt_vc_soupcall() to notify of error condition
- clnt_vc_soupcall() tries soreceive() and gets so->so_error.
- clnt_vc_soupcall() sets the client to the error state. It doesn't wake up
  anything because there were no running RPC requests. It can't report back
  to clnt_rc that the connection is dead. It doesn't mark itself
  for the clnt_vc_dotlsupcall() processing.
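
The sequence above can be modelled in userland C. This is only a simplified
sketch, not the kernel code: struct vc_model, vc_model_soupcall() and
vc_model_is_stranded() are hypothetical stand-ins for the clnt_vc state, and
only the control flow described in the list is reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-connection clnt_vc state. */
struct vc_model {
	int	so_error;	/* pending socket error, as in so->so_error */
	int	pending;	/* RPC requests currently waiting for a reply */
	bool	in_error;	/* client has been put into the error state */
	int	woken;		/* waiters woken by the upcall */
	bool	parent_told;	/* upper layer knows the connection is dead */
};

/*
 * Model of the socket upcall on disconnect: pick up the socket error,
 * mark the client as failed and wake every pending request.  With no
 * pending requests nobody is woken, and nothing tells the upper layer.
 */
static void
vc_model_soupcall(struct vc_model *vc)
{
	assert(vc != NULL);
	if (vc->so_error == 0)
		return;
	vc->in_error = true;
	vc->woken += vc->pending;
	vc->pending = 0;
	/* Missing step: no report upwards, so parent_told stays false. */
}

/* True when the client is dead but nobody will ever clean it up. */
static bool
vc_model_is_stranded(const struct vc_model *vc)
{
	return (vc->in_error && vc->woken == 0 && !vc->parent_told);
}
```

Calling vc_model_soupcall() on a model with so_error set and pending == 0
leaves it stranded, which corresponds to the leaked clnt_vc. With pending > 0
the waiters are woken and would carry the error upwards themselves.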

So we end up with:

(kgdb) p $tp->t_state
$25 = 0	/* TCPS_CLOSED */
(kgdb) p/x $tp->t_inpcb.inp_flags & 0x04000000	/* INP_DROPPED */
$27 = 0x4000000
(kgdb) p/x $tp->t_inpcb.inp_socket->so_state
$28 = 0x2000	/* SS_ISDISCONNECTED */
(kgdb) p/x $tp->t_inpcb.inp_socket->so_count
$35 = 0x2
(kgdb) p/x $ct->ct_rcvstate 
$29 = 0x41	/* RPCRCVSTATE_UPCALLTHREAD | RPCRCVSTATE_NORMAL */
(kgdb)  p $ct->ct_error
$30 = {re_status = RPC_CANTRECV, ru = {RE_errno = 13, RE_why = RPCSEC_GSS_CREDPROBLEM, RE_vers = {low = 13, high = 0}, RE_lb = {s1 = 13, s2 = 0}}}
(kgdb) p $ct->ct_pending
$31 = {tqh_first = 0x0, tqh_last = 0xfffff80002838ea8}

Note: In my case so->so_error was EACCES, because I used an ipfw(4) rule to tear
down the connection. For a normal TCP timeout it would be ETIMEDOUT, or
ECONNRESET if the remote has reset.  That's why $ct->ct_error.ru.RE_errno == 13.

So we need some mechanism for clnt_vc_soupcall() to report to the upper clnt_rc
that the clnt_vc is dead and ready to be garbage collected via CLNT_CLOSE() and
then CLNT_RELEASE().
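
One possible shape for such a mechanism is a callback that the upper layer
registers on the connection. The sketch below is a userland model under that
assumption, not a proposed patch: struct vc_conn, struct rc_owner, vc_dead_cb,
rc_owner_conn_dead() and vc_conn_report_dead() are all hypothetical names, and
the close and release steps are reduced to a flag and a counter. Locking is
ignored; in the kernel the upcall context probably could not do the close
directly and would have to defer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vc_conn;

/* Hypothetical callback type: the lower layer tells its owner it has died. */
typedef void (*vc_dead_cb)(struct vc_conn *, void *);

struct vc_conn {
	bool		closed;		/* models CLNT_CLOSE() having run */
	int		refs;		/* models the client's reference count */
	vc_dead_cb	on_dead;	/* set by the upper layer, may be NULL */
	void		*on_dead_arg;	/* opaque argument for on_dead */
};

/* Hypothetical upper layer state, standing in for clnt_rc. */
struct rc_owner {
	struct vc_conn	*conn;		/* current connection, or NULL */
};

/* Upper layer handler: close the dead connection, then drop the reference. */
static void
rc_owner_conn_dead(struct vc_conn *vc, void *arg)
{
	struct rc_owner *rc = arg;

	if (rc->conn != vc)
		return;			/* stale notification, ignore */
	vc->closed = true;		/* CLNT_CLOSE() in the real code */
	vc->refs--;			/* CLNT_RELEASE() in the real code */
	rc->conn = NULL;		/* a later request would reconnect */
}

/* Called from the modelled socket upcall once an error has been seen. */
static void
vc_conn_report_dead(struct vc_conn *vc)
{
	assert(vc != NULL);
	if (vc->on_dead != NULL)
		vc->on_dead(vc, vc->on_dead_arg);
}
```

The owner sets on_dead and on_dead_arg when it creates the connection. After
vc_conn_report_dead() the connection is closed, its reference is dropped and
the owner no longer points at it.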

Once clnt_vc_destroy() is called, the daemon will be notified that it can close
the TLS socket, bringing so_count to 1, and then the final sorele() will bring
it to 0 and free the socket.
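
The reference counting can be illustrated with a small userland model. This is
a sketch only: struct sock_model and sock_model_rele() are hypothetical names,
and the real sorele() does considerably more than decrement a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket with a reference count (so_count). */
struct sock_model {
	int	count;	/* references currently held on the socket */
	bool	freed;	/* set once the last reference is dropped */
};

/* Drop one reference; mark the socket freed when the count reaches zero. */
static void
sock_model_rele(struct sock_model *so)
{
	assert(so != NULL && so->count > 0);
	if (--so->count == 0)
		so->freed = true;
}
```

Starting from count == 2, as in the kgdb output above, the first
sock_model_rele() models the daemon closing its descriptor and the second one
models the final sorele() that frees the socket.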

-- 
Gleb Smirnoff

