From: Skye Poier
To: freebsd-net@freebsd.org
Date: Thu, 9 Oct 2003 12:53:19 -0700
Subject: Panic in NFS (tcp_output) on -current

Hi BSDers,

I'm running an older version of -current and saw a panic in the NFS server socket upcall, which I'll describe in detail below. I had a close look at the latest sources, and there don't appear to be any changes that would prevent this panic. I've only ever seen it once in about a year, so it's a rare but fatal case. I'm not sure whether it applies to the 4.x branch, but anyway...

Here's the stack dump:

    panic
    tcp_output
    tcp_usr_rcvd
    soreceive
    nfsrv_rcv
    sowakeup
    soisdisconnected
    tcp_close
    tcp_drop
    tcp_timer_keep
    softclock

Here's what's happening. At the end of tcp_close (tcp_discardcb in -current) the code does this:

        inp->inp_ppcb = NULL;
        ...
        soisdisconnected(so);

soisdisconnected does a sorwakeup, which calls the nfsrv_rcv upcall, which calls soreceive. At the end of soreceive we have this:

        if ((flags & MSG_PEEK) == 0) {
                ...
                if (pr->pr_flags & PR_WANTRCVD && so->so_pcb)
                        (*pr->pr_usrreqs->pru_rcvd)(so, flags);
        }

This calls tcp_usr_rcvd. When it gets the TCP protocol control block via intotcpcb(sotoinpcb(so)), it ends up with a NULL pointer, because inp_ppcb was cleared before soisdisconnected was called above. Voila, panic in tcp_output on a NULL dereference. This would also have happened if any TCP socket upcall had tried to soreceive MSG_OOB under these conditions.

My question is this: why is the TCP pcb disconnected from the inpcb before soisdisconnected is called? I don't see any benefit to doing this half-way teardown BEFORE calling soisdisconnected; the only possible uses (that I can see) of that pointer would result in a panic in every case. And right after soisdisconnected, the pcb is destroyed anyway. The only thing that is ever checked is so->so_pcb, which is still valid.

Thoughts? It seems like either the inp->inp_ppcb = NULL assignment (and t_inpcb = NULL in -current) should be moved AFTER the soisdisconnected call, or the socket should be torn down further (invalidate so->so_pcb?) before calling soisdisconnected.

Thanks!
Skye