From owner-freebsd-hackers@FreeBSD.ORG  Fri Mar  7 19:08:01 2008
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8BF2A1065677
	for <freebsd-hackers@freebsd.org>; Fri,  7 Mar 2008 19:08:01 +0000 (UTC)
	(envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
	by mx1.freebsd.org (Postfix) with ESMTP id E829B8FC17
	for <freebsd-hackers@freebsd.org>; Fri,  7 Mar 2008 19:08:00 +0000 (UTC)
	(envelope-from andre@freebsd.org)
Received: (qmail 1779 invoked from network); 7 Mar 2008 17:54:29 -0000
Received: from dotat.atdotat.at (HELO [62.48.0.47]) ([62.48.0.47])
	(envelope-sender <andre@freebsd.org>)
	by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
	for <rwatson@FreeBSD.org>; 7 Mar 2008 17:54:29 -0000
Message-ID: <47D18C4D.20309@freebsd.org>
Date: Fri, 07 Mar 2008 19:41:17 +0100
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.8b) Gecko/20050217
MIME-Version: 1.0
To: Robert Watson <rwatson@FreeBSD.org>
References: <47D07CC6.5060007@FreeBSD.org>
	<20080307091547.M23519@fledge.watson.org>
In-Reply-To: <20080307091547.M23519@fledge.watson.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-hackers@freebsd.org, Alexander Motin <mav@FreeBSD.org>
Subject: Re: soclose() & so->so_upcall() = race?
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
	<freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>, 
	<mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
	<mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Mar 2008 19:08:01 -0000

Robert Watson wrote:
> 
> On Fri, 7 Mar 2008, Alexander Motin wrote:
> 
>> As I can see so_upcall() callback is called with SOCKBUF_MTX unlocked. 
>> It means that SB_UPCALL flag can be removed during call and socket can 
>> be closed and deallocated with soclose() while callback is running. Am 
>> I right or I have missed something? How in that situation socket 
>> pointer protected from being used after free?
> 
> 
> There are known problems with so_upcall and locking, including this 
> one.  The other problems include:
> 
> - The locking condition on entering the upcall depends on the invocation 
> point
>   and is inconsistent.
> 
> - The protection of the upcall field and flag are inconsistent.
> 
> - Consumers of so_upcall, such as socket accept filters, don't properly
>   respect the locking environment they run in.
> 
> - Some (all) accept filters produce nastily convoluted stack traces 
> involving
>   recursion across really odd combinations of sockets and protocols.  For
>   example, you can see soisdisconnected() calling soisconnected().
> 
> Some of this is inherent to the design of accept filters and so_upcall 
> and follows from using a single function pointer for many different 
> cases.  I have an 8.x todo list item to try and figure out how to make 
> so_upcall and accept filters rational in the context of fine-grained 
> locking, but I've not yet reached that point on my todo list.  I think 
> we can continue to incrementally hack so_upcall and accept filters to 
> fix many races, but the reality is that we need a more coherent model 
> for dealing with accept filters.  One idea that Colin Percival (I think) 
> suggested is that we separate socket upcalls from accept filters, and 
> that accept filters consistent of a predicate for completion rather than 
> directly invoking socket state transitions.  I've not explored the 
> implications, but think it might well be a good idea to avoid the weird 
> stack traces.

I've experimented with some changes to sowakeup() to better formalize it.
Code diff below.  Observations noted in the comments.

-- 
Andre


Index: uipc_sockbuf.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_sockbuf.c,v
retrieving revision 1.165
diff -u -p -r1.165 uipc_sockbuf.c
--- uipc_sockbuf.c	6 Sep 2006 21:59:36 -0000	1.165
+++ uipc_sockbuf.c	25 Feb 2007 15:22:32 -0000
@@ -173,11 +173,12 @@ sowakeup(struct socket *so, struct sockb

  	SOCKBUF_LOCK_ASSERT(sb);

-	selwakeuppri(&sb->sb_sel, PSOCK);
+#if 1
+	selwakeuppri(&sb->sb_sel, PSOCK);	/* removes thread from sleepq */
  	sb->sb_flags &= ~SB_SEL;
  	if (sb->sb_flags & SB_WAIT) {
  		sb->sb_flags &= ~SB_WAIT;
-		wakeup(&sb->sb_cc);
+		wakeup(&sb->sb_cc);		/* removes thread from sleepq too!? */
  	}
  	KNOTE_LOCKED(&sb->sb_sel.si_note, 0);
  	SOCKBUF_UNLOCK(sb);
@@ -188,6 +189,46 @@ sowakeup(struct socket *so, struct sockb
  	if (sb->sb_flags & SB_AIO)
  		aio_swake(so, sb);
  	mtx_assert(SOCKBUF_MTX(sb), MA_NOTOWNED);
+#else
+	/* Only wakeup if above low water mark. */
+	if (so->so_error == 0 && sb->sb_lowat < sbspace(sb))
+		return;
+	/* First run any upcalls which may process or modify the data. */
+	if (sb->sb_flags & SB_UPCALL) {
+		/*
+		 * Upcall tells us whether to do a wakeup or not.
+		 * Need a flag on the socket to tell select/poll and kqueue
+		 * to ignore socket buffer until cleared.
+		 * Maybe we should defer the upcall to a worker thread using
+		 * task queue?
+		 */
+		if ((*so->so_upcall)(so, sb, so->so_upcallarg, M_DONTWAIT)) {
+			sb->sb_flags |= SB_IGNORE;
+			SOCKBUF_UNLOCK(sb);
+			return;			/* don't wakeup, more work! */
+		}
+	}
+	/* KQueue notifications. */
+	KNOTE_LOCKED(&sb->sb_sel.si_note, 0);	/* issues wakeup */
+	/* Someone doing select/poll? */
+	if (sb->sb_flags & SB_SEL) {
+		selwakeuppri(&sb->sb_sel, PSOCK); /* removes thread from sleepq */
+		sb->sb_flags &= ~SB_SEL;
+	}
+	/* Someone waiting blocked on socket buffer. */
+	if (sb->sb_flags & SB_WAIT) {
+		wakeup(&sb->sb_cc);		/* removes thread from sleepq */
+		sb->sb_flags &= ~SB_WAIT;
+	}
+	/* Async IO notificatins. */
+	if (sb->sb_flags & SB_AIO)
+		aio_swake(so, sb);
+	/* Socket buffer won't be accessed anymore. */
+	SOCKBUF_UNLOCK(sb);
+	/* Send SIGIO or SIGURG to thread.  XXX: Is this still needed? */
+	if ((so->so_state & SS_ASYNC) && so->so_sigio != NULL)
+		pgsigio(&so->so_sigio, SIGIO, 0);
+#endif
  }

  /*