From owner-svn-src-head@freebsd.org Sat Jan 23 07:10:14 2016 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 57B6CA8D79F; Sat, 23 Jan 2016 07:10:14 +0000 (UTC) (envelope-from lstewart@freebsd.org) Received: from lauren.room52.net (lauren.room52.net [210.50.193.198]) by mx1.freebsd.org (Postfix) with ESMTP id AA0AA188C; Sat, 23 Jan 2016 07:10:12 +0000 (UTC) (envelope-from lstewart@freebsd.org) Received: from lgwl-lstewart2.corp.netflix.com (c110-22-60-167.eburwd6.vic.optusnet.com.au [110.22.60.167]) by lauren.room52.net (Postfix) with ESMTPSA id 477E57E839; Sat, 23 Jan 2016 18:09:40 +1100 (EST) Subject: Re: svn commit: r294536 - head/sys/netinet To: Gleb Smirnoff References: <201601212253.u0LMrC7B016136@repo.freebsd.org> <56A1C9F6.2010708@freebsd.org> <20160122180411.GA1004@FreeBSD.org> Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org From: Lawrence Stewart X-Enigmail-Draft-Status: N1110 Message-ID: <56A3271F.6010904@freebsd.org> Date: Sat, 23 Jan 2016 18:09:19 +1100 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <20160122180411.GA1004@FreeBSD.org> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=2.4 required=5.0 tests=DNS_FROM_AHBL_RHSBL, T_FRT_STOCK2,UNPARSEABLE_RELAY autolearn=no version=3.3.2 X-Spam-Level: ** X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on lauren.room52.net X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 23 Jan 2016 07:10:14 -0000 On 01/23/16 05:04, Gleb Smirnoff wrote: > On Fri, Jan 22, 2016 at 05:19:34PM +1100, Lawrence Stewart wrote: > L> On 01/22/16 09:53, Gleb Smirnoff wrote: > L> > Author: glebius > L> > Date: Thu Jan 21 22:53:12 2016 > L> > New Revision: 294536 > L> > URL: https://svnweb.freebsd.org/changeset/base/294536 > L> > > L> > Log: > L> > Refactor TCP_CONGESTION setsockopt handling: > L> > - Use M_TEMP instead of stack variable. > L> > - Unroll error handling, removing several levels of indentation. > L> > L> As noted privately elsewhere, this change introduces races with respect > L> to CC algorithm module unloading which will need to be fixed. Just > L> mentioning it again publicly so others are aware of it. > > In the code checked in the check for (CC_ALGO(tp)->ctl_output being > not NULL and the actual call are under the same lock. > > Is that the race you refer to? No, the TCP_CONGESTION refactoring i.e. this commit, introduced races in the get and set cases. I guess I didn't provide enough context in Crucible, so here goes... The post refactoring get code is now: case TCP_CONGESTION: INP_WUNLOCK(inp); error = sooptcopyout(sopt, CC_ALGO(tp)->name, TCP_CA_NAME_MAX); break; Consider that tp is using cc_blah and that the cc_blah module is unloaded as the copy out is happening. By not holding the INP lock, the CC module unload code is able to walk the list of active connections, find this connection is using cc_blah, acquire the INP lock, switch this connection to cc_newreno, release the lock and finally unload the cc_blah module, rendering the pointer passed in to sooptcopyout garbage. See cc_deregister_algo() in cc.c and tcp_ccalgounload() in tcp_subr.c for details related to CC module unload. The post refactoring set code is now: case TCP_CONGESTION: INP_WUNLOCK(inp); buf = malloc(TCP_CA_NAME_MAX, M_TEMP, M_WAITOK|M_ZERO); error = sooptcopyin(sopt, buf, TCP_CA_NAME_MAX, 1); if (error) { free(buf, M_TEMP); break; } CC_LIST_RLOCK(); STAILQ_FOREACH(algo, &cc_list, entries) if (strncmp(buf, algo->name, TCP_CA_NAME_MAX) == 0) break; CC_LIST_RUNLOCK(); free(buf, M_TEMP); if (algo == NULL) { error = EINVAL; break; } INP_WLOCK_RECHECK(inp); /* * We hold a write lock over the tcb so it's safe to * do these things without ordering concerns. */ if (CC_ALGO(tp)->cb_destroy != NULL) CC_ALGO(tp)->cb_destroy(tp->ccv); CC_ALGO(tp) = algo; /* * If something goes pear shaped initialising the new * algo, fall back to newreno (which does not * require initialisation). */ if (algo->cb_init != NULL && algo->cb_init(tp->ccv) != 0) { CC_ALGO(tp) = &newreno_cc_algo; /* * The only reason init should fail is * because of malloc. */ error = ENOMEM; } INP_WUNLOCK(inp); break; Consider that the application has requested to switch to "blah" congestion control and that the cc_blah module is unloaded as the request is being processed. We hold no locks, copy in the string "blah", grab the CC list lock, find cc_blah's algo struct in the list leaving its pointer stored in the "algo" variable and drop the CC list lock. Crucially at this point, we don't hold the INP lock, so once we drop the CC list lock, the CC module unload code is free to pull cc_blah from the CC module list, walk the list of active connections and switch any using cc_blah to cc_newreno and to unload the cc_blah module. The tp we're modifying hasn't switched to cc_blah yet so it won't be reverted to cc_newreno by the CC module unload code, which means the "algo" variable will be a garbage pointer by the time we get around to grabbing the INP lock at the INP_WLOCK_RECHECK() line of code. You have to ensure that the algo pointer is valid and that the INP lock is held at the time of assigning to the tp, as that will ensure that if you're racing with the unload code, the unload code will correctly switch the connection's algo back to cc_newreno once it's able to acquire the INP lock. Cheers, Lawrence