From owner-svn-src-head@freebsd.org Sat Jul 14 17:25:56 2018 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 29A111033FFB; Sat, 14 Jul 2018 17:25:56 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from mail.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 712EA81F13; Sat, 14 Jul 2018 17:25:55 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from John-Baldwins-MacBook-Pro-2.local (unknown [73.93.142.209]) by mail.baldwin.cx (Postfix) with ESMTPSA id 8D5F610A87D; Sat, 14 Jul 2018 13:25:53 -0400 (EDT) Subject: Re: svn commit: r336282 - head/sys/netinet To: Sean Bruno , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org References: <201807141619.w6EGJk2Y016469@repo.freebsd.org> <7bbbb553-fca5-9801-49aa-caf6df929c2f@FreeBSD.org> From: John Baldwin Message-ID: <8f1bc11e-5653-b737-42d0-140aa9b8ad84@FreeBSD.org> Date: Sat, 14 Jul 2018 10:26:01 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (mail.baldwin.cx); Sat, 14 Jul 2018 13:25:54 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.99.2 at mail.baldwin.cx X-Virus-Status: Clean X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.27 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Jul 2018 17:25:56 -0000 On 7/14/18 9:57 AM, Sean Bruno wrote: > > > On 07/14/18 10:37, John Baldwin wrote: >> On 7/14/18 9:19 AM, Sean Bruno wrote: >>> Author: sbruno >>> Date: Sat Jul 14 16:19:46 2018 >>> New Revision: 336282 >>> URL: https://svnweb.freebsd.org/changeset/base/336282 >>> >>> Log: >>> Fixup memory management for fetching options in ip_ctloutput() >>> >>> Submitted by: Jason Eggleston >>> Sponsored by: Limelight Networks >>> Differential Revision: https://reviews.freebsd.org/D14621 >>> >>> Modified: >>> head/sys/netinet/ip_output.c >>> >>> Modified: head/sys/netinet/ip_output.c >>> ============================================================================== >>> --- head/sys/netinet/ip_output.c Sat Jul 14 16:06:53 2018 (r336281) >>> +++ head/sys/netinet/ip_output.c Sat Jul 14 16:19:46 2018 (r336282) >>> @@ -1256,12 +1256,18 @@ ip_ctloutput(struct socket *so, struct sockopt *sopt) >>> switch (sopt->sopt_name) { >>> case IP_OPTIONS: >>> case IP_RETOPTS: >>> - if (inp->inp_options) >>> + if (inp->inp_options) { >>> + unsigned long len = ulmin(inp->inp_options->m_len, sopt->sopt_valsize); >>> + struct mbuf *options = malloc(len, M_TEMP, M_WAITOK); >>> + INP_RLOCK(inp); >>> + bcopy(inp->inp_options, options, len); >>> + INP_RUNLOCK(inp); >>> error = sooptcopyout(sopt, >>> - mtod(inp->inp_options, >>> + mtod(options, >>> char *), >>> - inp->inp_options->m_len); >>> - else >>> + len); >>> + free(options, M_TEMP); >>> + } else >>> sopt->sopt_valsize = 0; >>> break; >> >> You can't malloc an mbuf, and you don't really want an mbuf here anyway. >> Also, style(9) doesn't normally assign values when a variable is created. >> I would perhaps have done something like this: >> >> if (inp->inp_options) { >> void *options; >> u_long len; >> >> INP_RLOCK(inp); >> len = ulmin(inp->inp_options->m_len, sopt->sopt_valsize); >> INP_RUNLOCK(inp); >> options = malloc(len, M_TEMP, M_WAITOK); >> INP_RLOCK(inp); >> len = ulmin(inp->inp_options->m_len, len); >> m_copydata(inp->inp_options, 0, len, options); >> INP_RUNLOCK(inp); >> error = sooptcopyout(sopt, options, len); >> free(options, M_TEMP); >> } >> >> The current code isn't doing what you think it is doing since the bcopy() >> copies the m_data pointer from 'inp_options' into 'options' so that >> 'options->m_data' points at the m_data buffer in the 'inp_options' mbuf. >> Thus, the mtod() returns a pointer into 'inp_options' just like the old >> code, so this commit doesn't change anything in terms of the previous >> race (it is still just as broken). >> >> Also, while INP_RLOCK isn't helpful in my version above for reading the >> 'm_len' member (it's racy to read and then malloc no matter what), you >> still need the lock to ensure the inp_options pointer itself is valid >> when you dereference it. Without the lock another thread might have >> freed inp_options out from under you and the unlocked dereference could >> panic (though it probably just reads garbage on most of our architectures >> rather than panicking since we don't tend to invalidate KVA if you lose >> that race). >> > > > Shall I revert and rethink? Well, we can fix it, but even my fix above is wrong. You can't do the ulmin without breaking getsockopt(). The output length is an in-out parameter to getsockopt() so that you can determine the size of variable-sized variables (such as this one) with this type of pattern: socklen_t len; void *buf; len = 0; getsockopt(s, IPPROTO_IP, IP_OPTIONS, NULL, &len); buf = malloc(len); getsockopt(s, IPPROTO_IP, IP_OPTIONS, buf, &len); (In theory you'd really need to use a loop with realloc until you obtain a consistent snapshot.) However, for this to work, the kernel code that calls sooptcopyout has to always pass the full size. sooptcopyout ensures it doesn't overflow the user's supplied buffer, but depends on the length it is passed to determine the output value of 'len'. The ulmin breaks this since it will return a len of 0 for the first call always. I think the right way to fix this is to use m_dup() with M_NOWAIT while holding the INP lock and fail with ENOMEM if m_dup() fails: INP_RLOCK(inp); if (inp->inp_options) { struct mbuf *options; options = m_dup(inp->inp_options, M_NOWAIT); INP_RUNLOCK(inp); if (options != NULL) { error = sooptcopyout(sopt, mtod(options, char *), options->m_len); m_free(options); } else error = ENOEM; } else { INP_RUNLOCK(inp); sopt->optvalsize = 0; } (Someone else may prefer m_freem() to m_free() in case inp_options can be an arbitrary chain.) I chose to always acquire INP_RLOCK because I would have had to test it again under the lock anyway before calling m_dup() to determine ENOMEM vs no valid options, and getsockopt(IP_OPTIONS) isn't a critical path such that it warrants doing an unlocked check before doing a check under the lock. I would also probably have expanded a bit on the commit message. "Fixup memory management" is a bit vague as opposed to something like "Fix a potential use after free in access to inp_options." I would maybe ask jtl@ to review the above and/or offer an alternative. -- John Baldwin