From owner-freebsd-hackers@freebsd.org Fri Jan 5 23:30:21 2018
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 946F4EA6C9A;
 Fri, 5 Jan 2018 23:30:21 +0000 (UTC)
 (envelope-from eric@metricspace.net)
Received: from mail.metricspace.net (mail.metricspace.net [IPv6:2001:470:1f11:617::107])
 by mx1.freebsd.org (Postfix) with ESMTP id CAFD56582F;
 Fri, 5 Jan 2018 23:30:20 +0000 (UTC)
 (envelope-from eric@metricspace.net)
Received: from [10.148.202.109] (mobile-166-171-187-244.mycingular.net [166.171.187.244])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 (Authenticated sender: eric)
 by mail.metricspace.net (Postfix) with ESMTPSA id C40378AD8;
 Fri, 5 Jan 2018 23:30:16 +0000 (UTC)
Date: Fri, 05 Jan 2018 18:30:14 -0500
User-Agent: K-9 Mail for Android
In-Reply-To:
References: <33bcd281-4018-7075-1775-4dfcd58e5a48@metricspace.net>
 <4ec1f3b1-f4b0-80ab-0e68-0dd679dd9e37@metricspace.net>
 <72f6097e-c71e-b53f-6885-cfe5a5a56586@metricspace.net>
MIME-Version: 1.0
Subject: Re: A more general possible meltdown/spectre countermeasure
To: Warner Losh
CC: "freebsd-hackers@freebsd.org" , "freebsd-arch@freebsd.org"
From: Eric McCorkle
Message-ID: <9268C1F8-AD68-4B20-94D7-96B5FD6589B5@metricspace.net>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.25
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 05 Jan 2018 23:30:21 -0000

Ah, superpages. I wouldn't think so. The cpu still has to do a page table
walk (just stopping at the top level page table), and would discover that
it's not accessible.

On January 5, 2018
6:24:14 PM EST, Warner Losh wrote:
>I mean the mappings we have in the kernel that map all of memory to a
>specific page using 512GB pages in
>sys/amd64/amd64/pmap.c:create_pagetables. This allows us to map any PA
>to a VA with simple math rather than a page table walk.
>
>Warner
>
>On Fri, Jan 5, 2018 at 4:10 PM, Eric McCorkle wrote:
>
>> I'm not sure what you mean by direct map. Do you mean TLB?
>>
>> On 01/05/2018 18:08, Warner Losh wrote:
>> > Wouldn't you have to also unmap it from the direct map for this to be
>> > effective?
>> >
>> > Warner
>> >
>> >
>> > On Fri, Jan 5, 2018 at 3:31 PM, Eric McCorkle wrote:
>> >
>> > Well, the only way to find out would be to try it out.
>> >
>> > However, unless I'm missing something, if you're trying to pull a
>> > meltdown attack, you try and fetch from the kernel. If that location
>> > isn't cached (or if your cache is physically indexed), you need the
>> > physical address (otherwise you don't know where to look), and thus have
>> > to go through address translation, at which point you detect that the
>> > page isn't accessible and fault. In the mean time, you can't
>> > speculatively execute any of the operations that load up the
>> > side-channels, because you don't have the sensitive data.
>> >
>> > The reason you can pull off a meltdown attack at all is that a
>> > virtually-indexed cache lets you get the data in parallel with address
>> > translation (breaking the dependency between address translation and
>> > fetching data), which takes 1000s of cycles for a TLB miss, during which
>> > you have the data and can launch a whole bunch of transient ops.
>> >
>> > Again, these are uncharted waters we're in; so it's entirely possible
>> > I'm missing something here.
>> >
>> > On 01/05/2018 17:22, Warner Losh wrote:
>> > > While you might be right, I've seen no indication that a cache miss
>> > > would defeat these attacks in the public and non-public data I've
>> > > looked at, even though a large number of alternatives to the published
>> > > workarounds have been discussed. I'm therefore somewhat skeptical this
>> > > would be effective. I'm open, however, to data that changes that
>> > > skepticism...
>> > >
>> > > Warner
>> > >
>> > > On Fri, Jan 5, 2018 at 3:15 PM, Eric McCorkle <eric@metricspace.net> wrote:
>> > >
>> > > Right, but you have to get the value "foo" into the pipeline in order
>> > > for it to affect the side-channels. This technique attempts to stop
>> > > that from happening.
>> > >
>> > > Unless I made a mistake, non-cached memory reads force address
>> > > translation to happen first, which detects faults and blocks the
>> > > meltdown attack.
>> > >
>> > > It also stops spectre with very high probability, as it's very unlikely
>> > > that an uncached load will arrive before the speculative thread gets
>> > > squashed.
>> > >
>> > > On 01/05/2018 17:10, Warner Losh wrote:
>> > > > I think this is fatally flawed.
>> > > >
>> > > > The side channel is the cache. Not the data at risk.
>> > > >
>> > > > Any mapped memory, cached or not, can be used to influence the cache.
>> > > > Storing stuff in uncached memory won't affect the side channel one bit.
>> > > >
>> > > > Basically, all attacks boil down to tricking the processor, at elevated
>> > > > privs, to doing something like
>> > > >
>> > > > a = foo[offset];
>> > > >
>> > > > where foo + offset are designed to communicate information by populating
>> > > > a cache line. offset need not be cached itself and can be the result of
>> > > > simple computations that depend on anything accessible at all in the kernel.
>> > > >
>> > > > Warner
>> > > >
>> > > > On Fri, Jan 5, 2018 at 3:02 PM, Eric McCorkle <eric@metricspace.net> wrote:
>> > > >
>> > > > Re-posting to -hackers and -arch. I'm going to start working on
>> > > > something like this over the weekend.
>> > > >
>> > > > -------- Forwarded Message --------
>> > > > Subject: A more general possible meltdown/spectre countermeasure
>> > > > Date: Thu, 4 Jan 2018 23:05:40 -0500
>> > > > From: Eric McCorkle <eric@metricspace.net>
>> > > > To: freebsd-security@freebsd.org
>> > > >
>> > > > I've thought more about how to deal with meltdown/spectre, and I have an
>> > > > idea I'd like to put forward. However, I'm still in something of a
>> > > > panic mode, so I'm not certain as to its effectiveness. Needless to
>> > > > say, I welcome any feedback on this, and I may be completely off-base.
>> > > >
>> > > > I'm calling this a "countermeasure" as opposed to a "mitigation", as
>> > > > it's something that requires modification of code as opposed to a
>> > > > drop-in patch.
>> > > >
>> > > > == Summary ==
>> > > >
>> > > > Provide a kernel and userland API by which memory allocation can be done
>> > > > with extended attributes. In userland, this could be accomplished by
>> > > > extending MMAP flags, and I could imagine a malloc-with-attributes flag.
>> > > > In kernel space, this must already exist, as drivers need to allocate
>> > > > memory with various MTRR-type attributes set.
>> > > >
>> > > > The immediate aim here is to store sensitive information that must
>> > > > remain memory-resident in non-cacheable memory locations (or, if more
>> > > > effective attribute combinations exist, using those instead). See the
>> > > > rationale for the argument why this should work.
>> > > >
>> > > > Assuming the rationale holds, then the attack surface should
>> > > > be greatly reduced. Attackers would need to grab sensitive data out of stack
>> > > > frames or similar locations if/when it gets copied there for faster use.
>> > > > Moreover, if this is done right, it could dovetail nicely into a
>> > > > framework for storing and processing sensitive assets in more secure
>> > > > hardware[0] (like smart cards, the FPGAs I posted earlier, or other
>> > > > options).
>> > > >
>> > > > The obvious downside is that you take a performance hit storing things
>> > > > in non-cacheable locations, especially if you plan on doing heavy
>> > > > computation in that memory (say, encryption/decryption). However, this
>> > > > is almost certainly going to be less than the projected 30-50%
>> > > > performance hit from other mitigations. Also, this technique should
>> > > > work against spectre as well as meltdown (assuming the rationale holds).
>> > > >
>> > > > The second downside is that you have to modify code for this to work,
>> > > > and you have to be careful not to keep copies of sensitive information
>> > > > around too long (this gets tricky in userland, where you might get
>> > > > interrupted and switched out).
>> > > >
>> > > >
>> > > > [0]: Full disclosure, enabling open hardware implementations of this
>> > > > kind of thing is something of an agenda of mine.
>> > > >
>> > > > == Rationale ==
>> > > >
>> > > > (Again, I'm tired, rushed, and somewhat panicked so my logic could be
>> > > > faulty at any point, so please point it out if it is)
>> > > >
>> > > > The rationale for why this should work relies on assumptions about
>> > > > out-of-order pipelines that cannot be guaranteed to hold, but are
>> > > > extremely likely to be true.
>> > > >
>> > > > As background, these attacks depend on out-of-order execution
>> > > > performing operations that end up affecting cache and branch-prediction state,
>> > > > ultimately storing information about sensitive data in these
>> > > > side-channels before the fault conditions are detected and acted upon.
>> > > > I'll borrow terminology from the paper, using "transient instructions"
>> > > > to refer to speculatively executed instructions that will eventually be
>> > > > cancelled by a fault.
>> > > >
>> > > > These attacks depend entirely on transient instructions being able to
>> > > > get sensitive information into the processor core and then perform some
>> > > > kind of instruction on them before the fault condition cancels them.
>> > > > Therefore, anything that prevents them from doing this *should* counter
>> > > > the attack. If the actual sensitive data never makes it to the core
>> > > > before the fault is detected, the dependent memory accesses/branches
>> > > > never get executed and the data never makes it to the side-channels.
>> > > >
>> > > > Another assumption here is that CPU architects are going to want to
>> > > > squash faulted instructions ASAP and stop issuing along those
>> > > > speculative branches, so as to reclaim execution units. So I'm assuming
>> > > > once a fault comes back from address translation, then transient
>> > > > execution stops dead.
>> > > >
>> > > > Now, break down the cases for whether the address containing sensitive
>> > > > data is in cache and TLB or not. (I'm assuming here that caches are
>> > > > virtually-indexed, which enables cache lookups to bypass address
>> > > > translation.)
>> > > >
>> > > > * In cache, in TLB: You end up basically racing between the cache and
>> > > > TLB, which will very likely end up detecting the fault before the data
>> > > > arrives, but
>> > > > at the very worst, you get one or two cycles of transient
>> > > > instruction execution before the fault.
>> > > >
>> > > > * In cache, not in TLB: Virtually-indexed tagged means you get a cache
>> > > > lookup racing a page-table walk. The cache lookup beats the page table
>> > > > walk by potentially hundreds (maybe thousands) of cycles, giving you a
>> > > > bunch of transient instructions before a fault gets triggered. This is
>> > > > the main attack case.
>> > > >
>> > > > * Not in cache, in TLB: Memory access requires address translation,
>> > > > which comes back almost immediately as a fault.
>> > > >
>> > > > * Not in cache, not in TLB: You have to do a page table walk before you
>> > > > can fetch the location, as you have to go out to physical memory (and
>> > > > therefore need a physical address). The page table walk will come back
>> > > > with a fault, stopping the attack.
>> > > >
>> > > > So, unless I'm missing something here, both non-cached cases defeat the
>> > > > meltdown attack, as you *cannot* get the data unless you do address
>> > > > translation first (and therefore detect faults).
>> > > >
>> > > > As for why this defeats the spectre attack, the logic is similar: you've
>> > > > jumped into someone else's executable code, hoping to scoop up enough
>> > > > information into your branch predictor before the fault kicks you out.
>> > > > However, to capture anything about sensitive information in your
>> > > > side-channels, the transient instructions need to actually get it into
>> > > > the core before a fault gets detected. The same case analysis as above
>> > > > applies, so you never actually get the sensitive info into the core
>> > > > before a fault comes back and you get squashed.
>> > > >
>> > > >
>> > > > [1]: A
>> > > > physically-indexed cache would be largely immune to this attack,
>> > > > as you'd have to do address translation before doing a cache lookup.
>> > > >
>> > > >
>> > > > I have some ideas that can build on this, but I'd like to get some
>> > > > feedback first.
>> > > > _______________________________________________
>> > > > freebsd-security@freebsd.org mailing list
>> > > > https://lists.freebsd.org/mailman/listinfo/freebsd-security
>> > > > To unsubscribe, send any mail to
>> > > > "freebsd-security-unsubscribe@freebsd.org"
>> > > > _______________________________________________
>> > > > freebsd-arch@freebsd.org mailing list
>> > > > https://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> > > > To unsubscribe, send any mail to
>> > > > "freebsd-arch-unsubscribe@freebsd.org"
>> > > >
>> > > >
>> > >
>> > >
>> >
>> >
>>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From owner-freebsd-hackers@freebsd.org Fri Jan 5 23:34:03 2018
Return-Path:
Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id DD57DEA7125
 for ; Fri, 5 Jan 2018 23:34:03 +0000 (UTC)
 (envelope-from phk@critter.freebsd.dk)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
 by mx1.freebsd.org (Postfix) with ESMTP id 9F4EA65FB6;
 Fri, 5 Jan 2018 23:34:03 +0000 (UTC)
 (envelope-from phk@critter.freebsd.dk)
Received: from critter.freebsd.dk (unknown [192.168.55.3])
 by phk.freebsd.dk (Postfix) with ESMTP id 463782739C;
 Fri, 5 Jan 2018 23:33:58 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
 by critter.freebsd.dk (8.15.2/8.15.2) with ESMTPS id w05NXgls024456
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Fri, 5 Jan 2018 23:33:42 GMT
 (envelope-from phk@critter.freebsd.dk)
Received: (from phk@localhost)
 by critter.freebsd.dk (8.15.2/8.15.2/Submit) id w05NXgj0024455;
 Fri, 5 Jan 2018 23:33:42 GMT
 (envelope-from phk)
To: cem@freebsd.org
cc: Freebsd hackers list
Subject: Re: Is it considered to be ok to not check the return code of close(2) in base?
In-reply-to:
From: "Poul-Henning Kamp"
References: <24acbd94-c52f-e71a-8a96-d608a10963c6@rawbw.com>
 <1514572041.12000.7.camel@freebsd.org>
 <20180105221330.GD95035@spindle.one-eyed-alien.net>
 <24173.1515191675@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <24453.1515195222.1@critter.freebsd.dk>
Content-Transfer-Encoding: quoted-printable
Date: Fri, 05 Jan 2018 23:33:42 +0000
Message-ID: <24454.1515195222@critter.freebsd.dk>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.25
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 05 Jan 2018 23:34:04 -0000

--------
In message , Conrad Meyer writes:
>On Fri, Jan 5, 2018 at 2:34 PM, Poul-Henning Kamp wrote:
>> Brookes suggestion, while well intentioned, wouldn't get very far,
>> because it is common for shells and shell-like programs to do:
>>
>>         for (i = 3; i < ALOT; i++)
>>                 (void)close(i);
>>
>> To get rid of unwanted filedescriptors from syslog(3), getpwent(3) etc.
>> in the child process.
>>
>> Yes, I know about closefrom(2), but a lot of programs still don't use it.
>
>Hi,
>
>That seems like a good way to quickly identify programs in base that
>still do not use closefrom().

Absolutely, by all means *identify* these programs, but before you
start *killing* them, make sure your system can actually function:

	$ cd /usr/src/bin/sh
	$ find . -name '*.c' -print | xargs grep closefrom
	$ cd /usr/src/contrib/tcsh
	$ find . -name '*.c' -print | xargs grep closefrom
	$

As I said: I'm all for making the -current kernel more paranoid
about userland, but log the results to syslog (with rate-limiting!),
don't just kill the process.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.