Date:      Fri, 09 Nov 2007 11:13:00 -0800
From:      Nate Lawson <nate@root.org>
To:        Colin Percival <cperciva@freebsd.org>
Cc:        cvs-src@FreeBSD.org, Kris Kennaway <kris@FreeBSD.org>, src-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   Re: cvs commit: src/sys/amd64/amd64 mp_machdep.c src/sys/i386/i386 mp_machdep.c
Message-ID:  <4734B13C.6050008@root.org>
In-Reply-To: <47340B74.9070004@freebsd.org>
References:  <200711081945.lA8JjKcW080540@repoman.freebsd.org> <47337724.9040108@FreeBSD.org> <47337940.6040909@root.org> <47340B74.9070004@freebsd.org>

Colin Percival wrote:
> Nate Lawson wrote:
>> I'm still waiting for what will be done to prevent the attack on
>> uniprocessor or multi-core machines (shared L2).  Continuing to focus on
>> hyperthreading is like locking the screen door on your submarine.
> 
> Exploiting a cache collision channel through the L2 cache is much harder
> than through the L1 cache, and is likely impossible under many circumstances
> (OpenSSL has been fixed to prevent the most easily exploitable cache side
> channel).  In addition, there are other attacks, e.g., using shared branch
> prediction tables, to which hyperthreaded processors are vulnerable but which
> do not affect multicore systems at all.

Even a uniprocessor is vulnerable to the BTB side channel if a context
switch occurs, so saying multicore systems are not affected at all is a
bit of a stretch.  I agree HTT gives you the best vantage point for
observing/affecting all of these microarchitectural details.  However,
if something leaves a state change in the CPU that is visible from HTT,
that state also survives a context switch.  The only question is how
many more samples are required than when using HTT.

> Rather than locking the screen door on a submarine, I'd say that a more apt
> comparison would be turning off a fire hydrant even though a garden hose is
> still running.  I recommend the use of more sophisticated countermeasures
> against side channel attacks where highly sensitive keying material is
> concerned; but this does not invalidate the utility of applying such a very
> simple countermeasure which prevents a very easy attack.

[I wrote the below privately but think it might be useful as part of
this thread]

Research since 2005 has confirmed that HTT is not the only way cache
timing behavior can be observed.  Multi-core and uniprocessor systems are
both vulnerable to the attack you publicized, and the L2 cache is
vulnerable too, albeit with more samples required.

Further research into new side channels like the branch target buffer
(that cannot be turned off like HTT) has shown that the cryptographic
software itself must contain countermeasures in addition to operating
system support for a "stealth mode".  Continuing to disable HTT and
claiming this helps is dishonest to our users since it's not a true
stealth mode.  (Context switches can still occur, revealing intermediate
state, and disabling HTT doesn't address multi-core.)

Fixes which address all of the cache-related threats to RSA [1] and
mitigate BTB attacks [2] were contributed to OpenSSL.  The first fix
involves "striping" the windowed exponent across cache lines so that use
of any of the exponents has the same cache access behavior.  The second
involves removing conditional branches from the modular arithmetic.
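
As a rough illustration of the striping idea (a sketch only, not
OpenSSL's actual code; WINDOW_ENTRIES, stripe_scatter and stripe_gather
are names made up for this example, and a 16-entry window is assumed):
byte j of every table entry is stored side by side, so fetching any one
entry walks the same sequence of cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_ENTRIES 16	/* assumed: 2^4 precomputed powers */

/*
 * Store entry number idx in the striped table.  Byte j of the entry is
 * written to buf[j * WINDOW_ENTRIES + idx], right next to byte j of
 * every other entry.  buf must hold len * WINDOW_ENTRIES bytes.
 */
static void
stripe_scatter(uint8_t *buf, const uint8_t *entry, size_t len, size_t idx)
{
	assert(idx < WINDOW_ENTRIES);
	for (size_t j = 0; j < len; j++)
		buf[j * WINDOW_ENTRIES + idx] = entry[j];
}

/*
 * Fetch entry number idx.  Each iteration reads one byte out of a
 * WINDOW_ENTRIES-byte group; which byte depends on idx, but which
 * group (and so which cache line, if buf is suitably aligned) does not.
 */
static void
stripe_gather(uint8_t *out, const uint8_t *buf, size_t len, size_t idx)
{
	assert(idx < WINDOW_ENTRIES);
	for (size_t j = 0; j < len; j++)
		out[j] = buf[j * WINDOW_ENTRIES + idx];
}
```

Note that this hides the index only at cache-line granularity, and only
if buf is aligned so that a 16-byte group never straddles two lines.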

I think the solution should be to document the security info for users
and developers.  Users should be notified that if they are deploying a
server, they should be sure their cryptography libraries address side
channel attacks.  FreeBSD's default configuration of included libraries
like OpenSSL should be noted.  Developers of cryptographic libraries
should be notified that they are responsible for avoiding data-dependent
behavior and being aware of microarchitectural side channels.
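
To make "avoiding data-dependent behavior" concrete, here is a minimal
sketch of two common building blocks: a comparison that always scans
the whole buffer instead of returning at the first mismatch, and a
branch-free select.  The names ct_memcmp and ct_select are hypothetical
and not taken from any particular library.  A compiler may still
reintroduce branches, so the generated code needs to be checked.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compare len bytes.  Returns 0 if equal, nonzero otherwise.  The loop
 * never exits early, so the running time does not reveal the position
 * of the first differing byte.
 */
static int
ct_memcmp(const void *a, const void *b, size_t len)
{
	const volatile uint8_t *pa = a;
	const volatile uint8_t *pb = b;
	uint8_t diff = 0;

	for (size_t i = 0; i < len; i++)
		diff |= (uint8_t)(pa[i] ^ pb[i]);
	return diff;
}

/*
 * Return x if bit is 1, y if bit is 0, without branching on bit.
 * bit must be exactly 0 or 1.
 */
static uint32_t
ct_select(uint32_t bit, uint32_t x, uint32_t y)
{
	uint32_t mask = (uint32_t)0 - bit;	/* all ones or all zeros */

	return (x & mask) | (y & ~mask);
}
```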

Careful coding can address most side channel attacks, but I still think
OSes need a standard API for a stealth mode where a privileged process
can request exclusive access to the CPU it is running on for a short
quantum, with a guarantee that it will not be preempted unless it
exceeds that quantum.  Additional support for cleaning the
microarchitectural side effects (cache, BTB, etc.) would be a bonus.  I
don't know of any standards efforts in this area but it might be
interesting to note.  Fast implementations of AES are a good example
where such support is needed since it is impossible to eliminate cache
timing differences of the table lookups without such a mode.
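
To see why this is hard to fix in software alone for a fast
implementation, consider the obvious software-only approach: read every
table entry on every lookup and mask out all but the wanted one.  The
sketch below (ct_lookup256 is a hypothetical name, and a 256-entry byte
table is assumed) makes the cache access pattern independent of the
index, but at the cost of 256 reads per lookup instead of one, which is
exactly the speed a table-driven implementation exists to avoid.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return table[secret_idx] while touching every entry, so the set of
 * cache lines read does not depend on secret_idx.
 */
static uint8_t
ct_lookup256(const uint8_t table[256], uint8_t secret_idx)
{
	uint8_t result = 0;

	for (uint32_t i = 0; i < 256; i++) {
		uint32_t diff = i ^ secret_idx;		/* 0 only if i == idx */
		uint32_t is_eq = ((diff - 1) >> 31) & 1;	/* 1 only if diff == 0 */
		uint8_t mask = (uint8_t)(0 - is_eq);	/* 0xFF or 0x00 */

		result |= (uint8_t)(table[i] & mask);
	}
	return result;
}
```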

[1] OpenSSL 0.9.7h, change 10/2005 by Matthew D. Wood of Intel,
http://www.openssl.org/news/changelog.html
[2] OpenSSL 0.9.8f, change 10/2007 by Matthew D. Wood of Intel,
http://www.openssl.org/news/changelog.html

-- 
Nate