Date:      Tue, 24 Feb 2015 20:57:58 -0800
From:      Harrison Grundy <harrison.grundy@astrodoggroup.com>
To:        Warner Losh <imp@bsdimp.com>, John-Mark Gurney <jmg@funkthat.com>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, Alfred Perlstein <alfred@freebsd.org>, Ian Lepore <ian@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: locks and kernel randomness...
Message-ID:  <54ED5656.50607@astrodoggroup.com>
In-Reply-To: <2F49527F-2F58-4BD2-B8BE-1B1190CCD4D0@bsdimp.com>
References:  <20150224015721.GT74514@kib.kiev.ua> <54EBDC1C.3060007@astrodoggroup.com> <20150224024250.GV74514@kib.kiev.ua> <DD06E2EA-68D6-43D7-AA17-FB230750E55A@bsdimp.com> <20150224174053.GG46794@funkthat.com> <54ECBD4B.6000007@freebsd.org> <20150224182507.GI46794@funkthat.com> <54ECEA43.2080008@freebsd.org> <20150224231921.GQ46794@funkthat.com> <1424822522.1328.11.camel@freebsd.org> <20150225002956.GT46794@funkthat.com> <2F49527F-2F58-4BD2-B8BE-1B1190CCD4D0@bsdimp.com>



On 02/24/15 17:01, Warner Losh wrote:
> 
>> On Feb 24, 2015, at 5:29 PM, John-Mark Gurney <jmg@funkthat.com>
>> wrote:
>> 
>> Ian Lepore wrote this message on Tue, Feb 24, 2015 at 17:02
>> -0700:
>>> On Tue, 2015-02-24 at 15:19 -0800, John-Mark Gurney wrote:
>>>> Alfred Perlstein wrote this message on Tue, Feb 24, 2015 at
>>>> 16:16 -0500:
>>>>> On 2/24/15 1:25 PM, John-Mark Gurney wrote:
>>>>>> Alfred Perlstein wrote this message on Tue, Feb 24, 2015
>>>>>> at 13:04 -0500:
>>>>>>> On 2/24/15 12:40 PM, John-Mark Gurney wrote:
>>>>>>>> Warner Losh wrote this message on Tue, Feb 24, 2015
>>>>>>>> at 07:56 -0700:
>>>>>>>>> Then again, if you want to change random(), provide
>>>>>>>>> a weak_random() that's the traditional non-crypto
>>>>>>>>> thing that's fast and lockless. That would make
>>>>>>>>> it easy to audit in our tree. The scheduler
>>>>>>>>> doesn't need cryptographic randomness, it just
>>>>>>>>> needs to make different choices sometimes to ensure
>>>>>>>>> its notion of fairness.
>>>>>>>> 
>>>>>>>> I do not support having a weak_random...  If the
>>>>>>>> consumer is sure enough that you don't need a secure
>>>>>>>> random, then they can pick an LCG and implement it
>>>>>>>> themselves and deal (or not) w/ the locking
>>>>>>>> issues...
>>>>>>>> 
>>>>>>>> It appears that the scheduler had an LCG but for some
>>>>>>>> reason the authors didn't feel like using it here..
>>>>>>> 
>>>>>>> The way I read this argument is that no low quality
>>>>>>> sources of randomness shall be allowed.
>>>>>> 
>>>>>> No, I'm saying that the person who needs the predictable
>>>>>> randomness needs to do extra work to get it...  If they
>>>>>> care that much about performance/predictability/etc, then
>>>>>> a little extra work won't hurt them..  And if they don't
>>>>>> know what an LCG is, then they aren't qualified to make
>>>>>> the decision that a weaker RNG is correct for their 
>>>>>> situation..
>>>>>> 
>>>>>>> So we should get rid of rand(3)?  When do we deprecate
>>>>>>> that?
>>>>>> 
>>>>>> No, we should replace it w/ proper randomness like
>>>>>> OpenBSD has... I'm willing to go that far and I think
>>>>>> FreeBSD should...  OpenBSD has done a lot of leg work in
>>>>>> tracking down ports that correctly use rand(3), and
>>>>>> letting them keep their deterministic randomness, while 
>>>>>> the remaining get real random..
>>>>>> 
>>>>>>> Your argument doesn't hold water.
>>>>>> 
>>>>>> Sorry, your argument sounds like it's from the 90's
>>>>>> when we didn't know any better on how to make secure
>>>>>> systems...  Will you promise to audit all new uses of
>>>>>> randomness in the system to make sure that they are using
>>>>>> the correct, secure API?
>>>>>> 
>>>>>> Considering that it's been recommended that people NOT
>>>>>> use read_random(9) for 14 years, yet people continue to
>>>>>> use it in new code, demonstrates that people do not know
>>>>>> what they are doing (wrt randomness), and the only way to
>>>>>> make sure they do the correct, secure thing is to only
>>>>>> provide the secure API...
>>>>> 
>>>>> That speaks to more of the drive-by czars we have in BSD
>>>>> land that take an area with a hard lock and then go away.
>>>> 
>>>> It also speaks to the armchair quarterbacking that stops
>>>> people from wanting to contribute...  Someone comes along and
>>>> tries to make an improvement, then x number of people raise
>>>> their arms about oh, I still use grdc (sorry dteske, not
>>>> trying to pick on you) as tcp keep alive, and then the person
>>>> abandons or leaves incomplete the work that they started...
>>>> 
>>>> I was very close to NOT posting the email to -arch, but after
>>>> various questions from twitter, and adrian's continued pleas
>>>> to talk changes more publicly, I decided to do so...  If
>>>> people continue to react this way, it just demonstrates that
>>>> doing things publicly is NOT a way to get things to move
>>>> forward in FreeBSD, and people will continue to do things in
>>>> private...  Luckily, I'm consulting, so I have a few more 
>>>> hours (for now) to fight these fights, but if it continues to
>>>> be an issue, we'll continue to have this problem of czars
>>>> that come in, drop a bunch of code and then leave, because
>>>> dealing w/ this becomes too expensive...
>>>> 
>>>> So far, only ONE person has commented on the patch on
>>>> reviews, and that is delphij...
>>>> 
>>>>> Also, do not want to attempt to be like openbsd, learn from
>>>>> for sure, but to be like, no way.
>>>> 
>>>> I'm fine not being like OpenBSD, but as you said, we should
>>>> learn from them, and leverage their work...  Though I agree
>>>> w/ OpenBSD's work to replace random(3), it also isn't who
>>>> FreeBSD is, but if we want to continue to be relevant, we do
>>>> need to take security seriously, and IMO, this is one of
>>>> those steps.
>>>> 
>>>> If someone does find a performance issue w/ my patch, I WILL
>>>> work with them on a solution, but I will not work w/ people
>>>> who make unfounded claims about the impact of this work...
>>> 
>>> Yeah, the problem could be all that.
>>> 
>>> Or it could be people who "collaborate" by saying I'm going to
>>> make this change.  I'm not going to justify it in any way, and
>>> if anybody
>> 
>> I have justified it…
> 
> I think you should explain what you explained to me on IRC.
> 
> Specifically, through a timing attack, you can find (by default)
> the lower 7 bits of the value returned by random(). Since random()
> is not MP safe, it can sometimes return the same value twice
> (through some race that may or may not have been lost). This means
> other users can see this data.
> 
> In this instance, it isn’t so much what sched_ule is doing, but
> rather what others are able to glean from it. Now, it isn’t clear
> that these 7 bits are a big deal since you also have to lose the
> race and know the race was lost. Other things in the system might
> care if you expose this state.
> 
> Also, in this specific case, it can use the current random
> generator in sched_ule to get this number as well. It’s run on a
> time scale of ticks, with some jitter. In this specific case, it
> doesn’t need to be using random(), but it isn’t clear if the
> get_cyclecount() stuff provides enough low-order bits that are
> random enough to meet sched_ule’s needs. But it isn’t clear that it
> doesn’t (only cause for concern is if there’s a beat pattern for a
> cycle count that’s low-resolution, but I don’t think we have any of
> these on SMP work loads).
> 

<... snip ...>

The timing attack I talked to you about on IRC works like this:

A userland process creates as many threads as there are CPUs, and by
manipulating the load they generate, gets it so they're all flagged as
interactive and at the same priority. (alternating spin and sleep with
a 2% duty cycle would work, for instance)

It would also be possible to coerce a userland process, like Apache, to
behave this way.

These threads now have the ability to preempt all timeshare tasks on
all CPUs for slice_size time, by waking up and spinning at the same
time. This means they can get very precise knowledge about scheduling,
by timing when they get to run, versus when they have to wait.

By watching CPU0, one of these threads can measure balance_ticks.

This is important because balance_ticks directly exposes the last 7
bits it gets back from random(). (The value gets applied to
balance_interval to keep the balancer from running on exactly the same
interval)

This means that if an attacker can trigger the use of random(), or is
willing to wait long enough for a race, they can determine the value
of those bits that were passed along to anyone who called random() at
the same time.

It also means that they can eventually discover the state of the RNG,
and predict future values.

The security implications of disclosing the values this way aren't as
severe as it might seem, simply because random() isn't really used in
any cryptographically sensitive areas, but there are definite
consequences, like predicting firewall port values, and NFS client
transaction IDs.

It is, however, surprising to learn that the balance_interval sysctl
has security implications.

--- Harrison



<... snip ...>

> 
> You can prove a negative with benchmarks. Then we’d be arguing
> over the efficacy of them, but at least that would be progress :)
> Or you can strongly suggest a negative by failing to reject the
> null hypothesis of no change. That too would be progress.
> 
>>> they don't, then screw the collaboration thing, I'm just going
>>> to do it anyway.
>> 
>> It goes both ways, I see it that you're objecting w/o complete
>> information, and no matter what evidence or work I do, you'll
>> just ignore it, and still say it isn't correct or that there's
>> this unprovable condition that prevents the work from going in…
> 
> Data is going to break this log-jam.
> 
> Warner
> 


