From owner-freebsd-security@FreeBSD.ORG  Mon Sep 24 21:57:06 2012
Return-Path: <owner-freebsd-security@FreeBSD.ORG>
Delivered-To: freebsd-security@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id C4A271065670;
	Mon, 24 Sep 2012 21:57:06 +0000 (UTC)
	(envelope-from mariusz.gromada@gmail.com)
Received: from mail-we0-f182.google.com (mail-we0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id B83AC8FC0C;
	Mon, 24 Sep 2012 21:57:05 +0000 (UTC)
Received: by weyx43 with SMTP id x43so875776wey.13
	for <multiple recipients>; Mon, 24 Sep 2012 14:57:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=Rdnhgn+H0k5eVHEA/GEJ3cndRipYEgh6+Oqr3ECqct8=;
	b=a8iPoWp3zriCWRD0goAoPWEL6NewmVpv2vsu+FimeZo6ji4X4/dRXlCCkY9w8bnMby
	30BIQgLSPFQ/Fr7fhXDt2LA4c9XAhc6yNRX+a3S0aamXSnIplA0EAxUqI+4reopJQ29s
	rKP/AfWzFVyFI/FMq+c7M7K09nwZJthhpxLSIRxV9PoU9G7Bc2oa50b72uNDvvdBmG1T
	iYSrLHUo8T7Ud9tYkzkWxRykQDUcIakDaiqDI91g8+VozolArygFufnijHWfI8Aah7Qc
	SsHbIe0ct6xReoDiqeT7z/tyk649JjMBQ8TRiE4UTRHWHZa3gTWeuU5wXSokeOzBT2u9
	awDw==
Received: by 10.180.83.66 with SMTP id o2mr17006228wiy.14.1348523824685;
	Mon, 24 Sep 2012 14:57:04 -0700 (PDT)
Received: from [192.168.1.100] (89-76-147-86.dynamic.chello.pl. [89.76.147.86])
	by mx.google.com with ESMTPS id k20sm16811345wiv.11.2012.09.24.14.57.02
	(version=SSLv3 cipher=OTHER); Mon, 24 Sep 2012 14:57:03 -0700 (PDT)
Message-ID: <5060D723.6020305@gmail.com>
Date: Mon, 24 Sep 2012 23:56:51 +0200
From: Mariusz Gromada <mariusz.gromada@gmail.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:15.0) Gecko/20120907 Thunderbird/15.0.1
MIME-Version: 1.0
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
References: <20120918211422.GA1400@garage.freebsd.pl>
	<20120919231051.4bc5335b@gumby.homeunix.com>
	<20120920102104.GA1397@garage.freebsd.pl>
	<201209200758.51924.jhb@freebsd.org>
	<20120922080323.GA1454@garage.freebsd.pl>
	<20120922195325.GH1454@garage.freebsd.pl>
	<505E59DC.7090505@gmail.com>
	<20120923151706.GN1454@garage.freebsd.pl>
In-Reply-To: <20120923151706.GN1454@garage.freebsd.pl>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
X-Mailman-Approved-At: Mon, 24 Sep 2012 22:03:16 +0000
Cc: Ben Laurie <benl@freebsd.org>, freebsd-security@freebsd.org,
	RW <rwmaillists@googlemail.com>,
	Jonathan Anderson <jonathan.anderson@cl.cam.ac.uk>,
	John Baldwin <jhb@freebsd.org>
Subject: Re: Collecting entropy from device_attach() times.
X-BeenThere: freebsd-security@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Security issues \[members-only posting\]"
	<freebsd-security.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-security>, 
	<mailto:freebsd-security-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-security>
List-Post: <mailto:freebsd-security@freebsd.org>
List-Help: <mailto:freebsd-security-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-security>, 
	<mailto:freebsd-security-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 24 Sep 2012 21:57:06 -0000

On 2012-09-23 17:17, Pawel Jakub Dawidek wrote:
> On Sun, Sep 23, 2012 at 02:37:48AM +0200, Mariusz Gromada wrote:
>> On 2012-09-22 21:53, Pawel Jakub Dawidek wrote:
>>> Mariusz, can you confirm my findings?
>>
>> Pawel,
>>
>> Your conclusions can be easily confirmed by analyzing the shape of the
>> EDF (empirical distribution function). Usually the maximum distance
>> between the EDF and the theoretical distribution function (called the
>> D-statistic) gives you an overview, the function shape gives you a
>> strong feeling, and the p-value gives you a formal test.
>> D-statistic values (your data):
>>
>>   6bit:   0.33%
>>   7bit:   0.29%
>>   8bit:   0.27%
>>   9bit:   0.21%
>>  10bit:   6.34%
>>  11bit:  19.07%
>>  12bit:  54.80%
>>
>> What I would say: increasing the number of bits from 6 to 9 does not
>> affect the distribution's "uniformity", while reaching the tenth bit
>> results in a sudden increase in the difference measure: the more bits,
>> the larger the difference. Distribution shape analysis for the 10th bit
>> shows a non-linear function. There is no "randomness" in the quantile
>> difference curve: the chart shows a complete lack of noise (a purely
>> functional relation). These are very strong indicators that starting
>> from the 10th bit the distribution changes and is no longer uniform.
>>
>> To formally confirm the above conclusion at e.g. a 5% significance
>> level (i.e. a 95% confidence level), I need some extra data regarding
>> sample sizes. Please send me the number of collected observations in
>> each of the 6-12 bit experiments.
>
> Total number of observations was 162833.
>

Ok, finally I have some formal results. To be completely honest, I need
to point out that we in fact have discrete data (for example the
integers 0, 1, ..., 63, not continuous numbers spread between 0 and
63). That is why, instead of testing against a continuous uniform
distribution, I am going to use the two-sample Kolmogorov-Smirnov test.
The methodology is simple:

- Pawel’s data will be called the empirical data.
- The theoretical data will be generated as the sequence of unique
integers from 0 to 2**n - 1, where n is the number of bits. Assumption:
each number appears in the theoretical data exactly once, representing
an ideal uniform distribution.
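The comparison described above can be sketched in a few lines of Python (the thread itself uses R; the helper names below are hypothetical and only for illustration). The D-statistic is the largest vertical gap between the two step-shaped empirical distribution functions; because the data are discrete integers, all tied values are consumed together before the gap is measured.

```python
def ideal_uniform_sample(n_bits):
    """Hypothetical helper: every integer 0 .. 2**n_bits - 1 exactly once."""
    return list(range(2 ** n_bits))


def ks_two_sample_d(sample_a, sample_b):
    """Two-sample KS statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        # Consume every copy of the smallest remaining value in both samples,
        # so tied values (common for discrete data) are counted together.
        x = min(a[i], b[j])
        while i < na and a[i] == x:
            i += 1
        while j < nb and b[j] == x:
            j += 1
        # i/na and j/nb are now the two empirical CDFs evaluated at x
        d = max(d, abs(i / na - j / nb))
    # Once one sample is exhausted its CDF is 1, so the gap can only shrink.
    return d


if __name__ == "__main__":
    print(ks_two_sample_d([0, 0, 0, 0], ideal_uniform_sample(2)))  # 0.75
```

For example, a degenerate 2-bit sample that is always 0 gives D = 0.75 against the ideal sample 0..3, while a sample containing each value equally often gives D = 0.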

Calculations are done in R (available from CRAN).

Loading empirical data from files:

 > e6 = read.table("E:\\pawel\\dhr2_6bit_sorted.txt")
 > e7 = read.table("E:\\pawel\\dhr2_7bit_sorted.txt")
 > e8 = read.table("E:\\pawel\\dhr2_8bit_sorted.txt")
 > e9 = read.table("E:\\pawel\\dhr2_9bit_sorted.txt")
 > e10 = read.table("E:\\pawel\\dhr2_10bit_sorted.txt")
 > e11 = read.table("E:\\pawel\\dhr2_11bit_sorted.txt")
 > e12 = read.table("E:\\pawel\\dhr2_12bit_sorted.txt")

Generating ideal theoretical data:

 > t6 = c(0:(2**6-1))
 > t7 = c(0:(2**7-1))
 > t8 = c(0:(2**8-1))
 > t9 = c(0:(2**9-1))
 > t10 = c(0:(2**10-1))
 > t11 = c(0:(2**11-1))
 > t12 = c(0:(2**12-1))

Performing KS tests:

 > ks.test(e6, t6)
D = 0.0032, p-value = 1

 > ks.test(e7, t7)
D = 0.0029, p-value = 1

 > ks.test(e8, t8)
D = 0.0027, p-value = 1

 > ks.test(e9, t9)
D = 0.0022, p-value = 1

 > ks.test(e10, t10)
D = 0.0634, p-value = 0.0005562

 > ks.test(e11, t11)
D = 0.1907, p-value < 2.2e-16

 > ks.test(e12, t12)
D = 0.5479, p-value < 2.2e-16
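ks.test() does not print how it arrives at these p-values. As far as I know, for large samples with ties (as here) R uses the asymptotic Kolmogorov distribution evaluated at sqrt(n1*n2/(n1+n2)) * D. A minimal Python sketch of that approximation, assuming this is indeed what R does (the function names are hypothetical):

```python
import math


def kolmogorov_sf(lam, terms=100):
    """P(K > lam) for the limiting Kolmogorov distribution K."""
    if lam < 0.1:
        # The CDF is below 1e-50 here, so the p-value is 1.0 in double precision.
        return 1.0
    if lam < 1.0:
        # Small-lam form of the CDF:
        # sqrt(2*pi)/lam * sum_k exp(-(2k-1)^2 * pi^2 / (8 * lam^2))
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
                for k in range(1, terms + 1))
        return 1.0 - math.sqrt(2 * math.pi) / lam * s
    # Large-lam form of the survival function:
    # 2 * sum_k (-1)^(k-1) * exp(-2 * k^2 * lam^2)
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return 2.0 * s


def ks_asymptotic_p_value(d, n1, n2):
    """Approximate two-sample KS p-value from D and the two sample sizes."""
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size
    return kolmogorov_sf(math.sqrt(n_eff) * d)


if __name__ == "__main__":
    # D and sample sizes for the 10-bit experiment reported above
    print(ks_asymptotic_p_value(0.0634, 162833, 2 ** 10))
```

With D = 0.0634, n1 = 162833 and n2 = 2**10 = 1024 this gives roughly 0.00056, close to the reported 0.0005562 (the printed D is rounded).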

As you can see, the D-statistics are almost the same as calculated by
Pawel (up to rounding). The p-values are very interesting because of the
very high number of observations generated by Pawel. For 6 to 9 bits the
estimated p-values are equal to 1, which means that the null hypothesis
stating that the compared distributions are equal cannot be rejected at
any significance level. Final conclusion: as far as this test can tell,
it is random!

Additionally, starting from 10 bits we observe a dramatic decrease of
the p-value (from 100% to ca. 0.06%, and much less for 11-12 bits).
Such a low p-value means that the null hypothesis stating that the
compared distributions are equal must be rejected at any common
significance level. Final conclusion: from the 10th bit on, the values
are not uniformly random.
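The same decision can be made without p-values by comparing D with the large-sample critical value c(alpha) * sqrt((n1 + n2) / (n1 * n2)), where c(alpha) = sqrt(-ln(alpha/2) / 2), about 1.358 for alpha = 0.05. A small Python sketch, plugging in the D values and the sample size reported in this thread (the helper names are hypothetical):

```python
import math

N_OBS = 162833  # total observations per experiment, as reported in the thread
# D-statistics reported by ks.test() above, keyed by number of bits
D_STATS = {6: 0.0032, 7: 0.0029, 8: 0.0027, 9: 0.0022,
           10: 0.0634, 11: 0.1907, 12: 0.5479}


def ks_critical_d(n1, n2, alpha=0.05):
    """Asymptotic two-sample KS critical value for significance level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))


def reject_uniformity(d, n1, n2, alpha=0.05):
    """True if D exceeds the critical value, i.e. H0 (equal distributions) is rejected."""
    return d > ks_critical_d(n1, n2, alpha)


if __name__ == "__main__":
    for bits, d in sorted(D_STATS.items()):
        n_ideal = 2 ** bits  # the ideal sample holds each value 0 .. 2**bits - 1 once
        crit = ks_critical_d(N_OBS, n_ideal)
        print(f"{bits:2d} bits: D = {d:.4f}, critical D = {crit:.4f}, "
              f"reject H0: {reject_uniformity(d, N_OBS, n_ideal)}")
```

At the 5% level this flags 10, 11 and 12 bits and none of 6-9 bits, matching the p-value conclusions above. The critical value is an asymptotic approximation, so borderline cases deserve care.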

I did the same comparison for the earlier real device_attach() data
(2081 observations). The R code and the results are below:

 > e16 = read.table("E:\\pawel\\device_attach_16bit.log")
 > t16 = c(0:(2**16-1))
 > ks.test(e16, t16)
D = 0.0178, p-value = 0.5422

Again, the D-statistic and p-value are almost the same as those
calculated "manually" before. The p-value is high (not as high as in the
6-9 bit tests, but consider the much lower number of observations: 2081
vs 162833), so there is no evidence against uniformity, which strongly
suggests that you have captured real 16-bit entropy!

Regards,
Mariusz