Date: Mon, 24 Sep 2012 23:56:51 +0200
From: Mariusz Gromada <mariusz.gromada@gmail.com>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Cc: Ben Laurie <benl@freebsd.org>, freebsd-security@freebsd.org, RW <rwmaillists@googlemail.com>, Jonathan Anderson <jonathan.anderson@cl.cam.ac.uk>, John Baldwin <jhb@freebsd.org>
Subject: Re: Collecting entropy from device_attach() times.
Message-ID: <5060D723.6020305@gmail.com>
In-Reply-To: <20120923151706.GN1454@garage.freebsd.pl>
References: <20120918211422.GA1400@garage.freebsd.pl> <20120919231051.4bc5335b@gumby.homeunix.com> <20120920102104.GA1397@garage.freebsd.pl> <201209200758.51924.jhb@freebsd.org> <20120922080323.GA1454@garage.freebsd.pl> <20120922195325.GH1454@garage.freebsd.pl> <505E59DC.7090505@gmail.com> <20120923151706.GN1454@garage.freebsd.pl>
On 2012-09-23 17:17, Pawel Jakub Dawidek wrote:
> On Sun, Sep 23, 2012 at 02:37:48AM +0200, Mariusz Gromada wrote:
>> On 2012-09-22 21:53, Pawel Jakub Dawidek wrote:
>>> Mariusz, can you confirm my findings?
>>
>> Pawel,
>>
>> Your conclusions can be easily confirmed by shape analysis of the EDF.
>> Usually the maximum quantile difference (called the D-statistic) gives
>> you a kind of overview, the function shape gives you a strong feeling,
>> and the p-value gives you a formal proof.
>> D-statistic values (your data):
>>
>> 6bit:  0.33%
>> 7bit:  0.29%
>> 8bit:  0.27%
>> 9bit:  0.21%
>> 10bit: 6.34%
>> 11bit: 19.07%
>> 12bit: 54.80%
>>
>> What I would say: increasing the number of bits from 6 to 9 does not
>> affect the distribution's "uniformity", while reaching the tenth bit
>> results in a sudden increase in the difference measure - the more bits,
>> the more difference is observed. Distribution shape analysis for the
>> 10th bit shows a non-linear function and a lack of "randomness" in the
>> quantile difference curve - the chart shows a complete lack of noise
>> (a pure functional relation). These are very strong indicators that
>> starting from the 10th bit the distribution changed and is no longer
>> uniform.
>>
>> To formally confirm the above conclusion for e.g. a 5% significance
>> level, which means a confidence level of 95%, I need some extra data
>> regarding sample sizes. Please send me the number of collected
>> observations in each 6-12 bit experiment.
>
> Total number of observations was 162833.

Ok, finally I have some formal results.

To be completely honest I need to point out that we are, in fact, dealing with discrete data (for example the integers 0, 1, ..., 63, not continuous numbers spread between 0 and 63). That is why I am going to use the two-sample Kolmogorov-Smirnov test.

The methodology is simple:
- Pawel's data will be called the empirical data.
- Theoretical data will be generated as a sequence of unique integers from 0 to 2**n - 1, where n is the number of bits. Assumption: each number appears in the theoretical data exactly once, representing an ideal uniform distribution.

Calculations will be done in R.

Loading empirical data from files:

> e6 = read.table("E:\\pawel\\dhr2_6bit_sorted.txt")
> e7 = read.table("E:\\pawel\\dhr2_7bit_sorted.txt")
> e8 = read.table("E:\\pawel\\dhr2_8bit_sorted.txt")
> e9 = read.table("E:\\pawel\\dhr2_9bit_sorted.txt")
> e10 = read.table("E:\\pawel\\dhr2_10bit_sorted.txt")
> e11 = read.table("E:\\pawel\\dhr2_11bit_sorted.txt")
> e12 = read.table("E:\\pawel\\dhr2_12bit_sorted.txt")

Generating ideal theoretical data:

> t6 = c(0:(2**6-1))
> t7 = c(0:(2**7-1))
> t8 = c(0:(2**8-1))
> t9 = c(0:(2**9-1))
> t10 = c(0:(2**10-1))
> t11 = c(0:(2**11-1))
> t12 = c(0:(2**12-1))

Performing the KS tests:

> ks.test(e6, t6)
D = 0.0032, p-value = 1
> ks.test(e7, t7)
D = 0.0029, p-value = 1
> ks.test(e8, t8)
D = 0.0027, p-value = 1
> ks.test(e9, t9)
D = 0.0022, p-value = 1
> ks.test(e10, t10)
D = 0.0634, p-value = 0.0005562
> ks.test(e11, t11)
D = 0.1907, p-value < 2.2e-16
> ks.test(e12, t12)
D = 0.5479, p-value < 2.2e-16

As you can see, the D-statistics are almost the same as those calculated by Pawel (allowing for rounding). The p-values are very interesting due to the very high number of observations generated by Pawel. Between 6 and 9 bits the estimated p-values are equal to 1, which means it is impossible (at any significance level) to reject the null hypothesis stating that the compared distributions are equal. Final conclusion: it has to be random, and for sure it is random!
Additionally, starting from 10 bits we can observe a dramatic decrease of the p-value (from 100% to ca. 0.06%, and much less for 11-12 bits). Such a low p-value means that it is impossible not to reject the null hypothesis stating that the compared distributions are equal. Final conclusion: it cannot be random, and for sure it is not random.

I did the same comparison for the previous real device_attach() data (2081 obs.). The R code and results are below:

> e16 = read.table("E:\\pawel\\device_attach_16bit.log")
> t16 = c(0:(2**16-1))
> ks.test(e16, t16)
D = 0.0178, p-value = 0.5422

Again, the D-statistic and p-value are almost the same as those previously calculated "manually". The p-value is very high (not as high as in the 6-12 bit tests, but consider the much lower number of observations: 2081 vs 162833), giving near certainty that you have captured real 16-bit entropy!

Regards,
Mariusz
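PS. A quick sanity check on the methodology itself, just a sketch using only base R and the t10 vector defined above (u10 is a throwaway name): drawing the same number of observations from a genuinely uniform 10-bit source and repeating the test should give a D close to zero and a high p-value, which confirms that the low p-values for 10-12 bits reflect your data and are not an artifact of the large sample size (n = 162833).

> set.seed(1)                                          # reproducible draw
> u10 = sample(0:(2**10-1), 162833, replace = TRUE)    # ideal uniform 10-bit source
> ks.test(u10, t10)                                    # expect small D, large p-value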