Date:      Tue, 11 Sep 2012 08:06:52 -0700
From:      David O'Brien <obrien@FreeBSD.org>
To:        Dag-Erling Smørgrav <des@des.no>
Cc:        Arthur Mesh <arthurmesh@gmail.com>, Ian Lepore <freebsd@damnhippie.dyndns.org>, Doug Barton <dougb@FreeBSD.org>, freebsd-rc@freebsd.org, freebsd-security@freebsd.org, RW <rwmaillists@googlemail.com>
Subject:   Re: svn commit: r239569 - head/etc/rc.d
Message-ID:  <20120911150652.GA83749@dragon.NUXI.org>
In-Reply-To: <86sjao7q8c.fsf@ds4.des.no>
References:  <20120903214638.GO1464@x96.org> <50453686.9090100@FreeBSD.org> <20120904220754.GA3643@server.rulingia.com> <20120906174247.GB13179@dragon.NUXI.org> <20120906230157.5307a21f@gumby.homeunix.com> <20120906224703.GD89120@x96.org> <50493480.8060307@FreeBSD.org> <20120911061530.GA77399@dragon.NUXI.org> <504EDC67.9070700@FreeBSD.org> <86sjao7q8c.fsf@ds4.des.no>

On Tue, Sep 11, 2012 at 01:28:51PM +0200, Dag-Erling Smørgrav wrote:
> My gut feeling is that compression is better
> than hashing for that purpose,

A related interesting thing -- in
http://www.cs.auckland.ac.nz/~pgut001/pubs/usenix98.pdf '5. Randomness
Polling Results', Peter Gutmann states:

    The field of data compression provides us with a number of analysis
    tools which can be used to provide reasonable estimates of the change
    in entropy from one pool to another.  The tools we apply to this task
    are an LZ77 dictionary compressor (which looks for portions of the
    current data which match previously-seen data) and a powerful
    statistical compressor (which estimates the probability of occurrence
    of a symbol based on previously-seen symbols).[23]

    [23] "Practical Dictionary/Arithmetic Data Compression Synthesis",
    Peter Gutmann, MSc thesis, University of Auckland, 1992.

The paper goes into more depth and background on using compression as a
means to estimate entropy.
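
As a rough illustration of the "statistical" side of that idea, here is
a minimal Python sketch that estimates bits of entropy per byte from a
plain byte histogram.  This is an order-0 model of my own choosing, far
weaker than the statistical compressor Gutmann describes: it ignores
context entirely, so it will overestimate the entropy of structured
data.  The function name is hypothetical.

```python
import math
from collections import Counter


def order0_entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # occurrences of each byte value
    # H = -sum(p * log2(p)) over the byte values that actually occur
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # A constant buffer carries no information; a buffer containing every
    # byte value exactly once looks maximally "random" to an order-0 model.
    print(order0_entropy_bits_per_byte(b"\x00" * 4096))
    print(order0_entropy_bits_per_byte(bytes(range(256))))
```

Note that bytes(range(256)) is completely predictable yet scores the
maximum here, which is exactly why a context-aware compressor is the
better tool for this kind of estimate.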

One of the Gray Beards at work was familiar with using LZ77 for this
purpose.  It has fallen out of favor, but he still felt it was useful
for the type of discussion we're having.

I don't have a pure LZ77 compressor, but if we take Info-ZIP's
modified-LZ77 deflate algorithm as a suitable stand-in:

    # zip -v -Z deflate /tmp/e.zip /entropy
      adding: entropy   (in=4096) (out=4096) (stored 0%)
    total bytes=4096, compressed=4096 -> 0% savings

    # zip -v -Z deflate /tmp/e.zip /var/db/entropy/saved-entropy*
      adding: var/db/entropy/saved-entropy.1   (in=2048) (out=2048) (stored 0%)
      adding: var/db/entropy/saved-entropy.2   (in=2048) (out=2048) (stored 0%)
      adding: var/db/entropy/saved-entropy.3   (in=2048) (out=2048) (stored 0%)
      adding: var/db/entropy/saved-entropy.4   (in=2048) (out=2048) (stored 0%)
      adding: var/db/entropy/saved-entropy.5   (in=2048) (out=2048) (stored 0%)
      adding: var/db/entropy/saved-entropy.6   (in=2048) (out=2048) (stored 0%)
      adding: var/db/entropy/saved-entropy.7   (in=2048) (out=2048) (stored 0%)
      adding: var/db/entropy/saved-entropy.8   (in=2048) (out=2048) (stored 0%)
    total bytes=16384, compressed=16384 -> 0% savings

    # zip -v -Z deflate /tmp/e.zip out-sysctl-a
      adding: out-sysctl-a   (in=98772) (out=21703) (deflated 78%)
    total bytes=98772, compressed=21703 -> 78% savings

    # zip -v -Z deflate /tmp/e.zip out-dmesg
      adding: out-dmesg   (in=8727) (out=3394) (deflated 61%)
    total bytes=107499, compressed=25097 -> 77% savings

    # zip -v -Z deflate /tmp/e.zip out-kenv
      adding: out-kenv   (in=2011) (out=751) (deflated 63%)
    total bytes=109510, compressed=25848 -> 76% savings

    # zip -v -Z deflate /tmp/e.zip out-df-ib
      adding: out-df-ib   (in=234) (out=151) (deflated 35%)
    total bytes=234, compressed=151 -> 35% savings

    # zip -v -Z deflate /tmp/e.zip out-ps-fauxrH-o
      adding: out-ps-fauxrH-o   (in=1608) (out=464) (deflated 71%)
    total bytes=1608, compressed=464 -> 71% savings

    # zip -v -Z deflate /tmp/e.zip `sysctl -n kern.bootfile`
      adding: boot/kernel/kernel   (in=19021393) (out=8238497) (deflated 57%)
    total bytes=19021393, compressed=8238497 -> 57% savings

    # zip -v -Z deflate /tmp/e.zip /bin/ls
      adding: bin/ls    (in=97188) (out=37651) (deflated 61%)
    total bytes=97188, compressed=37651 -> 61% savings
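
For anyone who wants to repeat this without zip(1), the same kind of
"savings" figure can be approximated with Python's zlib module.  This
is only a sketch under assumptions: zlib's deflate is not byte-for-byte
identical to Info-ZIP's, the function name is made up, and the sample
inputs are stand-ins rather than the files measured above.

```python
import os
import zlib


def deflate_savings(data: bytes) -> float:
    """Fraction of bytes saved by raw deflate; 0.0 if deflate does not help."""
    if not data:
        return 0.0
    # wbits=-15 requests a raw deflate stream (no zlib header or checksum).
    comp = zlib.compressobj(level=9, wbits=-15)
    out = comp.compress(data) + comp.flush()
    # Like zip, treat incompressible input as "stored" (0% savings).
    return max(0.0, 1.0 - len(out) / len(data))


if __name__ == "__main__":
    random_like = os.urandom(4096)               # stand-in for an entropy file
    text_like = b"kern.ostype: FreeBSD\n" * 200  # stand-in for sysctl-style text
    print(f"random: {deflate_savings(random_like):.0%} savings")
    print(f"text:   {deflate_savings(text_like):.0%} savings")
```

Random input should report 0% savings, as the entropy files do above,
while repetitive text compresses heavily.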


> but at this point I'd be more comfortable
> if someone with an academic background in either cryptography or
> statistics (cperciva@?) weighed in.

This stuff can be tricky.  I'd also love to know cperciva's thoughts.

-- 
-- David  (obrien@FreeBSD.org)
