Date:      Wed, 15 Mar 1995 00:55:38 +0100 (MET)
From:      J Wunsch <j@uriah.heep.sax.de>
To:        freebsd-hackers@FreeBSD.org (FreeBSD hackers)
Subject:   Re: SCSI ASC-ASCQ descriptions
Message-ID:  <199503142355.AAA01220@uriah.heep.sax.de>
In-Reply-To: <199503142039.PAA00285@hda.com> from "Peter Dufault" at Mar 14, 95 03:39:44 pm

As Peter Dufault wrote:
> 
> > I'm really  tempted to make a program to do this... :)
> 
> Yes, I thought of that too.  I even went through the effort of seeing
> how many unique words there are (about 300).
> 
> If you had a clever way of finding "good overlap" I think you
> could cut the size in half or more.

Well, in this case, even a rather simple compression scheme will do
it.  Find the most common words, and -- since the text consists only
of 7-bit ASCII characters -- assign them one-byte ``abbreviations''
in the range of 0x80 and up.
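
A minimal sketch of how such a scheme could encode and decode a
message, assuming the input is pure 7-bit ASCII so that any byte at
0x80 or above can only be an abbreviation.  The word table and the
names compress() and expand() are made up for illustration; a real
table would come from a frequency count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical word table; in practice it would come from a frequency count.
my @words = ('the', 'of', 'and');

# Map each word to a single byte starting at 0x80, and back again.
my (%code_for, %word_for);
my $next = 0x80;
for my $w (@words) {
    $code_for{$w}         = chr($next);
    $word_for{chr($next)} = $w;
    $next++;
}

# Replace every whitespace-delimited token found in the table with its
# one-byte code; everything else (including the whitespace) passes through.
sub compress {
    my ($text) = @_;
    $text =~ s/(\S+)/exists $code_for{$1} ? $code_for{$1} : $1/ge;
    return $text;
}

# Expand every byte >= 0x80 back into the word it stands for.
# Assumes the original text contained no such bytes.
sub expand {
    my ($packed) = @_;
    $packed =~ s/([\x80-\xff])/exists $word_for{$1} ? $word_for{$1} : $1/ge;
    return $packed;
}

my $msg    = 'the end of the tape and the medium';
my $packed = compress($msg);
printf "%d bytes -> %d bytes\n", length($msg), length($packed);
print "round trip ok\n" if expand($packed) eq $msg;
```

Tokens are matched whole, so a word with trailing punctuation is a
different token from the bare word, just as in a whitespace split.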

A quick glance at the file /COPYRIGHT gave me (for all words that
appear at least three times):


$ perl -e 'while(<>) {foreach $word (split) {$sums{$word}++;}}
>       $xx = 0x80;
>       foreach $key (sort {$sums{$b} <=> $sums{$a}} (keys(%sums))) {
>               printf "%s => 0x%2x\n", $key, $xx++ unless $sums{$key} <= 2;
>       }' < /COPYRIGHT
the => 0x80
of => 0x81
and => 0x82
OR => 0x83
OF => 0x84
in => 0x85
 => 0x86
following => 0x87
software => 0x88
University => 0x89
this => 0x8a
THE => 0x8b
The => 0x8c
ANY => 0x8d
are => 0x8e
or => 0x8f
AND => 0x90
IEEE => 0x91
by => 0x92
to => 0x93
with => 0x94
IN => 0x95
documentation => 0x96
In => 0x97
is => 0x98
documentation. => 0x99
California. => 0x9a
from => 0x9b
must => 0x9c
Regents => 0x9d
copyright => 0x9e
portions => 0x9f
conditions => 0xa0
All => 0xa1

This is a quick hack only -- I didn't make any attempt to optimize
it, and note also the ``null'' word (0x86).
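
One possible refinement, sketched under assumptions: drop empty
fields before counting (a plausible source of the ``null'' word is an
empty field produced by leading whitespace, though that is a guess),
and rank words by the bytes they would save rather than by raw count.
The names count_words() and rank_by_savings() are hypothetical, and
the cost of storing the table itself is ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count whitespace-delimited words, discarding empty fields such as the
# one an explicit /\s+/ split yields for a line with leading whitespace.
sub count_words {
    my (@lines) = @_;
    my %sums;
    for my $line (@lines) {
        $sums{$_}++ for grep { length } split /\s+/, $line;
    }
    return %sums;
}

# Rank words by bytes saved: each occurrence shrinks from length($word)
# bytes to one byte.  Words that save nothing are dropped; ties are
# broken alphabetically so the order is deterministic.
sub rank_by_savings {
    my (%sums) = @_;
    my %saved;
    $saved{$_} = $sums{$_} * (length($_) - 1) for keys %sums;
    return sort { $saved{$b} <=> $saved{$a} or $a cmp $b }
           grep { $saved{$_} > 0 } keys %saved;
}

my %sums = count_words("  the tape of the tape", "the end");
print join(' ', rank_by_savings(%sums)), "\n";
```

Ranking by savings favours long words that occur moderately often
over very short words that occur a lot, which raw counts do not.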

I remember that Turbo Pascal V 2 and 3 used a similar scheme for
their error messages...
-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/
Never trust an operating system you don't have sources for. ;-)


