Date: Wed, 15 Mar 1995 00:55:38 +0100 (MET)
From: J Wunsch <j@uriah.heep.sax.de>
To: freebsd-hackers@FreeBSD.org (FreeBSD hackers)
Subject: Re: SCSI ASC-ASCQ descriptions
Message-ID: <199503142355.AAA01220@uriah.heep.sax.de>
In-Reply-To: <199503142039.PAA00285@hda.com> from "Peter Dufault" at Mar 14, 95 03:39:44 pm

As Peter Dufault wrote:
>
> > I'm really tempted to make a program to do this... :)
>
> Yes, I thought of that too.  I even went through the effort of seeing
> how many unique words there are (about 300).
>
> If you had a clever way of finding "good overlap" I think you
> could cut the size in half or more.

Well, in this case, even a rather simple compression scheme will do
it.  Find the most common words, and -- since they consist only of
ASCII characters -- assign them ``abbreviations'' in the range 0x80
and up.

A quick glance at the file /COPYRIGHT gave me (for all words that
appear at least three times):

$ perl -e 'while(<>) {foreach $word (split) {$sums{$word}++;}}
> $xx = 0x80;
> foreach $key (sort {$sums{$b} <=> $sums{$a}} (keys(%sums))) {
>   printf "$key => 0x%2x\n", $xx++ unless $sums{$key} <= 2;
> }' < /COPYRIGHT
the => 0x80
of => 0x81
and => 0x82
OR => 0x83
OF => 0x84
in => 0x85
 => 0x86
following => 0x87
software => 0x88
University => 0x89
this => 0x8a
THE => 0x8b
The => 0x8c
ANY => 0x8d
are => 0x8e
or => 0x8f
AND => 0x90
IEEE => 0x91
by => 0x92
to => 0x93
with => 0x94
IN => 0x95
documentation => 0x96
In => 0x97
is => 0x98
documentation. => 0x99
California. => 0x9a
from => 0x9b
must => 0x9c
Regents => 0x9d
copyright => 0x9e
portions => 0x9f
conditions => 0xa0
All => 0xa1

This is a quick hack only -- I didn't make any attempt to optimize
it, and note also the ``null'' word (0x86).

I remember that Turbo Pascal V 2 and 3 used a similar scheme for
their error messages...

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/
Never trust an operating system you don't have sources for. ;-)
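
The scheme described in the message above (replace the most common
words by single bytes from 0x80 up, leave everything else as plain
ASCII) could be sketched roughly as follows.  This is only an
illustration, written in Python rather than the Perl of the original
one-liner.  The names build_table, compress and decompress are
hypothetical, as are the assumptions that the input is pure ASCII,
that words are separated by whitespace, and that runs of whitespace
may be collapsed to a single space.

```python
from collections import Counter

FIRST_CODE = 0x80  # first byte value above the 7-bit ASCII range


def build_table(text, min_count=3):
    """Map each word seen at least min_count times to a one-byte code."""
    counts = Counter(text.split())
    # most_common() sorts by descending count; ties keep first-seen order.
    common = [word for word, n in counts.most_common() if n >= min_count]
    # Only 0x80..0xff are free, so at most 128 words get a code.
    common = common[: 0x100 - FIRST_CODE]
    return {word: FIRST_CODE + i for i, word in enumerate(common)}


def compress(text, table):
    """Replace tabled words by their code byte; other words stay ASCII."""
    out = []
    for word in text.split():
        if word in table:
            out.append(bytes([table[word]]))
        else:
            out.append(word.encode("ascii"))  # assumption: ASCII-only input
    return b" ".join(out)


def decompress(data, table):
    """Invert compress(), given the same table."""
    reverse = {code: word for word, code in table.items()}
    words = []
    for token in data.split(b" "):
        if len(token) == 1 and token[0] >= FIRST_CODE:
            words.append(reverse[token[0]])
        else:
            words.append(token.decode("ascii"))
    return " ".join(words)
```

Because ASCII text never uses bytes of 0x80 and above, a single such
byte can be told apart from an ordinary word without any escape
mechanism.  The table has to be stored alongside the compressed
messages, so the scheme only pays off when the common words recur
often enough to outweigh it.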