From owner-freebsd-hackers  Tue Mar 14 16:24:44 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6)
        id QAA02100 for hackers-outgoing; Tue, 14 Mar 1995 16:24:44 -0800
Received: from irz301.inf.tu-dresden.de (irz301.inf.tu-dresden.de [141.76.1.11])
        by freefall.cdrom.com (8.6.10/8.6.6) with SMTP id QAA02089 for ;
        Tue, 14 Mar 1995 16:24:06 -0800
Received: from sax.sax.de by irz301.inf.tu-dresden.de with SMTP
        (5.67b+/DEC-Ultrix/4.3) id AA13115; Wed, 15 Mar 1995 01:22:20 +0100
Received: by sax.sax.de (8.6.9/8.6.9-s1) with UUCP id BAA09086
        for freebsd-hackers@freebsd.org; Wed, 15 Mar 1995 01:22:20 +0100
Received: (from j@localhost) by uriah.heep.sax.de (8.6.11/8.6.9) id AAA01220
        for freebsd-hackers@freebsd.org; Wed, 15 Mar 1995 00:55:39 +0100
From: J Wunsch
Message-Id: <199503142355.AAA01220@uriah.heep.sax.de>
Subject: Re: SCSI ASC-ASCQ descriptions
To: freebsd-hackers@FreeBSD.org (FreeBSD hackers)
Date: Wed, 15 Mar 1995 00:55:38 +0100 (MET)
In-Reply-To: <199503142039.PAA00285@hda.com> from "Peter Dufault" at Mar 14, 95 03:39:44 pm
Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch)
X-Phone: +49-351-2012 669
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Content-Length: 1696
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

As Peter Dufault wrote:
>
> > I'm really tempted to make a program to do this... :)
>
> Yes, I thought of that too.  I even went through the effort of seeing
> how many unique words there are (about 300).
>
> If you had a clever way of finding "good overlap" I think you
> could cut the size in half or more.

Well, in this case, even a rather simple compression scheme will do
it.  Find the most common words, and -- since they consist only of
ASCII characters -- assign them ``abbreviations'' in the range of 0x80
and up.
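
To make the scheme concrete, here is a minimal sketch of both directions
(table building, compression, expansion).  It is written in Python rather
than Perl, and the names build_table, compress and expand are made up for
illustration; nothing here is taken from an existing implementation.  It
assumes the input is pure ASCII, so that the bytes 0x80..0xff are free to
serve as word codes, and it keeps the "at least three occurrences" cutoff.

```python
import re
from collections import Counter

FIRST_CODE = 0x80   # first byte value that plain ASCII never uses
MIN_COUNT = 3       # only words seen at least this often get a code


def build_table(text):
    """Map the most frequent words to one-byte codes, 0x80 and up."""
    counts = Counter(text.split())
    frequent = [w for w, n in counts.most_common() if n >= MIN_COUNT]
    # Only 0x80..0xff are available, so at most 128 words get a code.
    frequent = frequent[: 0x100 - FIRST_CODE]
    return {word: FIRST_CODE + i for i, word in enumerate(frequent)}


def compress(text, table):
    """Replace each word found in the table by its one-byte code."""
    out = bytearray()
    # Whitespace runs are kept as tokens so the text round-trips exactly.
    for token in re.split(r"(\s+)", text):
        if token in table:
            out.append(table[token])
        else:
            out += token.encode("ascii")  # raises on non-ASCII input
    return bytes(out)


def expand(data, table):
    """Inverse of compress(): turn codes back into words."""
    words = {code: word for word, code in table.items()}
    return "".join(words[b] if b >= FIRST_CODE else chr(b) for b in data)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird and a fish"
    table = build_table(sample)
    packed = compress(sample, table)
    print(table, len(sample), len(packed))
    assert expand(packed, table) == sample
```

The table itself has to be stored next to the compressed messages, so the
scheme only pays off when the total text is much larger than the table.
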

A quick look at the file /COPYRIGHT gave me (for all words that
appear at least three times):

$ perl -e 'while(<>) {foreach $word (split) {$sums{$word}++;}}
> $xx = 0x80;
> foreach $key (sort {$sums{$b} <=> $sums{$a}} (keys(%sums))) {
>   printf "$key => 0x%2x\n", $xx++ unless $sums{$key} <= 2;
> }' < /COPYRIGHT
the => 0x80
of => 0x81
and => 0x82
OR => 0x83
OF => 0x84
in => 0x85
 => 0x86
following => 0x87
software => 0x88
University => 0x89
this => 0x8a
THE => 0x8b
The => 0x8c
ANY => 0x8d
are => 0x8e
or => 0x8f
AND => 0x90
IEEE => 0x91
by => 0x92
to => 0x93
with => 0x94
IN => 0x95
documentation => 0x96
In => 0x97
is => 0x98
documentation. => 0x99
California. => 0x9a
from => 0x9b
must => 0x9c
Regents => 0x9d
copyright => 0x9e
portions => 0x9f
conditions => 0xa0
All => 0xa1

This is a quick hack only -- I didn't make any attempt to optimize it
or such, and note also the ``null'' word (0x86).

I remember that Turbo Pascal V 2 and 3 used a similar scheme for
their error messages...

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/
Never trust an operating system you don't have sources for. ;-)