Date:      Mon, 3 Jun 1996 23:23:05 -0700 (PDT)
From:      Bryan Ogawa at Work <bogawa@netvoyage.net>
To:        Sean Kelly <kelly@fsl.noaa.gov>
Cc:        tcg@ime.net, questions@freebsd.org
Subject:   Re: Postscript conversion
Message-ID:  <Pine.BSI.3.93.960603231421.1943E-100000@digital.netvoyage.net>
In-Reply-To: <199606040410.EAA06403@gatekeeper.fsl.noaa.gov>

On Mon, 3 Jun 1996, Sean Kelly wrote:

> >>>>> "Gary" == Gary Chrysler <tcg@ime.net> writes:
> 
>     Gary> So is there any other way I can convert postscript files to
>     Gary> ascii?  I would really like to read the socks manual!
> 
> Converting PostScript to ASCII is three factors of magnitude (tm)
> harder than the halting problem!  And it's NP-complete, too!  :-)

:) :) :)

> Seriously, in general it can't be done.  After all, if you've got
> PostScript code that draws out each letter through a thousand or so
> moveto/lineto/arcto sequences, on paper it may look fine, but there's
> little hope of extracting just the text out of that.

On the other hand, there exists a (supposedly) much improved ps2ascii
converter out there, by the name of pstotext, which came out of DEC's
Virtual Paper project.  I can speak from experience that the earlier
PostScript-to-plain-text utilities were spectacularly bad, and as the
PostScript FAQ explains, it's hard to get it right.

It's at:

	http://www.research.digital.com/SRC/virtualpaper/pstotext.html

Although I haven't tried pstotext out, I find the following quote from a
message a positive sign:

    We've tested pstotext on millions of lines of PostScript, including
    files generated by several versions of drivers from each of Windows,
    Macintosh, and dvips (TeX). It deals successfully with a wide variety
    of encoding vectors, and it re-assembles words that have been broken
    up for pair-kerning (it doesn't re-assemble words that have been
    hyphenated, though). It also works (though a little less reliably) on
    Acrobat PDF files. 

You'll need Aladdin Ghostscript 3.33 / 3.51 or later to run pstotext, apparently.
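
To illustrate why pair-kerning and drawn outlines make this hard, here is a
minimal Python sketch.  It is NOT how pstotext or ps2ascii work; it is a
deliberately naive extractor that only collects the string operands of the
PostScript `show` operator.  The function name and the three sample
fragments are made up for this example.

```python
import re

# Matches a simple PostScript string literal immediately followed by `show`.
# Assumption: the string contains no nested parentheses or backslash escapes.
SHOW_RE = re.compile(r"\(([^()\\]*)\)\s*show\b")


def naive_show_strings(ps_source):
    """Return the operand of every simple `(text) show` call, in order."""
    return SHOW_RE.findall(ps_source)


# Hypothetical fragment: one string, one show. The naive approach works.
PLAIN = (
    "/Times-Roman findfont 12 scalefont setfont\n"
    "72 720 moveto (SOCKS manual) show\n"
)

# Hypothetical fragment: words split for kerning, with the inter-word gap
# expressed only as a larger rmoveto. The strings alone carry no word breaks.
KERNED = (
    "72 720 moveto (W) show -1 0 rmoveto (orld) show "
    "4 0 rmoveto (W) show -1 0 rmoveto (ide) show\n"
)

# Hypothetical fragment: a glyph drawn as a path. There is no string at all.
OUTLINED = "newpath 72 720 moveto 80 740 lineto 88 720 lineto stroke\n"

if __name__ == "__main__":
    print(naive_show_strings(PLAIN))     # ['SOCKS manual']
    print(naive_show_strings(KERNED))    # ['W', 'orld', 'W', 'ide']
    print(naive_show_strings(OUTLINED))  # []
```

Joining the kerned fragments gives either "WorldWide" or "W orld W ide";
telling the two words apart requires tracking the positions set by the
rmoveto calls, which is why a real converter has to interpret the
PostScript rather than just scan it.  The outlined fragment yields nothing,
which is the case Sean describes above.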

The reason I mention pstotext here (besides trying to be helpful :) ) is that
I stumbled across it while searching for a better ps2ascii converter (as I
mentioned, the ones I had seen before were spectacularly poor at it).
Since it's not mentioned in any of the PS FAQs I know of, and isn't easy to
find with search engines (even AltaVista, which is surprising since
it's Digital's), I thought I'd let other people know about
it, if only to find out it's still insufficient for the job. :)

bryan

> 
> Knowing what produced the PostScript code can be a big help, though.
> Some programs exist that recognize the PostScript produced by various
> document packages and wade their way through the font changes and kerns,
> revealing plain old text.
> 
> And yes, Ghostscript is your friend. :-)
> 
> Seriously, your best bet is to install Ghostscript.  Right, I hear you
> ... you don't wanna install X windows ... after all, Marcus J Ranum of
> DEC said:
> 
> 	If the designers of X Windows built cars, there would be no
> 	fewer than five steering wheels hidden about the cockpit, none
> 	of which followed the same principles---but you'd be able to
> 	shift gears with your car stereo.  Useful feature, that.
> 
> So, you'll be happy to note that Ghostscript doesn't need X windows!
> Just avoid the copy that's in the ports collection (which I'm assuming
> is configured for X by default) and build and install it yourself.  In
> fact, I've made sure that ``The Professor'' knows that it worked
> out-of-the-box on FreeBSD ... that was back in version 3.33, and I'm
> sure it's still true today in version 3.53.
> 
> So, grab these files:
> 
>   ftp://ftp.cs.wisc.edu/pub/ghost/aladdin/ghostscript-3.53.tar.gz
>   ftp://ftp.cs.wisc.edu/pub/ghost/aladdin/ghostscript-3.53jpeg.tar.gz
> 
> And in the makefile, explicitly leave OUT the X windows stuff!  The
> README and make.doc files will certainly provide you with more hints.
> 
> Once you've got it built and installed, you'll have a new script to
> play with: /usr/local/bin/ps2ascii, which uses gs to extract text out
> of PS files.
> 
> GOOD LUCK!
> 
> -- 
> Sean Kelly                          
> NOAA Forecast Systems Laboratory    kelly@fsl.noaa.gov
> Boulder Colorado USA                http://www-sdd.fsl.noaa.gov/~kelly/
> 

Bryan K. Ogawa
Questions or Problems with NetVoyage?  help@netvoyage.net
Check out the NetVoyage HelpWeb at..   <URL: http://www.netvoyage.net/~help/>



