Date:      Tue, 26 Oct 2010 19:04:50 -0500 (CDT)
From:      Robert Bonomi <bonomi@mail.r-bonomi.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: Is there any way of transfering my excellent PDF file into plain HTML
Message-ID:  <201010270004.o9R04o1Y004753@mail.r-bonomi.com>

> From owner-freebsd-questions@freebsd.org  Tue Oct 26 13:28:24 2010
> Date: Tue, 26 Oct 2010 11:30:01 -0700
> From: Gary Kline <kline@thought.org>
> To: FreeBSD Mailing List <freebsd-questions@freebsd.org>
> Cc: 
> Subject: Is there any way of transfering my excellent PDF file into plain
>  HTML
>
>
>
> One thing that Linux misses--or seems to--is all the conversion
> programs that go from one format to another.  I _was_ able to use
> abiread to get a PDF text into an obscure HTML, but hundreds of
> paragraphs get broken up.  So: is there any conversion program to
> do it *right*?

Authoritative answer: "maybe".

This is one of those things where there's no substitute for a trained eyeball.


Depending on _how_ the PDF was generated, there can be things in it that
'look like' breaks to a mechanical parser but don't appear that way
on the page.

It's -really- hard for a parser to tell a 'near no-op' from something
that does something 'significant'.

Maybe Ghostscript's "pdf2ps" followed by "ps2ascii", then wrap the resulting
text in minimal HTML framing that simply declares it to be a '<pre>' block.
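
For illustration only, here is a minimal Python sketch of that pipeline. It
assumes the Ghostscript helper scripts "pdf2ps" and "ps2ascii" are installed
and on the PATH; the function names pdf_to_text and wrap_in_pre are
hypothetical. The text is HTML-escaped before it goes inside the '<pre>'
block, because a literal '<' or '&' from the PDF would otherwise be read as
markup.

```python
import html
import subprocess
import tempfile
from pathlib import Path


def pdf_to_text(pdf_path: str) -> str:
    """Hypothetical helper: pdf2ps, then ps2ascii; returns the extracted text.

    Assumes Ghostscript's pdf2ps and ps2ascii scripts are on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        ps_path = Path(tmp) / "intermediate.ps"
        # pdf2ps input.pdf output.ps
        subprocess.run(["pdf2ps", pdf_path, str(ps_path)], check=True)
        # With no output file argument, ps2ascii writes the text to stdout
        result = subprocess.run(
            ["ps2ascii", str(ps_path)],
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout


def wrap_in_pre(text: str, title: str = "Converted document") -> str:
    """Hypothetical helper: wrap plain text in a minimal HTML page as one <pre> block."""
    # Escape &, <, > (and quotes) so the text cannot be mistaken for markup
    body = html.escape(text)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<pre>{body}</pre>\n"
        "</body>\n</html>\n"
    )
```

Typical use would be print(wrap_in_pre(pdf_to_text("input.pdf"))), redirected
to a .html file. Because '<pre>' keeps the line breaks exactly as ps2ascii
emitted them, any paragraph breaks the parser got wrong still have to be
fixed by eye.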
