Date: Mon, 6 Sep 2010 12:48:02 -0600
From: Chad Perrin <perrin@apotheon.com>
To: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: PDF to HTML translations
Message-ID: <20100906184802.GC28608@guilt.hydra>
In-Reply-To: <20100905083154.GA89704@owl.midgard.homeip.net>
References: <20100904230920.GA20735@guilt.hydra> <20100905065711.GA34993@slackbox.erewhon.net> <20100905083154.GA89704@owl.midgard.homeip.net>
On Sun, Sep 05, 2010 at 10:31:54AM +0200, Erik Trulsson wrote:
> On Sun, Sep 05, 2010 at 08:57:11AM +0200, Roland Smith wrote:
> > On Sat, Sep 04, 2010 at 05:09:20PM -0600, Chad Perrin wrote:
> > > What PDF to HTML translators, other than pdftohtml, am I likely to be
> > > able to find in ports? I went looking for pdf2html, expecting to find
> > > that there, but no luck. Before I spend hours sifting through, still
> > > without knowing whether I missed something that should be obvious,
> >
> > Yes, you did. :-)

Apparently not. See below.

> >
> > > I
> > > figured I'd ask here whether anyone knows of something off the top of
> > > his/her head.
> >
> > Try textproc/pdftohtml
>
> Uhm, he said "other than pdftohtml" so I suspect he already knew about
> that one.

This is indeed the case.

I appreciate the several suggestions I've received, though I see in retrospect that I haven't been sufficiently specific, since I have not gotten any suitable answers.

I have "inherited" a Perl script that wraps pdftohtml. The wrapper exists because a substantial amount of cleanup work is required to produce HTML suitable for our final needs. The output of pdftohtml is sufficiently far from "perfect" that I would like to test the output of a few other possible "back ends" for the script, to see whether a significant amount of the work being done by the script can be eliminated.

Toward that end, the simpler the tool the better. The tool on the "back end" should not be something that must be contacted across a network, or that cannot be redistributed freely. I wanted to start with things I have in the base system on my FreeBSD laptop (where I'm doing my development) or through ports. OpenOffice.org is quite a bit larger and more unwieldy than we would really want to deal with at this point.
Using Google or Adobe tools online is well outside the range of what we need, since they require network access to work. I've started looking at the Xpdf tools as well as pdftohtml. Other suggestions from within ports would be appreciated. Options that are not in ports might also be useful, given the needs I sketched out above. The script itself is Perl, in case that matters.

To everyone who has replied so far: thank you for your time.

-- 
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]