Date: Mon, 6 Sep 2010 12:04:37 -0700
From: Chip Camden <sterling@camdensoftware.com>
To: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: Re: PDF to HTML translations
Message-ID: <20100906190437.GB26054@libertas.local.camdensoftware.com>
In-Reply-To: <20100906184802.GC28608@guilt.hydra>
References: <20100904230920.GA20735@guilt.hydra>
 <20100905065711.GA34993@slackbox.erewhon.net>
 <20100905083154.GA89704@owl.midgard.homeip.net>
 <20100906184802.GC28608@guilt.hydra>
Quoth Chad Perrin on Monday, 06 September 2010:
> On Sun, Sep 05, 2010 at 10:31:54AM +0200, Erik Trulsson wrote:
> > On Sun, Sep 05, 2010 at 08:57:11AM +0200, Roland Smith wrote:
> > > On Sat, Sep 04, 2010 at 05:09:20PM -0600, Chad Perrin wrote:
> > > > What PDF to HTML translators, other than pdftohtml, am I likely to be
> > > > able to find in ports? I went looking for pdf2html, expecting to find
> > > > that there, but no luck. Before I spend hours sifting through, still
> > > > without knowing whether I missed something that should be obvious,
> > >
> > > Yes, you did. :-)
>
> Apparently not. See below.
>
>
> > >
> > > > I
> > > > figured I'd ask here whether anyone knows of something off the top of
> > > > his/her head.
> > >
> > > Try textproc/pdftohtml
> >
> > Uhm, he said "other than pdftohtml" so I suspect he already knew about
> > that one.
>
> This is indeed the case.
>
> I appreciate the several suggestions I've received, though I see in
> retrospect that I haven't been sufficiently specific, since I have not
> gotten any suitable answers.
>
> I have "inherited" a Perl script that wraps pdftohtml. The reason a
> wrapper is needed is that a substantial amount of cleanup work is needed
> to produce HTML suitable to our final needs. The output of pdftohtml is
> sufficiently far from "perfect" that I would like to test the output of a
> few other possible "back ends" for the script to see if a significant
> amount of work being done by the script can be eliminated.
>
> Toward that end, the simpler the tool the better -- and the tool on the
> "back end" should not be something that must be contacted across a
> network, or that cannot be redistributed freely. I wanted to start with
> things I have in the base system on my FreeBSD laptop (where I'm doing my
> development) or through ports. OpenOffice.org is quite a bit larger and
> more unwieldy than we would really want to deal with at this point.
> Using Google or Adobe tools online is well outside the range of what we
> need (requiring network access for the tool to work).
>
> I've started looking at the Xpdf tools as well as pdftohtml. Other
> suggestions from within ports would be appreciated. Additional options
> other than what can be found in ports might also be useful, understanding
> the needs I sketched out above. The script itself is Perl, in case that
> matters.
>
> To everyone who has replied so far: thank you for your time.
>
> --
> Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

How about print/p5-PDFLib and print/pecl-pdflib to roll your own? Maybe
that's more work than you wanted.

--
Sterling (Chip) Camden | sterling@camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com | http://chipsquips.com