From: Chip Camden
To: FreeBSD Questions
Date: Mon, 6 Sep 2010 12:04:37 -0700
Subject: Re: PDF to HTML translations

Quoth Chad Perrin on Monday, 06 September 2010:
> On Sun, Sep 05, 2010 at 10:31:54AM +0200, Erik Trulsson wrote:
> > On Sun, Sep 05, 2010 at 08:57:11AM +0200, Roland Smith wrote:
> > > On Sat, Sep 04, 2010 at 05:09:20PM -0600, Chad Perrin wrote:
> > > > What PDF to HTML translators, other than pdftohtml, am I likely to be
> > > > able to find in ports? I went looking for pdf2html, expecting to find
> > > > that there, but no luck. Before I spend hours sifting through, still
> > > > without knowing whether I missed something that should be obvious,
> > >
> > > Yes, you did. :-)
>
> Apparently not. See below.
>
>
> > >
> > > > I
> > > > figured I'd ask here whether anyone knows of something off the top of
> > > > his/her head.
> > >
> > > Try textproc/pdftohtml
> >
> > Uhm, he said "other than pdftohtml" so I suspect he already knew about
> > that one.
>
> This is indeed the case.
>
> I appreciate the several suggestions I've received, though I see in
> retrospect that I haven't been sufficiently specific, since I have not
> gotten any suitable answers.
>
> I have "inherited" a Perl script that wraps pdftohtml. The reason a
> wrapper is needed is that a substantial amount of cleanup work is needed
> to produce HTML suitable to our final needs. The output of pdftohtml is
> sufficiently far from "perfect" that I would like to test the output of a
> few other possible "back ends" for the script to see if a significant
> amount of work being done by the script can be eliminated.
>
> Toward that end, the simpler the tool the better -- and the tool on the
> "back end" should not be something that must be contacted across a
> network, or that cannot be redistributed freely. I wanted to start with
> things I have in the base system on my FreeBSD laptop (where I'm doing my
> development) or through ports. OpenOffice.org is quite a bit larger and
> more unwieldy than we would really want to deal with at this point.
> Using Google or Adobe tools online is well outside the range of what we
> need (requiring network access for the tool to work).
>
> I've started looking at the Xpdf tools as well as pdftohtml. Other
> suggestions from within ports would be appreciated. Additional options
> other than what can be found in ports might also be useful, understanding
> the needs I sketched out above. The script itself is Perl, in case that
> matters.
>
> To everyone who has replied so far: thank you for your time.
>
> --
> Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

How about print/p5-PDFLib and print/pecl-pdflib to roll your own? Maybe
that's more work than you wanted.

--
Sterling (Chip) Camden | sterling@camdensoftware.com | 2048D/3A978E4F
http://camdensoftware.com | http://chipstips.com | http://chipsquips.com