From: Chad Perrin
To: FreeBSD Questions
Date: Mon, 6 Sep 2010 12:48:02 -0600
Subject: Re: PDF to HTML translations
Message-ID: <20100906184802.GC28608@guilt.hydra>
In-Reply-To: <20100905083154.GA89704@owl.midgard.homeip.net>

On Sun, Sep 05, 2010 at 10:31:54AM +0200, Erik Trulsson wrote:
> On Sun, Sep 05, 2010 at 08:57:11AM +0200, Roland Smith wrote:
> > On Sat, Sep 04, 2010 at 05:09:20PM -0600, Chad Perrin wrote:
> > > What PDF to HTML translators, other than pdftohtml, am I likely to be
> > > able to find in ports?  I went looking for pdf2html, expecting to find
> > > that there, but no luck.  Before I spend hours sifting through, still
> > > without knowing whether I missed something that should be obvious,
> >
> > Yes, you did. :-)

Apparently not.  See below.

> >
> > > I
> > > figured I'd ask here whether anyone knows of something off the top of
> > > his/her head.
> >
> > Try textproc/pdftohtml
>
> Uhm, he said "other than pdftohtml" so I suspect he already knew about
> that one.

This is indeed the case.

I appreciate the several suggestions I've received, though I see in
retrospect that I haven't been sufficiently specific, since I have not
gotten any suitable answers.

I have "inherited" a Perl script that wraps pdftohtml.  The reason a
wrapper is needed is that a substantial amount of cleanup work is needed
to produce HTML suitable to our final needs.  The output of pdftohtml is
sufficiently far from "perfect" that I would like to test the output of a
few other possible "back ends" for the script to see if a significant
amount of work being done by the script can be eliminated.
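To make the idea of swappable "back ends" concrete, here is a minimal
sketch of a comparison harness.  It is not the inherited script: it is
written in Python purely for illustration (the real wrapper is Perl), the
names BACKENDS, build_command, available_backends, convert and compare
are hypothetical, and the command-line flags are the ones I believe
pdftohtml and the Xpdf pdftotext accept, so check the man pages for the
versions installed from ports.

```python
import shutil
import subprocess

# Hypothetical table of candidate back ends.  Each entry is the argument
# list (minus the input file) that should make the tool write HTML to
# standard output.  Flags are assumptions; verify against the man pages.
BACKENDS = {
    "pdftohtml": ["pdftohtml", "-stdout", "-noframes", "-i"],
    "pdftotext": ["pdftotext", "-htmlmeta"],
}


def build_command(backend, pdf_path):
    """Return the full argument list for one back end and one PDF."""
    args = list(BACKENDS[backend])  # copy so the table is not modified
    args.append(pdf_path)
    if backend == "pdftotext":
        # pdftotext takes an output file name; "-" means standard output.
        args.append("-")
    return args


def available_backends():
    """Names of the back ends whose executable is found on PATH."""
    return [name for name, args in BACKENDS.items() if shutil.which(args[0])]


def convert(backend, pdf_path):
    """Run one back end locally and return its raw output as text."""
    result = subprocess.run(
        build_command(backend, pdf_path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def compare(pdf_path):
    """Map each installed back end to the size of its output in characters."""
    return {name: len(convert(name, pdf_path)) for name in available_backends()}
```

Calling compare("sample.pdf") only gives a crude first look (output size
per tool); the real comparison would feed each tool's output through the
existing cleanup code and see how much of that code is still needed.
Everything runs locally, with no network access involved.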
Toward that end, the simpler the tool the better -- and the tool on the
"back end" should not be something that must be contacted across a
network, or that cannot be redistributed freely.  I wanted to start with
things I have in the base system on my FreeBSD laptop (where I'm doing my
development) or through ports.

OpenOffice.org is quite a bit larger and more unwieldy than we would
really want to deal with at this point.  Using Google or Adobe tools
online is well outside the range of what we need (requiring network
access for the tool to work).  I've started looking at the Xpdf tools as
well as pdftohtml.  Other suggestions from within ports would be
appreciated.  Additional options other than what can be found in ports
might also be useful, understanding the needs I sketched out above.

The script itself is Perl, in case that matters.

To everyone who has replied so far: thank you for your time.

-- 
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]