From: Olivier Nicole
To: djuatdelta@gmail.com
Cc: freebsd-questions@freebsd.org
Date: Wed, 10 Jun 2009 09:08:27 +0700 (ICT)
Message-Id: <200906100208.n5A28RYg062023@banyan.cs.ait.ac.th>
In-reply-to: (message from Daniel Underwood on Tue, 9 Jun 2009 13:18:56 -0400)
References: <3D527043-AF88-4A26-8029-FD51159E6ABB@yahoo.fr>
Subject: Re: PDF inventory software

Daniel,

> I'm trying to convert all PDF files in a directory to text using
> "pdftotext". I tried the following command:

Aside from the syntax of your find(1) command, and the possibility that some articles are corrupted PDFs, you may want to consider hacking pdftotext so that it ignores the "do not print" flag set in some PDF articles.
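Since the command you tried is not shown above, here is only a minimal sketch of the find(1) side, not your command. It assumes a POSIX sh and the usual pdftotext calling convention (pdftotext <PDF-file> <text-file>); the function name convert_pdfs is made up for illustration. It writes a .txt file next to each PDF and reports, rather than aborts on, files that pdftotext rejects, such as corrupted or restricted PDFs:

```shell
#!/bin/sh
# convert_pdfs is a hypothetical helper name, not an existing tool.
# Converts every *.pdf under DIR (default: .) to a .txt file beside it.
# PDFs that pdftotext rejects (corrupted, or restricted by permission
# flags) are reported on stderr and skipped; the run continues.
convert_pdfs() {
    dir=${1:-.}
    # -exec ... {} + hands batches of paths to one inline sh script,
    # and quoting "$f" keeps names containing spaces intact.
    find "$dir" -type f -name '*.pdf' -exec sh -c '
        for f in "$@"; do
            if pdftotext "$f" "${f%.pdf}.txt"; then
                echo "ok: $f"
            else
                echo "failed: $f" >&2
            fi
        done
    ' sh {} +
}
```

For example, `convert_pdfs ~/articles 2> failed.log` leaves a list of the PDFs that need a closer look (or a patched pdftotext) in failed.log. Note that -name '*.pdf' is case-sensitive, so files ending in .PDF are not picked up.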
I don't think many scientific articles set the flag that prevents printing. But some PDF files do, and pdftotext will not work on them unless you patch it (which is easy; it may even be a compile-time option, I don't remember).

Best regards,

Olivier