From owner-freebsd-questions Wed May 16 12:14:52 2001
Delivered-To: freebsd-questions@freebsd.org
Received: from relay3.inwind.it (relay3.inwind.it [212.141.53.74])
    by hub.freebsd.org (Postfix) with ESMTP id EAF8C37B423
    for ; Wed, 16 May 2001 12:14:42 -0700 (PDT)
    (envelope-from bartequi@inwind.it)
Received: from bartequi.ottodomain.org (62.98.162.51)
    by relay3.inwind.it (5.5.029) id 3AE401CE00595598
    for freebsd-questions@FreeBSD.ORG; Wed, 16 May 2001 21:14:36 +0200
From: Salvo Bartolotta
Date: Wed, 16 May 2001 19:16:55 GMT
Message-ID: <20010516.19165500@bartequi.ottodomain.org>
Subject: Re: Manipulating pdf/ps files -- closer to a solution
To: freebsd-questions@FreeBSD.ORG
References: <20010513.18294500@bartequi.ottodomain.org> <20010515.1075700@bartequi.ottodomain.org>
X-Mailer: SuperCalifragilis
X-Priority: 3 (Normal)
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 5/15/01, 3:07:57 AM, Salvo Bartolotta wrote regarding
Re: Manipulating pdf/ps files:

> >>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

> On 5/13/01, 8:29:45 PM, Salvo Bartolotta wrote
> regarding Manipulating pdf/ps files:

> > Dear FreeBSD'ers,

> > I would like to perform such operations as the following:

> > -- merge PDF/ps files
> > -- modify PDF/ps files in a more or less "graphical" (read:
> > human-understandable) fashion
> > -- convert PDF/ps files to other formats (eg text).

> > Browsing the archives, I learnt about pdf2ps, ps2pdf, pstotext and
> > psutils (both in the ports). I had also browsed the ports tree as well as
> > the Doc-primer, but I am probably missing something trivial here.

> > I have found some difficulties: eg, psmerge seems not to work on a few ps
> > files, which files I downloaded (originally as PDF files) from a www
> > site.
> > I have reason to believe those files were generated from one main
> > file (containing data arranged in a table) split into several pieces,
> > BTW. I couldn't convert the ps files to txt, either: pstotext generated
> > strings of hashes (the "#" character).

> I run into problems when trying to convert PDF/ps files containing data
> arranged in a table, each row of data being preceded as well as followed
> by a (continuous) horizontal line like this (the data were probably
> formatted with M$ Excel):

> -------------------------------------------
> data data data...
> -------------------------------------------
> data data data...
> -------------------------------------------

> For example, running pdfinfo on one of the files spits out:

> Creator: Windows NT 4.0
> Producer: Acrobat Distiller 4.0 for Windows
> CreationDate: 20010511130351
> ModDate: 20010511130351+02'00'
> Pages: 60
> Encrypted: no
> Linearized: yes

> I tried xpdf (in the ports), namely pdftotext, but it didn't work.

> Summing up: I can convert those PDF files into ps, the information in the
> ps files IS displayed correctly, but I have managed to convert neither
> the above-mentioned PDF nor ps files into plain text. There is a txt2pdf
> utility on the Net, but I can't seem to find a **working** pdf2txt or
> ps2txt one. BTW, the "clipboard" (ie the mouse middle button) DOES copy
> from Acrobat Reader (running in linux comp. layer) to other text editors
> within X, but it copies (raw) PDF data.

To whomever it may concern,

I keep replying to myself, but I seem to have made some progress.

I had successfully converted the PDF files into ps ones. The reason why
pstotext didn't work is probably that such files (eg a 60-page PDF file)
are **images** (in the preceding example: a collection of 60 images, one
per page, as was pointed out by ImageMagick). Which is also the reason
why pdftotext didn't work, BTW.

Since I had to deal with PDF "images", not "text" PDF files...
I asked (wait for it) ImageMagick for help :-)

convert DID work, and created a collection of jpeg images (one per page).
Thus, I can convert PDF "images", or data acquired/manipulated/treated as
such -- specifically, a M$ Excel table -- into other image formats.

AAARGH! I am only missing the last step: how to recover text from eg such
jpeg images; and/or... which image format to choose in order to be able
to extract text from the images.

Once again, I would very much like to work under FreeBSD, and NOT make
use of any M$-related product; the negation "NOT" extending from the
coasts of Western Europe to the Pacific coasts of USA -- just to make
sure that M$ is within the scope of negation :-)))

MTIA,
Salvo
(with apologies for the dumb question)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
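
For reference, the PDF-to-images step described in the message can be
sketched roughly as below. This is only an illustration, not the exact
command used in the message: the file names, the 300 dpi density and the
PNG output pattern are assumptions (the message itself ended up with jpeg
files), and the helper name build_convert_command is hypothetical. The
code only assembles an ImageMagick convert command line; it does not run
it.

```python
import shlex


def build_convert_command(pdf_path, out_pattern="page-%03d.png", density=300):
    """Build the argv list for an ImageMagick `convert` call.

    The call renders every page of `pdf_path` to its own image file.
    `out_pattern` and `density` are illustrative defaults (assumptions),
    not values taken from the original message.
    """
    # -density is given before the input file so that it applies to the
    # rasterization of the PDF pages rather than to the output images.
    return ["convert", "-density", str(density), pdf_path, out_pattern]


if __name__ == "__main__":
    # "table.pdf" is a placeholder file name.
    cmd = build_convert_command("table.pdf")
    # Print a shell-ready version of the command instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be handed to subprocess.run() on a machine where
ImageMagick is installed. A lossless output format is probably a safer
starting point than jpeg if the pages are to be fed to a text-recognition
tool afterwards, but that is a guess rather than something tested here.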