Date: Wed, 16 May 2001 19:16:55 GMT
From: Salvo Bartolotta <bartequi@inwind.it>
To: freebsd-questions@FreeBSD.ORG
Subject: Re: Manipulating pdf/ps files -- closer to a solution
Message-ID: <20010516.19165500@bartequi.ottodomain.org>
References: <20010513.18294500@bartequi.ottodomain.org> <20010515.1075700@bartequi.ottodomain.org>

>>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<

On 5/15/01, 3:07:57 AM, Salvo Bartolotta <bartequi@inwind.it> wrote
regarding Re: Manipulating pdf/ps files:

> >>>>>>>>>>>>>>>>>> Original Message <<<<<<<<<<<<<<<<<<
>
> On 5/13/01, 8:29:45 PM, Salvo Bartolotta <bartequi@inwind.it> wrote
> regarding Manipulating pdf/ps files:
>
> > Dear FreeBSD'ers,
> >
> > I would like to perform such operations as the following:
> >
> > -- merge PDF/ps files
> > -- modify PDF/ps files in a more or less "graphical" (read:
> >    human-understandable) fashion
> > -- convert PDF/ps files to other formats (eg text).
> >
> > Browsing the archives, I learnt about pdf2ps, ps2pdf, pstotext and
> > psutils (both in the ports). I had also browsed the ports tree as well as
> > the Doc-primer, but I am probably missing something trivial here.
> >
> > I have found some difficulties: eg, psmerge seems not to work on a few ps
> > files, which files I downloaded (originally as PDF files) from a www
> > site. I have reason to believe those files were generated from one main
> > file (containing data arranged in a table) split into several pieces,
> > BTW. I couldn't convert the ps files to txt, either: pstotext generated
> > strings of hashes (the "#" character).
>
> I meet with problems when trying to convert PDF/ps files containing data
> arranged in a table, each row of data being preceded as well as followed
> by a (continuous) horizontal line like this (the data were probably
> formatted with M$ excel):
>
> -------------------------------------------
> data data data...
> -------------------------------------------
> data data data...
> -------------------------------------------
>
> For example, running pdfinfo on one of the files spits out:
>
> Creator: Windows NT 4.0
> Producer: Acrobat Distiller 4.0 for Windows
> CreationDate: 20010511130351
> ModDate: 20010511130351+02'00'
> Pages: 60
> Encrypted: no
> Linearized: yes
>
> I tried xpdf (in the ports), namely pdftotext, but it didn't work.
>
> Summing up: I can convert those PDF files into ps, the information in the
> ps files IS displayed correctly, but I have managed to convert neither
> the above-mentioned PDF nor ps files into plain text. There is a txt2pdf
> utility on the Net, but I can't seem to find a **working** pdf2txt or
> ps2txt one. BTW, the "clipboard" (ie the mouse middle button) DOES copy
> from Acrobat Reader (running in linux comp. layer) to other text editors
> within X, but it copies (raw) PDF data.

To whomever it may concern,

I keep replying to myself, but I seem to have made some progress.

I had successfully converted the PDF files into ps ones. The reason why
pstotext didn't work is probably that such files (eg a 60-page PDF file)
are **images** (in the preceding example: a collection of 60 images, one
per page, as was pointed out by ImageMagick). Which is also the reason
why pdftotext didn't work, BTW.

Since I had to deal with PDF "images", not "text" PDF files... I asked
(wait for it) ImageMagick for help :-)

convert <name_of_PDF_file_of_type_"image"> <name...jpg>

DID work, and created a collection of jpeg images (one per page). Thus,
I can convert PDF "images", or data acquired/manipulated/treated as
such -- specifically, a M$ Excel table -- into other image formats.

<aside>pdftoimages does NOT seem to work, however</aside>

<question type="dumb">
AAARGH! I am only missing the last step: how to recover text from eg such
jpeg images; and/or... which image format to choose in order to be able
to extract text from the images.
</question>

<advocacy>
Once again, I would very much like to work under FreeBSD, and NOT make
use of any M$-related product; the negation "NOT" extending from the
coasts of Western Europe to the Pacific coasts of USA -- just to make
sure that M$ is within the scope of negation :-)))

MTIA,
Salvo
(with apologies for the dumb question)
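
For anyone wanting to script the ImageMagick step described above, here is a minimal sketch in Python. It only wraps the same `convert` call. The function names, the `page-%03d.jpg` output pattern and the 300 dpi default are my own assumptions, not anything from the message, and ImageMagick typically needs Ghostscript installed to read PDF input.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, density=300):
    """Return the argv list for rendering each PDF page to a numbered JPEG.

    The helper name, output pattern and default density are illustrative
    assumptions; only the basic `convert <pdf> <jpg>` call is from the message.
    """
    out_pattern = Path(out_dir) / "page-%03d.jpg"
    # -density goes before the input so it applies while the PDF is rasterized.
    return ["convert", "-density", str(density), str(pdf_path), str(out_pattern)]


def pdf_to_jpegs(pdf_path, out_dir, density=300):
    """Run the conversion if `convert` is on PATH; return the files created."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' not found in PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(pdf_path, out_dir, density), check=True)
    return sorted(Path(out_dir).glob("page-*.jpg"))
```

Calling `pdf_to_jpegs("table.pdf", "pages")` (both names hypothetical) should leave one JPEG per page in `pages/`. A higher density gives larger, sharper images, which probably matters if the goal is to get text back out of them later.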