Date: Tue, 27 Jan 2009 00:51:16 +0100
From: Polytropon
To: Gary Kline
Cc: FreeBSD Mailing List
Subject: Re: can i split a pdf file?
On Mon, 26 Jan 2009 14:51:14 -0800, Gary Kline wrote:
> Still,
> before I get back to the last few pages of my thesis, maybe I'll
> try feeding parts of my most vanilla image-PDF file to an
> opensource OCR program. I'm pretty sure there are a couple in
> ports. IIRC, though, the images have to be jpegs or tiffs or the
> like. If anybody knows, please give me a shout out!

The best idea is to use a format that does not introduce artifacts from lossy image compression through the DCT (discrete cosine transform) or similar algorithms, read: "real black-and-white pictures" (1-bit color). JPEG is not such a format. You can see this by magnifying the area around the text: it is gray and looks "dusty". TIFF, GIF and PNG are better formats for feeding images into an OCR program, because they can store the pixels losslessly.

(Background: a long time ago, I knew a man who built electronics and made printed circuit boards. In order to save hard disk space, he converted his 1-bit BMP images of the schematics and the PCB layouts to JPEG format, instead of just compressing them with ZIP, RAR or ARJ. He was very unhappy to see them come out of the printer "so dirty, partially unreadable", although it was a high-quality office-class laser printer. And when he took the PCBs out of the acid bath, their previously photochemically treated surface looked strange and had holes in the copper; they were ready to be thrown away. This man was very upset when he was told about DCT and artifacts. Later on, he used GIF images and was happy again.)

-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
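The DCT point can be illustrated with a small sketch. This is not what a real JPEG encoder does in full, because encoders also use per-frequency quantization tables, color conversion and entropy coding. It only shows the lossy core. A sharp black/white edge in an 8x8 block is transformed with a 2-D DCT. Its coefficients are then rounded using a single step size, which is a hypothetical simplification. Finally, the block is transformed back. Gray values that were not in the original appear, which is the "dust" around text that an OCR program has to cope with.

```python
import math

N = 8  # JPEG transforms images in 8x8 pixel blocks


def _c(k):
    # Normalization factor that makes the DCT-II orthonormal.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(i, k):
    # Cosine basis value for sample position i and frequency k.
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT-II of an N x N block, indexed [row][column]."""
    return [[_c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                 for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT (DCT-III), which undoes dct2."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


def compress_block(pixels, step):
    """Lossy round trip: level shift, DCT, uniform quantization, inverse DCT.

    `step` is a hypothetical single quantizer step; real JPEG uses a
    per-frequency quantization table derived from the quality setting.
    """
    shifted = [[p - 128 for p in row] for row in pixels]
    coeffs = dct2(shifted)
    # Quantization is the lossy step: coefficients are rounded to multiples of `step`.
    quantized = [[round(c / step) * step for c in row] for row in coeffs]
    restored = idct2(quantized)
    # Shift back, round to integers and clamp to the 8-bit range.
    return [[min(255, max(0, round(p + 128))) for p in row] for row in restored]


# A tiny "text edge": black (0) on the left half, white (255) on the right.
edge = [[0] * 4 + [255] * 4 for _ in range(N)]

lossless = compress_block(edge, step=1e-6)  # practically no quantization
lossy = compress_block(edge, step=50)       # coarse quantization

grays = sorted({p for row in lossy for p in row} - {0, 255})
print("lossless round trip exact:", lossless == edge)
print("gray levels after quantization:", grays)
```

With the tiny step, the round trip reproduces the pure 0/255 block. With the coarse step, intermediate gray levels show up next to the edge. A lossless format such as PNG, GIF or a G4-compressed 1-bit TIFF keeps exactly the values 0 and 255.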