From: Rod Person
To: Anton Shterenlikht
Cc: freebsd-questions@freebsd.org
Date: Tue, 9 Aug 2011 09:40:26 -0400
Subject: Re: extracting text from docx files

On Tue, 9 Aug 2011 14:36:32 +0100
Anton Shterenlikht wrote:

> Usually I unzip a docx and then search
> through all *xml files to find the
> useful data. However, I can't find any
> xml styles to use, so I have to convert
> the relevant xml file(s) to plain text
> by hand. I wonder if anybody can suggest
> a better way. Perhaps there's something
> in ports that can help.

You could try this for just plain text conversion:

http://docx2txt.sourceforge.net/

-- 
Rod
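
For reference, the manual "unzip and read the XML" procedure described in the quoted question can be roughly sketched in Python using only the standard library. This is an illustration, not how docx2txt works internally. It assumes the main body lives in the archive member word/document.xml (the usual location, but not guaranteed for every file), and the function name docx_to_text is made up for this example.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the <w:p> and <w:t> elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the plain text of a .docx, one line per paragraph.

    `source` is a file name or a binary file-like object.
    Assumption: the body is stored in word/document.xml.
    """
    with zipfile.ZipFile(source) as zf:
        xml_bytes = zf.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text sits in <w:t> runs.
    # Formatting, images, headers and footnotes are ignored here, and
    # paragraphs nested inside text boxes may show up more than once.
    for p in root.iter("{%s}p" % W_NS):
        runs = [t.text or "" for t in p.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(docx_to_text(sys.argv[1]))
```

Saved as, say, docx_text.py (again a hypothetical name), it would be run as "python docx_text.py report.docx". For anything beyond simple documents a dedicated converter such as the one linked above is likely the better choice.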