From: Anton Shterenlikht
Date: Tue, 9 Aug 2011 15:39:47 +0100
To: Rod Person
Cc: Anton Shterenlikht, freebsd-questions@freebsd.org
Subject: Re: extracting text from docx files

On Tue, Aug 09, 2011 at 09:40:26AM -0400, Rod Person wrote:
> On Tue, 9 Aug 2011 14:36:32 +0100
> Anton Shterenlikht wrote:
>
> > Usually I unzip a docx and then search
> > through all *xml files to find the
> > useful data. However, I can't find any
> > xml styles to use, so I have to convert
> > the relevant xml file(s) to plain text
> > by hand. I wonder if anybody can suggest
> > a better way. Perhaps there's something
> > in ports that can help.
>
> You could try this for just plain text conversion
> http://docx2txt.sourceforge.net/

Thank you

Anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423
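
For reference, the manual approach described in the quoted message (unzip the docx, then pull the text out of the XML) can be sketched in Python using only the standard library. This is a rough illustration, not how docx2txt works. The function name docx_to_text is hypothetical, and the sketch assumes the main text lives in word/document.xml, with paragraphs stored as w:p elements and text runs as w:t elements. Headers, footers, footnotes and all formatting are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used by elements such as w:p and w:t.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a docx file, one line per paragraph.

    `source` is a path or a binary file-like object. Hypothetical helper:
    it reads only word/document.xml (an assumption about where the
    useful text is stored).
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # Join the text runs (w:t) that make up this paragraph.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        lines.append("".join(runs))
    return "\n".join(lines)
```

Calling docx_to_text("report.docx"), where report.docx is a placeholder file name, would return the document text as a single string that can then be searched with the usual tools.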