FreeBSD Mail Archives

Date:      Tue, 9 Aug 2011 15:39:47 +0100
From:      Anton Shterenlikht <mexas@bristol.ac.uk>
To:        Rod Person <rodperson@rodperson.com>
Cc:        Anton Shterenlikht <mexas@bristol.ac.uk>, freebsd-questions@freebsd.org
Subject:   Re: extracting text from docx files
Message-ID:  <20110809143947.GA39516@mech-cluster241.men.bris.ac.uk>
In-Reply-To: <20110809094026.dea10d7a.rodperson@rodperson.com>
References:  <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk> <20110809094026.dea10d7a.rodperson@rodperson.com>

On Tue, Aug 09, 2011 at 09:40:26AM -0400, Rod Person wrote:
> On Tue, 9 Aug 2011 14:36:32 +0100
> Anton Shterenlikht <mexas@bristol.ac.uk> wrote:
> 
> > Usually I unzip a docx and then search
> > through all *xml  files to find the
> > useful data. However, I can't find any
> > xml styles to use, so I have to convert
> > the relevant xml file(s) to plain text
> > by hand. I wonder if anybody can suggest
> > a better way. Perhaps there's something
> > in ports that can help.
> 
> You could try this for just plain text conversion
> http://docx2txt.sourceforge.net/
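
For reference, the unzip-and-search approach quoted above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not how docx2txt works: it assumes the main body is stored as word/document.xml inside the archive, with paragraphs in w:p elements and text runs in w:t elements. The names docx_to_text and make_sample_docx are made up for this illustration. Tabs, line breaks, headers and footnotes are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the body text of a .docx, one line per paragraph.

    source is a file name or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    lines = []
    for para in root.iter(W_NS + "p"):
        # Join all text runs of the paragraph in document order
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        lines.append("".join(runs))
    return "\n".join(lines)


def make_sample_docx():
    """Build a tiny in-memory archive for the demo (hypothetical data)."""
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second line</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(docx_to_text(make_sample_docx()))
```

With a real file, the call would be docx_to_text("report.docx"), where the file name is a placeholder.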

Thank you
Anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110809143947.GA39516>
