Date: Tue, 9 Aug 2011 16:40:14 -0400
From: Alejandro Imass <ait@p2ee.org>
To: freebsd-questions@freebsd.org
Subject: Re: extracting text from docx files
Message-ID: <CAHieY7SKAbaqyY9LkS6krsxtHOUHi1i4RAtV03AmmLcSt9cAbA@mail.gmail.com>
In-Reply-To: <CAJ5UdcNqxZwTjs33xdUXatWCN%2BSDP3EFkqb_MYeVTF34rvsmxg@mail.gmail.com>
References: <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk> <20110809191610.GA6129@nyx.user-mode.org> <CAJ5UdcNqxZwTjs33xdUXatWCN%2BSDP3EFkqb_MYeVTF34rvsmxg@mail.gmail.com>

On Tue, Aug 9, 2011 at 3:57 PM, Antonio Olivares <olivares14031@gmail.com> wrote:
>> But if you really, really need to read docx, you can try the web
>> application from Microsoft. A few months ago, I also got a lot of docx
>> and I opened it with the Microsoft web app; this worked for me to extract
>> the information...
>>

Just a thought here, but if docx is XML, why not find or build some XSLT that extracts what you need into another format?

You probably already have libxml2 and libxslt on your system, along with the command-line utility xsltproc.

There are probably existing XSLT stylesheets that transform to RTF and plain text.

--
Alejandro Imass
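
To make the "docx is XML" idea concrete: a .docx file is a ZIP archive, and the document body normally lives in the member word/document.xml, so that member has to be pulled out of the archive before any XSLT (or other XML tool) can see it. The sketch below is not the XSLT route suggested above; it swaps in Python's standard-library zipfile and ElementTree modules to do the same kind of extraction directly. The function name docx_to_text is made up for illustration, and the code only collects w:t text runs, so tabs, line breaks, footnotes, headers and the like are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the text of a .docx, one line per paragraph.

    `source` is a file path or a binary file-like object.
    Hypothetical helper for illustration; only w:t runs are collected.
    """
    with zipfile.ZipFile(source) as archive:
        # The main document body is stored as an ordinary XML member.
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for para in root.iter(W + "p"):
        # Concatenate every text run inside this paragraph.
        lines.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(lines)
```

Calling docx_to_text("report.docx") (the file name is only an example) returns the paragraphs joined by newlines. An XSLT stylesheet run through xsltproc would need the same first step of unpacking word/document.xml from the archive.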