From: Alejandro Imass
To: freebsd-questions@freebsd.org
Date: Tue, 9 Aug 2011 16:40:14 -0400
Subject: Re: extracting text from docx files

On Tue, Aug 9, 2011 at 3:57 PM, Antonio Olivares wrote:
>> But if you really, really need to read docx, you can try the web
>> application from Microsoft. A few months ago, I also got a lot of docx
>> and I opened it with the Microsoft web app; this worked for me to extract
>> the information...
>>

Just a thought, but if docx is XML, why not just find or build some XSLT that extracts what you need into another format?
You probably already have libxml2 and libxslt on your system, along with the command-line utility xsltproc. There are probably existing XSLT stylesheets that transform docx to RTF and plain text.

--
Alejandro Imass
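To make the "docx is XML" point concrete, here is a minimal sketch of the extraction step. It is not XSLT: it uses only Python's standard library to do roughly what a simple stylesheet would, under the assumption that the body text sits in word/document.xml inside the zip archive, in w:t elements grouped by w:p paragraphs. The names docx_to_text and make_sample_docx are hypothetical, and the sample archive is a stripped-down stand-in for a real file, not a complete docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the text of a .docx, one line per paragraph (hypothetical helper).

    `source` is a file path or a binary file-like object.
    Assumption: the main body is stored as word/document.xml.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every w:t inside it.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def make_sample_docx():
    """Build a tiny in-memory archive for demonstration only.

    It contains just word/document.xml, so it is not a complete docx.
    """
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(docx_to_text(make_sample_docx()))
```

This ignores tabs, line breaks, footnotes, headers and nested text boxes, so treat it as a starting point. An XSLT stylesheet matching the same w:p and w:t elements and run through xsltproc on the unzipped document.xml should give a similar result.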