From owner-freebsd-questions@FreeBSD.ORG Tue Aug 9 19:35:46 2011
Date: Tue, 9 Aug 2011 21:16:11 +0200
From: Christian Barthel
To: Anton Shterenlikht
Cc: freebsd-questions@freebsd.org
Subject: Re: extracting text from docx files
Message-ID: <20110809191610.GA6129@nyx.user-mode.org>
References: <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk>
In-Reply-To: <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk>
X-PGP-Key: "http://bc.user-mode.org/barthelc.asc"
User-Agent: Mutt/1.5.18 (2008-05-17)

On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
> I often receive information in *.docx format
> from my MS using colleagues. Sometimes I can
> ask for a pdf (or similar) instead, but not always.

You have a lot of nice options:
- Force them to use BSD/Linux ;)
- Explain to them why docx is shit!
- Don't read it

> Usually I unzip a docx and then search
> through all *xml files to find the
> useful data. However, I can't find any
> xml styles to use, so I have to convert
> the relevant xml file(s) to plain text
> by hand. I wonder if anybody can suggest
> a better way. Perhaps there's something
> in ports that can help.

But if you really, really need to read docx, you can try the web
application from Microsoft. A few months ago I also received a lot of
docx files and opened them with the Microsoft web app; that worked for
me to extract the information.

More information: http://office.microsoft.com/en-us/web-apps/

The downside: you have to sign up for a Microsoft service :(

cheers
-- 
Christian Barthel
Public-Key: http://bc.user-mode.org/bc.asc
Mail: bc@nyx.user-mode.org
Web: http://bc.user-mode.org
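
For anyone who would rather script the unzip-and-search approach from
the quoted question than sign up for a web service: a .docx file is a
zip archive whose main body is stored in word/document.xml, with the
visible text held in w:t elements inside w:p paragraphs. The following
is only a minimal Python sketch under those assumptions, using nothing
beyond the standard library. The function name docx_to_text is made up
for illustration, and headers, footers, footnotes and text boxes are
not handled.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in a .docx archive.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the body text of a .docx file, one paragraph per line.

    `source` is a file path or a binary file-like object.
    (Hypothetical helper, not part of any library.)
    """
    # A .docx is a zip archive; the main body is word/document.xml.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Every <w:p> is one paragraph; its text sits in <w:t> runs.
    for para in root.iter(f"{{{W_NS}}}p"):
        pieces = []
        for node in para.iter():
            if node.tag == f"{{{W_NS}}}t":
                pieces.append(node.text or "")
            elif node.tag == f"{{{W_NS}}}tab":
                pieces.append("\t")
            elif node.tag in (f"{{{W_NS}}}br", f"{{{W_NS}}}cr"):
                pieces.append("\n")
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    # Print the text of every .docx file named on the command line.
    for path in sys.argv[1:]:
        print(docx_to_text(path))
```

Saved under a name of your choice (say docx2txt.py, also just an
example), it could be run as "python docx2txt.py report.docx". Table
cells come out as separate lines, since each cell contains its own
paragraphs.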