From: Kurt Buff
To: freebsd-questions@freebsd.org
Date: Tue, 9 Aug 2011 10:25:30 -0700
Subject: Re: extracting text from docx files
In-Reply-To: <20110809133632.GA37445@mech-cluster241.men.bris.ac.uk>

On Tue, Aug 9, 2011 at 06:36, Anton Shterenlikht wrote:
> I often receive information in *.docx format
> from my MS using colleagues.
> Sometimes I can
> ask for a pdf (or similar) instead, but not always.
>
> Usually I unzip a docx and then search
> through all *xml files to find the
> useful data. However, I can't find any
> xml styles to use, so I have to convert
> the relevant xml file(s) to plain text
> by hand. I wonder if anybody can suggest
> a better way. Perhaps there's something
> in ports that can help.

My installation of OpenOffice 3.3 on my Win7 machine will open a
Winword 2010 .docx file. I'm guessing it will do the same on FreeBSD,
but I don't have an install with a GUI running at the moment.

Kurt
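The manual routine described in the quoted message (unzip the .docx, dig the text out of the XML, strip the markup by hand) can be scripted. What follows is a minimal sketch, not something proposed in the thread, using only Python's standard zipfile and xml.etree.ElementTree modules. It assumes the body text sits in word/document.xml, as in standard WordprocessingML files; headers, footers, footnotes and comments live in other parts of the archive and are skipped. The function names docx_to_text and docx_xml_to_text are hypothetical.

```python
"""Minimal sketch: pull plain text out of a .docx with only the stdlib.

Assumption: the main body is in word/document.xml (standard layout).
Headers, footers, footnotes and comments are separate parts and are
ignored. Text inside text boxes may be picked up twice.
"""
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_xml_to_text(xml_bytes):
    """Turn a document.xml part into plain text, one line per paragraph."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for para in root.iter(W_NS + "p"):
        pieces = []
        # Only look inside runs (w:r), so tab-stop definitions in the
        # paragraph properties (w:pPr/w:tabs/w:tab) are not mistaken
        # for literal tab characters. Deleted text (w:delText) is skipped.
        for run in para.iter(W_NS + "r"):
            for child in run:
                if child.tag == W_NS + "t":
                    pieces.append(child.text or "")
                elif child.tag == W_NS + "tab":
                    pieces.append("\t")
                elif child.tag in (W_NS + "br", W_NS + "cr"):
                    pieces.append("\n")
        lines.append("".join(pieces))
    return "\n".join(lines)


def docx_to_text(path):
    """Open a .docx (a ZIP archive) and return the text of its main body."""
    with zipfile.ZipFile(path) as zf:
        return docx_xml_to_text(zf.read("word/document.xml"))


if __name__ == "__main__":
    # Print the text of every .docx named on the command line.
    for name in sys.argv[1:]:
        print(docx_to_text(name))
```

Saved under a hypothetical name such as docx2txt.py, it would be run as `python docx2txt.py report.docx > report.txt`. Working from the XML directly avoids needing a GUI office suite, at the cost of losing tables' cell structure and all formatting.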